Hi, I am Mark, a web scraping expert with extensive experience in scraping techniques and libraries such as Selenium, Requests, Beautiful Soup, and Scrapy.
I can have your data scraped and your tasks automated within the first 24 hours of the contract, drawing on 5+ years of experience in Python Selenium bot development and web scraping.
Scraping Projects:
1️⃣ Login-Protected Sites, e.g. SEMrush
2️⃣ Puzzle-CAPTCHA-Protected Sites, e.g. Google, claimittexas
3️⃣ Real Estate & E-commerce Site Data, e.g. Zillow, ambalaza
4️⃣ JavaScript / Dynamic Data Scraping, e.g. Genius, Similarweb
5️⃣ Static Data Downloading and Parsing. 100 million+ rows extracted since February 28, 2020.
📋 Python Tech Stack:
1. Python: Core Python, Django, Flask
2. BeautifulSoup: Requests, Cookies, Headers, Sessions
3. Selenium: webdriver, undetected webdriver, customized webdriver
4. SQL: PySql, MySQL, MongoDB, JSON, CSV, Excel
5. Pandas: DataFrame
Skills and Experiences:
✔ IP Rotation
✔ Concurrent Threads
✔ Bypassing CAPTCHA and Cloudflare
✔ Fast and Efficient Processing for Bulk Data (Using Python Advanced Data Structures)
✔ HTML Parsing and DOM manipulation
✔ Data Consistency (Free from unwanted strings and AI fillers)
✔ Part-of-Speech Tagging on Millions of Tokens
✔ Grammatical Error Checking on 20 Million Lines of Text
✔ Explicit Image Tagging
✔ HTML to Image Conversion
✔ PDF Extraction
🗄️ Databases:
1. MongoDB
2. PostgreSQL
3. MySQL
4. MS SQL
Output Formats:
JSON, CSV, XLSX, TXT
Bots:
1. Reddit posts aggregator: 24/7 extraction and organization
2. YouTube: Automation and statistics collection
3. Browser Automation: Automated web form submission and actions
Cloud Infrastructure - Linux and Windows VPS:
1. Bind domain to machine
2. User permissions
3. Resource Management - Storage Volumes
4. Cron Jobs Setup
5. Database Backups and Restore
Java: Jsoup, Firefox & Chrome Driver
Communication lines are open 7 days a week via text, voice, or video meetup.
Mon - Thu 7:00 AM - 10:00 PM EST
Fri - Sun 8:00 AM - 11:00 PM EST