Hello from Pakistan, I am Hira Arif.
With me, you can get your data scraped and your tasks automated within the first 24 hours of the contract.
For 2.7 years, I have been delivering fast, efficient, and reliable web scraping solutions,
Selenium automation, and bot development for clients.
I have extensive experience delivering high-quality, clean datasets using scraping libraries such as Selenium, Requests, Beautiful Soup, and Scrapy.
Scraping Projects:
1️⃣ Login-Protected Sites, e.g. SEMrush, redtoolbox
2️⃣ Puzzle CAPTCHA (Google reCAPTCHA v2) Protected Sites, e.g. Google, claimittexas
3️⃣ Real Estate & E-commerce Site Data, e.g. Zillow, ambalaza, Vrbo
4️⃣ JavaScript / Dynamic Data Scraping, e.g. Genius, Similarweb
5️⃣ Static Data Downloading and Parsing: 100 million+ rows extracted since January 26, 2022
📋 Python Tech Stack:
1. BeautifulSoup - Requests, Cookies, Session
2. Undetected ChromeDriver - Selenium WebDriver
3. PyMongo, Psycopg2
4. spaCy, Pandas, OpenCV, Langid, Slugify
5. JSON, LXML, CSV, RE
Skills and Experiences:
✔ IP Rotation
✔ Concurrent Threads
✔ Bypassing CAPTCHA and Cloudflare
✔ Fast and Efficient Processing for Bulk Data (Using Python Advanced Data Structures)
✔ HTML Parsing and DOM manipulation
✔ Data Consistency (Free from unwanted strings and AI fillers)
✔ Part-of-Speech Tagging on Millions of Tokens
✔ Grammatical Error Checking on 20 Million Lines of Text
✔ Explicit Image tagging
✔ HTML to Image Conversion
✔ PDF Extraction
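
To illustrate the "Concurrent Threads" and "Data Consistency" items above, here is a minimal sketch, not production code. The names `clean_text` and `clean_many` and the cleaning rules are hypothetical, chosen only for this example. In a real scraper the worker function would usually be an I/O-bound page fetch, which is where threads pay off. A pure cleaning function is used here so the sketch runs offline.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cleaning rules: drop leftover HTML tags, collapse whitespace.
TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Replace HTML tags with spaces, then collapse and trim whitespace."""
    without_tags = TAG_RE.sub(" ", raw)
    return SPACE_RE.sub(" ", without_tags).strip()


def clean_many(rows, max_workers=8):
    """Run clean_text over rows in a thread pool.

    pool.map returns results in input order, so rows stay aligned
    with their source records.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(clean_text, rows))


if __name__ == "__main__":
    sample = ["<p>Hello  <b>world</b></p>", "  plain\ttext \n"]
    print(clean_many(sample))
```

The thread count (`max_workers`) is an assumption for the example and would be tuned per machine and per target site.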
🗄️ Databases:
1. MongoDB
2. PostgreSQL
3. MySQL
Output Formats:
JSON, CSV, XLSX, TXT
On-going Projects:
1. Since 2023: scraping a login-protected website for clients using:
35 Remote Machines (256 GB) | 230 IPs | 350 Threads
2. 24/7 Scraping of SEMrush and Similarweb
3. HTML Downloading, Parsing, and MongoDB Management
Bots:
1. Reddit posts aggregator | 24/7 extraction and organization
2. YouTube | Automation and statistics collection
3. Browser Automation | Automated web form submission and actions
Cloud Infrastructure - Linux and Windows VPS:
1. Binding a domain to a machine
2. User permissions
3. Resource Management - Storage Volumes
4. Cron Jobs Setup
5. Database Backups and Restore
Java: Jsoup, Firefox & Chrome Driver
Communication lines are open 7 days a week via text, voice, or video meetup.
Mon - Thurs 21:00 - 06:00
Fri - Sun 00:00 - 00:00