Python Developer, Scraping & Automations

$15/hr Starting at $35

Hello from Pakistan, I am Hira Arif.


With me, you can get your data scraped and your tasks automated within the first 24 hours of the contract.


For 2.7 years, I have been helping clients achieve fast, efficient, and reliable web scraping solutions and Selenium automations / bot development.


I have extensive experience delivering high-quality, clean data sets using scraping libraries such as Selenium, Requests, Beautiful Soup, and Scrapy.


Scraping Projects:

1️⃣ Login-protected sites, e.g. SEMrush, redtoolbox

2️⃣ Puzzle CAPTCHA (Google reCAPTCHA v2) protected sites, e.g. Google, claimittexas

3️⃣ Real estate and e-commerce site data, e.g. Zillow, ambalaza, vrbo

4️⃣ JavaScript / dynamic data scraping, e.g. Genius, Similarweb

5️⃣ Static data downloading and parsing: 100 million+ rows extracted since January 26, 2022.


📋 Python Tech Stack:

1. BeautifulSoup - Requests, Cookies, Session

2. Undetected ChromeDriver - Selenium WebDriver

3. PyMongo, Psycopg2

4. spaCy, Pandas, OpenCV, Langid, Slugify

5. JSON, LXML, CSV, RE


Skills and Experience:

✔ IP Rotation

✔ Concurrent Threads

✔ Bypassing CAPTCHA and Cloudflare

✔ Fast and Efficient Processing of Bulk Data (Using Python Advanced Data Structures)

✔ HTML Parsing and DOM Manipulation

✔ Data Consistency (Free from unwanted strings and AI fillers)

✔ Part-of-Speech Tagging on Millions of Tokens

✔ Grammatical Error Checking on 20 Million Lines of Text

✔ Explicit Image tagging

✔ HTML to Image Conversion

✔ PDF Extraction


🗄️ Databases:

1. MongoDB

2. PostgreSQL

3. MySQL


Output Formats:

JSON, CSV, XLSX, TXT


On-going Projects:

1. Since 2023, scraping a login-protected website for clients using:

35 Remote Machines (256 GB) | 230 IPs | 350 Threads

2. 24/7 Scraping of SEMrush and Similarweb

3. HTML Downloading, Parsing, and MongoDB Management


Bots:

1. Reddit posts aggregator | 24/7 extraction and organization

2. YouTube | Automation and statistics collection

3. Browser Automation | Automated web form submission and actions


Cloud Infrastructure - Linux and Windows VPS:

1. Binding a Domain to a Machine

2. User permissions

3. Resource Management - Storage Volumes

4. Cron Jobs Setup

5. Database Backups and Restore

Java: Jsoup, Firefox & Chrome Driver


Communication lines are open 7 days a week via text, voice, or video call.

Mon - Thurs 21:00 - 06:00

Fri - Sun 00:00 - 00:00


Skills & Expertise

API, Artificial Intelligence, Automation Engineering, Chatbots, Crawling, Data Extraction, Data Management, Database, HTML, JavaScript, JSON, Linux, MongoDB, Parsing, PDF to Excel Conversion, PostgreSQL, Programming, Python, Selenium, Selenium WebDriver, SQL, Version Control, Web Scraping, XHTML, XML

0 Reviews

This Freelancer has not received any feedback.