WEB SCRAPING AND AUTOMATION
This service automates web scraping, data reorganization, and repetitive tasks.
I "harvest" data from the web, then clean and reorganize the unstructured results before saving them to a database, Excel, or JSON files. Crawlers can also push the final data to a database on a regular schedule or in real time.
1. Fetch millions of records efficiently
2. Run crawlers on a server
3. Run spiders in multiple processes
4. Automate data processing pipelines
5. Host crawlers in the cloud (e.g. Heroku)
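To give a feel for the parallel-fetching item in the list above, here is a minimal sketch with Python's concurrent.futures. The fetch function is a placeholder (a real crawler would issue an HTTP request there), and the URLs are invented; threads are used because fetching is I/O-bound, with ProcessPoolExecutor as the drop-in swap for CPU-bound parsing:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder: a real spider would issue an HTTP request here
# (e.g. via requests or Scrapy); this stub just echoes the URL.
def fetch(url):
    return {"url": url, "status": "fetched"}

# Hypothetical URL list to crawl.
urls = [f"https://example.com/page/{i}" for i in range(10)]

# Fan the URL list out over a pool of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, urls))
```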
Types of data:
E-commerce products
Tweets, Twitter users, and their activity
Product reviews
Reddit discussions
Facebook posts, comments, users, activity, etc.
Movie/film data
Any website you can think of
Tools:
Scrapy - a Python framework that supports building spiders
Beautiful Soup - extracts content from XML and HTML documents
Selenium - comes in handy for rendering JavaScript-heavy pages
Heroku - hosts the web scrapers
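As a quick taste of the extraction step with Beautiful Soup, here is a sketch; the markup, class names, and fields are invented sample data standing in for a downloaded page:

```python
from bs4 import BeautifulSoup

# Inline sample markup standing in for a downloaded page.
html = """
<ul class="products">
  <li><span class="name">Widget A</span><span class="price">$19.99</span></li>
  <li><span class="name">Widget B</span><span class="price">$5.00</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Pull each product's name and price out of the markup.
products = [
    {"name": li.select_one(".name").get_text(),
     "price": li.select_one(".price").get_text()}
    for li in soup.select("ul.products li")
]
```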
Thank you for checking out my gig. I look forward to working on your project.