We can scrape data from almost any kind of source:
- Web pages (static and dynamic)
- PDF files
- Images and scanned documents (via OCR), etc.
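For a static web page, extraction usually means parsing the HTML and collecting the elements of interest. Below is a minimal sketch. It uses Python's standard-library `html.parser` instead of Beautiful Soup so that it runs without third-party packages. The sample markup, the `product` class name and the `ProductParser` class are hypothetical. Fetching the page is left out, and so are dynamic pages, which need a browser driver such as Selenium.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collect the text of every <li class="product"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.in_product = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; this only matches an exact class="product"
        if tag == "li" and ("class", "product") in attrs:
            self.in_product = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_product = False

    def handle_data(self, data):
        # Append text only while inside a matching <li>
        if self.in_product:
            self.items[-1] += data


def extract_products(html):
    """Return the trimmed text of each product item found in the HTML string."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return [item.strip() for item in parser.items]


SAMPLE_HTML = """
<ul>
  <li class="product">Widget A</li>
  <li class="other">Ignore me</li>
  <li class="product"> Widget B </li>
</ul>
"""

print(extract_products(SAMPLE_HTML))  # ['Widget A', 'Widget B']
```

With Beautiful Soup, the same selection would be a single `soup.find_all("li", class_="product")` call. The hand-written parser only keeps the example dependency-free.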
We then:
- Extract the desired data
- Transform and clean it
- Load it into the required format
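The transform and load steps can be sketched as a small pipeline. This is a minimal illustration under stated assumptions. The raw records, the field names (`name`, `price`) and the cleaning rules are all hypothetical. The target format is assumed to be CSV, written with the standard-library `csv` module. With Pandas, `DataFrame.to_csv` would fill the same role.

```python
import csv
import io


def transform(raw_records):
    """Clean hypothetical scraped records: trim text, parse prices, drop bad or duplicate rows."""
    cleaned = []
    seen = set()
    for record in raw_records:
        name = " ".join(record.get("name", "").split())  # collapse runs of whitespace
        price_text = record.get("price", "").replace("$", "").replace(",", "").strip()
        if not name or not price_text:
            continue  # skip incomplete rows
        try:
            price = float(price_text)
        except ValueError:
            continue  # skip rows whose price cannot be parsed
        if name in seen:
            continue  # keep only the first occurrence of each name
        seen.add(name)
        cleaned.append({"name": name, "price": price})
    return cleaned


def load_csv(records, fieldnames=("name", "price")):
    """Serialize cleaned records to CSV text (CSV is an assumed output format)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


raw = [
    {"name": "  Widget   A ", "price": "$1,200.50"},
    {"name": "Widget B", "price": "n/a"},
    {"name": "Widget A", "price": "$1,200.50"},
    {"name": "", "price": "$3"},
]

rows = transform(raw)
print(rows)  # [{'name': 'Widget A', 'price': 1200.5}]
print(load_csv(rows))
```

Transformation is kept separate from loading so the same cleaned records can be written to another format, such as JSON or a database table, by swapping out only `load_csv`.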
Technologies we are familiar with:
Python, Selenium, Scrapy, Beautiful Soup, Pandas, scikit-learn, Tesseract, JavaScript, HTML, Airflow, etc.