Project Description: Web Crawler Development
We are looking for either a developer who builds only the crawler, or a developer who builds the crawler and also programs the admin dashboard.
Project Overview:
We are seeking an experienced PHP developer with expertise in developing and customizing web crawlers. The goal of this project is to create a web crawler that scans specified publicly accessible websites and stores specific data, along with related high-resolution images, in a database.
Task Description:
Development of a Web Crawler:
- The crawler should scan the specified public websites and collect structured data (e.g., text content, metadata).
- Images should be downloaded in the highest available resolution and stored in a defined directory.
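One way to approach the "highest available resolution" requirement is to read an image's `srcset` attribute and keep the largest candidate. The following is only a minimal sketch, written in Python for brevity (the same logic carries over to PHP). The function name `pick_largest_from_srcset` is hypothetical, and the sketch assumes image URLs contain no commas and that a single `srcset` does not mix `w` and `x` descriptors.

```python
from typing import Optional


def pick_largest_from_srcset(srcset: str, fallback: Optional[str] = None) -> Optional[str]:
    """Return the URL of the widest (or densest) candidate in a srcset value.

    Assumption: candidates are separated by commas and URLs contain no commas.
    """
    best_url, best_score = fallback, 0.0
    for candidate in srcset.split(","):
        parts = candidate.strip().split()
        if not parts:
            continue  # empty entry, e.g. a trailing comma
        url = parts[0]
        # A candidate without a descriptor counts as "1x".
        descriptor = parts[1] if len(parts) > 1 else "1x"
        try:
            score = float(descriptor[:-1])  # "1600w" -> 1600.0, "2x" -> 2.0
        except ValueError:
            continue  # skip descriptors that cannot be parsed
        if score > best_score:
            best_url, best_score = url, score
    return best_url
```

If a site offers its full-size images through a different mechanism (for example a link to the original file), that would need its own site-specific rule.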
Database Integration:
- The collected data should be stored in a relational database (e.g., MySQL).
- Data structure: Tables for URLs, collected content, images, and associated metadata.
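As a starting point for discussion, the four tables named above could look roughly like this. This is a sketch only: every table and column name is an assumption, and it uses Python with an in-memory SQLite database so that it runs without a server. A MySQL version would use MySQL column types and `AUTO_INCREMENT` instead.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE urls (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,
    last_crawled_at TEXT
);
CREATE TABLE contents (
    id INTEGER PRIMARY KEY,
    url_id INTEGER NOT NULL REFERENCES urls(id),
    title TEXT,
    body TEXT
);
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    content_id INTEGER NOT NULL REFERENCES contents(id),
    source_url TEXT NOT NULL,
    file_path TEXT NOT NULL
);
CREATE TABLE metadata (
    id INTEGER PRIMARY KEY,
    content_id INTEGER NOT NULL REFERENCES contents(id),
    meta_key TEXT NOT NULL,
    meta_value TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four crawler tables on the given connection."""
    conn.executescript(SCHEMA)
```

Storing metadata as key/value rows keeps the schema stable when different websites expose different fields.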
Admin Dashboard:
Provide a user-friendly admin interface to:
- Start and stop the crawler.
- Display statistics:
  - Timestamp of the last crawl.
  - Number of crawled URLs.
  - Number of records and images.
- Manage the list of websites to be crawled.
Flexibility and Customization:
- Since the structure of the specified websites may vary, the crawler must be adaptable to each structure.
- Implement a flexible parser that uses specific rules for each website.
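The idea of one parser driven by per-website rules can be sketched as follows. This is an illustration under assumptions, not a required design: the rule format (field name mapped to a tag and a CSS class), the hostnames, and the names `SITE_RULES`, `RuleParser`, and `parse_page` are all hypothetical. It is written in Python with the standard library only; the same structure carries over to PHP.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical per-site rules: field name -> (tag, CSS class) to extract.
SITE_RULES = {
    "example.com": {"title": ("h1", "product-title"), "price": ("span", "price")},
    "example.org": {"title": ("h2", "headline")},
}


class RuleParser(HTMLParser):
    """Collects the text of the first element matching each rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.result = {}
        self._field = None  # field currently being captured, if any
        self._tag = None    # tag that opened the current capture

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            return  # already capturing; nested tags contribute text only
        classes = (dict(attrs).get("class") or "").split()
        for field, (rule_tag, rule_class) in self.rules.items():
            if tag == rule_tag and rule_class in classes and field not in self.result:
                self._field, self._tag = field, tag
                self.result[field] = ""
                break

    def handle_data(self, data):
        if self._field is not None:
            self.result[self._field] += data

    def handle_endtag(self, tag):
        # Limitation: a nested element with the same tag ends the capture early.
        if self._field is not None and tag == self._tag:
            self.result[self._field] = self.result[self._field].strip()
            self._field = None


def parse_page(url: str, html: str) -> dict:
    """Pick the rule set by hostname and extract the configured fields."""
    host = urlparse(url).hostname
    if host not in SITE_RULES:
        raise ValueError(f"no parsing rules configured for {host!r}")
    parser = RuleParser(SITE_RULES[host])
    parser.feed(html)
    parser.close()
    return parser.result
```

With this layout, supporting a new website means adding an entry to the rule table rather than changing crawler code, which is where the per-URL customization effort would go.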
Optimization and Performance:
- Ensure that the crawler operates efficiently and does not cause excessive server load.
- Implement error-handling mechanisms (e.g., HTTP errors, timeouts).
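Handling HTTP errors and timeouts without hammering the target server could look like the sketch below: a request timeout, a limited number of retries, and an increasing pause between attempts. The function names, the retry count, and the delay values are assumptions chosen for illustration. The example is in Python with the standard library; the same pattern applies in PHP.

```python
import time
import urllib.error
import urllib.request


def http_get(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL and return the response body, with a request timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retries(url, fetch=http_get, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as exc:
            # Client errors such as 404 will not succeed on retry; 429 might.
            if exc.code < 500 and exc.code != 429:
                raise
            last_error = exc
        except OSError as exc:
            # Covers URLError (connection problems) and timeouts.
            last_error = exc
        if attempt < max_attempts:
            # Wait 1x, 2x, 4x, ... base_delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise last_error
```

The `fetch` and `sleep` parameters exist so the retry logic can be exercised without network access. A fixed pause between successful requests to the same host would additionally help keep server load low.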
The data should then be imported into an existing database structure.
We have already run a similar project and suggest the following payment model:
1. A one-time fixed price for the crawler.
2. After that, payment per URL.
From previous work, I know that some URLs are easy to parse while others are more time-consuming. On average, I expect customizing the crawler for one URL to take 1-3 hours.