Posted 2 days ago  ·  Job ID: 2093539  ·  65 quotes received

Website Data Extraction / Scraping

Fixed Price or Hourly

  Send before: September 14, 2024


RFP General Description

This assignment consists of three tasks involving web scraping for various types of listings:

    1. Task #1: Scraping of Real Estate Listings
    2. Task #2: Scraping of Business Directory Listings
    3. Task #3: Scraping of Event Listings

 

Task #1: Scraping of Real Estate Listings


Description: Extract real estate listings from a website similar to realtor.com. An example of such a listing can be found here:  https://www.realtor.com/realestateandhomes-detail/201-Ridgecrest-Dr_Greenville_SC_29609_M68057-00430?property_id=6805700430&from=ab_mixed_view_card


Approximate size: 60,000 listings.


Data Requirement: The data should be provided in a CSV file format, which can be sent via email. The CSV should include the following fields:


  1. URL
  2. Agent Phone
  3. Agent Email
  4. Agent Name
  5. Brokerage Name
  6. Acres
  7. Price
  8. Has a House?
  9. Bedrooms
  10. Bathrooms
  11. Square Feet
  12. Listing Headline/Title
  13. External Property Link
  14. Land ID URL
  15. Address
  16. ZIP Code
  17. City/Town
  18. State
  19. County
  20. Latitude
  21. Longitude
  22. Short Description
  23. Detailed Property Description
  24. Available Financing
  25. Annual Taxes
  26. Tax Rate
  27. Taxes Without Exemption
  28. Amenities
  29. Images
  30. Main Image
  31. YouTube Video Link
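
As an illustration of the requested deliverable, the following is a minimal Python sketch that writes rows under exactly this header using the standard csv module. The function name write_listings and the sample row are hypothetical; the column labels are copied from the list above, and fields missing from a scraped row are simply left empty.

```python
import csv
import io

# Column labels copied verbatim from the Task #1 field list.
TASK1_FIELDS = [
    "URL", "Agent Phone", "Agent Email", "Agent Name", "Brokerage Name",
    "Acres", "Price", "Has a House?", "Bedrooms", "Bathrooms", "Square Feet",
    "Listing Headline/Title", "External Property Link", "Land ID URL",
    "Address", "ZIP Code", "City/Town", "State", "County", "Latitude",
    "Longitude", "Short Description", "Detailed Property Description",
    "Available Financing", "Annual Taxes", "Tax Rate",
    "Taxes Without Exemption", "Amenities", "Images", "Main Image",
    "YouTube Video Link",
]


def write_listings(rows, out):
    """Write listing dicts as CSV; missing fields become empty cells."""
    writer = csv.DictWriter(
        out, fieldnames=TASK1_FIELDS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical sample row, for illustration only.
buf = io.StringIO()
write_listings([{"URL": "https://example.com/listing/1", "Price": "250000"}], buf)
```

Multi-valued fields such as Images and Amenities would need an agreed delimiter inside a single cell; that choice is not specified here.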


Attachments and Images: All attachments and images should be uploaded to an S3 bucket.
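
One possible way to keep uploaded images traceable to their listing is a deterministic object key. The sketch below is an assumption, not a required layout: the prefix task1/images, the hashed listing identifier, and the function name image_key are all hypothetical. The upload itself would typically go through an S3 client and is not shown.

```python
import hashlib
import posixpath
from urllib.parse import urlparse


def image_key(listing_url, image_url, prefix="task1/images"):
    """Build a stable S3 object key for one image of one listing."""
    # Short hash of the listing URL groups all images of a listing together.
    listing_id = hashlib.sha1(listing_url.encode("utf-8")).hexdigest()[:16]
    # Keep the original file name; the query string is dropped by urlparse().path.
    name = posixpath.basename(urlparse(image_url).path) or "image"
    return f"{prefix}/{listing_id}/{name}"


# Hypothetical URLs, for illustration only.
key = image_key(
    "https://example.com/listing/1",
    "https://cdn.example.com/a/b/photo.jpg?w=800",
)
```

Because the key depends only on the two URLs, re-running the scraper overwrites the same object instead of creating duplicates.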


Process: The first round of scraping should be conducted as soon as possible.


Starting from October 15th, new listings and updates to existing listings are to be provided on a weekly basis for the next 6-10 weeks.
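
To make the weekly deliverable concrete, here is a minimal sketch of how new and updated listings could be separated from unchanged ones. It assumes each weekly scrape is held as a dict keyed by listing URL and that any change in a row's fields counts as an update; the function names fingerprint and weekly_delta are hypothetical.

```python
import hashlib
import json


def fingerprint(row):
    """Hash a row's content so two scrapes of the same listing can be compared."""
    payload = json.dumps(row, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def weekly_delta(previous, current):
    """Return (new_urls, updated_urls) given two {url: row} snapshots."""
    new = [url for url in current if url not in previous]
    updated = [
        url
        for url in current
        if url in previous and fingerprint(current[url]) != fingerprint(previous[url])
    ]
    return new, updated


# Hypothetical snapshots, for illustration only.
last_week = {"u1": {"Price": "100"}, "u2": {"Price": "200"}}
this_week = {"u1": {"Price": "100"}, "u2": {"Price": "210"}, "u3": {"Price": "300"}}
new_urls, updated_urls = weekly_delta(last_week, this_week)
```

Listings that disappear between snapshots are not reported by this sketch; whether removals should be delivered is left open by the description above.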


Task #2: Scraping of Business Directory Listings


Description: Extract business listings from a website similar to allpages.com. Only Texas listings are needed.


Approximate size: 200,000 to 300,000 listings.


Data Requirement: The data should be provided in a CSV file format, which can be sent via email. The CSV should include the following fields:


  1. URL
  2. Business Category
  3. Business Name
  4. Description
  5. County
  6. City/Town
  7. ZIP Code
  8. Address
  9. Phone Number
  10. Email
  11. Website Link
  12. Business Hours


Timeline: Expected completion by September 15th.


Data validation:


If you offer this service, please quote for validating each company website link to confirm that it does not return a 4xx HTTP status. Links that do return a 4xx status should be flagged in the CSV file.


If you offer this service, please quote for validating phone numbers and describe the validation methods you can offer.
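
As one example of a validation method, the sketch below performs a format-only check against the North American Numbering Plan, where the area code and the exchange code each start with a digit from 2 to 9. It does not confirm that a number is in service. The function name normalize_us_phone and the output format are assumptions.

```python
import re

# Ten digits: area code and exchange code must each start with 2-9.
NANP = re.compile(r"^[2-9]\d{2}[2-9]\d{6}$")


def normalize_us_phone(raw):
    """Return '(AAA) EEE-NNNN' for a plausible US number, else None."""
    digits = re.sub(r"\D", "", raw or "")
    # Drop a leading country code 1 if present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if not NANP.match(digits):
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
```

Numbers written with an extension (for example "x123") would fail this check, because the extension digits are counted; handling them would need an extra parsing step.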


Task #3: Scraping of Event Listings


Description: Extract events from 70-100 websites similar to this one: https://members.lufkintexas.org/events


Data Requirement: The data should be provided in a CSV file format and include the following fields:


  1. URL
  2. Listing Type
  3. Event Listing ID
  4. Event Type (Category)
  5. Image
  6. Primary Photo
  7. Listing Headline/Title
  8. State
  9. County
  10. City/Town
  11. ZIP Code
  12. Address
  13. Venue Name
  14. Map Location
  15. Event Start Date
  16. Event End Date
  17. Event Day of the Week
  18. Event Start Time
  19. Event End Time
  20. Event Description
  21. External Website Link
  22. Facebook Link
  23. Ticket Price
  24. Ticket Details
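
Several of these fields can be derived from a single timestamp rather than scraped separately. The sketch below assumes the event start is available as an ISO 8601 string and shows how the start date, day of the week, and start time could be filled from it; the function name split_event_start and the output formats are hypothetical.

```python
from datetime import datetime

# Fixed English names, so the result does not depend on the system locale.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def split_event_start(iso_start):
    """Derive date, weekday, and time columns from one ISO 8601 timestamp."""
    start = datetime.fromisoformat(iso_start)
    return {
        "Event Start Date": start.strftime("%Y-%m-%d"),
        "Event Day of the Week": DAYS[start.weekday()],  # Monday is 0
        "Event Start Time": start.strftime("%H:%M"),
    }


# Hypothetical event start, for illustration only.
columns = split_event_start("2024-09-14T18:30:00")
```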


Implementation:

An automatic mechanism for data extraction must be developed. The extracted CSV files and their associated images must be automatically uploaded to an S3 bucket on a weekly basis.

 

Data Quality Check and Project Acceptance Criteria

To ensure the scraped data meets quality standards, we will perform random sampling checks.
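
For reference, a random sampling check of this kind could be sketched as follows. The sample size, the fixed seed (used so that a given sample can be reproduced when discussing results), and the function name sample_rows are assumptions; the actual acceptance procedure is not specified beyond random sampling.

```python
import csv
import io
import random


def sample_rows(csv_text, k, seed=0):
    """Pick up to k rows at random from CSV text, reproducibly for a given seed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


# Hypothetical delivered file, for illustration only.
delivered = "URL,Business Name\n" + "".join(
    f"https://example.com/{i},Business {i}\n" for i in range(5)
)
picked = sample_rows(delivered, 3, seed=42)
```

Each sampled row would then be compared by hand against the live page at its URL.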

 

RFP Response

Please provide the estimated time and price for each Task separately (Task 1, Task 2, Task 3).

Specifically for Task #3, please include the price for scraping one website.

We look forward to your detailed proposal and timeline for each task.

Jeffrey H United States