I build custom Python spiders that extract data from websites. I analyze each target site to identify the individual data items to extract, then deliver them in a structured format. I use Python's Scrapy framework for this and have yet to find a site I could not extract data from. I can also extract data in a way that keeps the structure of the page and its information intact.