My favorite tool for this kind of work is Python. With the help of the BeautifulSoup library, I can extract information from websites even when their HTML markup is very poor. The parsed results can be saved in different formats, such as XML, CSV, or XLS. It usually takes 2-3 days to get the data from a website.
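To give an idea of what this looks like in practice, here is a minimal sketch, assuming the `bs4` package is installed. The sample markup, the `item` and `price` class names, and the helper functions `extract_rows` and `rows_to_csv` are hypothetical and only for illustration. The markup is deliberately sloppy (unclosed tags, an unquoted attribute) to show that the data can still be pulled out and written as CSV.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample page with poor markup: unclosed <li> tags,
# an unquoted attribute, an unclosed <span>, and no closing </body> or </html>.
MESSY_HTML = """
<html><body>
<ul id="products">
  <li class=item><b>Widget</b> <span class="price">9.99</span>
  <li class="item"><b>Gadget</b> <span class="price">19.50</span>
  <li class="item"><b>Gizmo</b> <span class="price">4.25
</ul>
"""


def extract_rows(html):
    """Return one dict per product found in the (possibly broken) HTML."""
    # "html.parser" is the parser bundled with Python; it tolerates bad markup.
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="item"):
        name = item.find("b")
        price = item.find("span", class_="price")
        if name is None or price is None:
            continue  # skip entries that lack the expected fields
        rows.append({
            "name": name.get_text(strip=True),
            "price": price.get_text(strip=True),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the extracted rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(extract_rows(MESSY_HTML)))
```

For a real site, the HTML would come from an HTTP request instead of a string, and the selectors would be adapted to that site's structure. The same rows could be written to XML or XLS with a different writer.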