Fooling around with a little scraper project of mine, I found the
Crowbar project.
It is nice! It lets you scrape a page with all the nasty
JavaScript already run... no more headaches!
My easy guide for installing it on Ubuntu (run the checkout from your
home directory, since the later commands expect $HOME/trunk):
$ svn checkout http://simile.mit.edu/repository/crowbar/trunk/
$ sudo apt-get install xulrunner
$ xulrunner --install-app $HOME/trunk/xulapp
$ xulrunner $HOME/trunk/xulapp/application.ini
Now a small window pops up.
Then go to: http://127.0.0.1:10000/
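The same address can be called from a script instead of the browser. Here is a rough sketch of how that might look in Python. Big assumption: I am guessing that Crowbar takes a "url" parameter (the page to render) and a "delay" parameter (milliseconds to wait for the scripts to finish), sent as a form POST, so check that against your Crowbar version. The names crowbar_form and fetch_rendered are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Address from the guide above; Crowbar must already be running there.
CROWBAR = "http://127.0.0.1:10000/"


def crowbar_form(page_url, delay_ms=3000):
    """Build the form body for a Crowbar request.

    Assumption: Crowbar reads a "url" parameter (page to render) and a
    "delay" parameter (milliseconds to wait for scripts to finish).
    """
    return urlencode({"url": page_url, "delay": delay_ms}).encode("ascii")


def fetch_rendered(page_url, delay_ms=3000, timeout=60):
    """POST the request to Crowbar and return the rendered page as text."""
    body = crowbar_form(page_url, delay_ms)
    with urlopen(CROWBAR, data=body, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Step 1: let Crowbar load the page and run its JavaScript.
    html = fetch_rendered("http://example.com/")
    # Step 2: scrape the returned markup with whatever parser you like.
    print(html[:500])
```

The form building is kept apart from the network call so it can be checked without Crowbar running.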
It's not all that perfect, as you need to do some double scraping via
port 10000.
I see some issues with cookies and concurrency as well.
But overall me like a lot!