Just saw this post on:
http://www.dzone.com/links/ruby_screenscraper_in_60_seconds.html
Lately people get so dazzled when they see how to do things in
Ruby... But what about PHP?
The task in that post is to fetch the webpage
http://www.igvita.com/blog/, find the first blockquote, and extract
its content.
PHP 5 ships with the DOM extension enabled by default, and with it
the DOMDocument class. DOMDocument is really nice for loading and
navigating HTML content, and it offers most of the DOM methods known
from JavaScript, such as document.getElementById('id').
Well, first we have to create the DOMDocument object:
$doc = new DOMDocument;
Next, fetch the content from the specified URL and load it into the
DOMDocument:
$doc->loadHTMLFile('http://www.igvita.com/blog/');
// PHP allows reading URLs as files by default (php.ini: allow_url_fopen=1)
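To try the loading step without depending on the network, here is a minimal sketch that checks the allow_url_fopen setting and then parses an inline string with loadHTML() instead of loadHTMLFile(). The $html sample markup and the variable names are made up for illustration and are not taken from the original page.

```php
<?php
// ini_get() returns the setting as a string (typically "1" when enabled).
$urlFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
echo 'allow_url_fopen is ', ($urlFopen ? 'on' : 'off'), "\n";

// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body><p>Hello</p><blockquote>Sample quote</blockquote></body></html>';

$doc = new DOMDocument;
// loadHTML() parses a string; loadHTMLFile() does the same for a file or URL.
$loaded = $doc->loadHTML($html);
echo $loaded ? "HTML loaded\n" : "HTML could not be parsed\n";
```

If allow_url_fopen turns out to be off, loadHTMLFile() with a URL will fail, so checking the setting first saves some head scratching.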
The last part is to find the first blockquote tag and get its
content:
$doc->getElementsByTagName('blockquote')->item(0)->textContent;
// ->getElementsByTagName('blockquote') - get all blockquote tags
// ->item(0) - get the first blockquote
// ->textContent - get the content of that blockquote
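The chain above can also be taken apart step by step, which makes it easier to handle a page that has no blockquote at all (item(0) returns null in that case). This is only a sketch: the $html string is a made-up sample, not the real page content.

```php
<?php
// Hypothetical sample markup with two blockquotes.
$html = '<html><body><blockquote>First quote</blockquote>'
      . '<blockquote>Second quote</blockquote></body></html>';

$doc = new DOMDocument;
$doc->loadHTML($html);

// Step 1: all <blockquote> elements, as a DOMNodeList.
$quotes = $doc->getElementsByTagName('blockquote');

// Step 2: the first one, or null if the list is empty.
$first = $quotes->item(0);

// Step 3: its text content, guarded against the null case.
$text = $first !== null ? trim($first->textContent) : null;

echo $text === null ? "No blockquote found\n" : $text . "\n";
```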
Finally, we get:
[CODE]
$doc = new DOMDocument;
$doc->loadHTMLFile('http://www.igvita.com/blog/');
// If you don't want to see warnings, put an @ in front of the line above
echo $doc->getElementsByTagName('blockquote')->item(0)->textContent;
[CODE]
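As an alternative to the @ trick, parser warnings can be collected quietly with libxml_use_internal_errors(). The sketch below wraps the whole thing in a helper; the function name firstBlockquoteText() and the sample markup are made up for this example, and it parses a string so it can be run without network access.

```php
<?php
// Hypothetical helper: text of the first <blockquote>, or null if there is none.
function firstBlockquoteText($html)
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument;
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementsByTagName('blockquote')->item(0);
    return $node !== null ? trim($node->textContent) : null;
}

// Made-up sample markup; tags like <section> may trigger warnings in some libxml versions.
$html = '<html><body><section><blockquote> Quoted text </blockquote></section></body></html>';
echo firstBlockquoteText($html), "\n";
var_dump(firstBlockquoteText('<html><body><p>No quotes here</p></body></html>'));
```

To run it against the live page, the string could be fetched with file_get_contents('http://www.igvita.com/blog/') first, assuming allow_url_fopen is enabled.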