Source Rally PHP Community Scripts
A function to calculate the complete URL from an href attribute in an HTML document.
Access: Public      Tags: crawl, php, scrape, url, href, bot
<?php
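/*
Example:
$urls = array('/blah/../foobar',
'./foobar',
'/blah/../foobar/test',
'/blah/../foobar//bartest',
'blah/./test',
'http://www.com.com/test');
$base = 'http://www.domain.com';
foreach($urls as $k => $v)
{
    echo 'Calculated: '.calculateHref($base,$v).'<br>';
}
*/

// Note: the base path is treated as a directory, and '..' segments in the
// href cannot climb above it.
function calculateHref($base,$href)
{
    // An href that already carries a scheme is absolute; return it unchanged.
    $hrefInfo = parse_url($href);
    if(!empty($hrefInfo['scheme']))
    {
        return $href;
    }

    $info = parse_url($base);
    // parse_url() omits 'path' when the base is only a scheme and a host.
    $path = isset($info['path']) ? $info['path'] : '/';
    if(substr($path,-1)!="/")
    {
        $path.='/';
    }

    // Collapse '.', '..' and empty segments of the href.
    $href = explode('/',$href);
    $dir = array();
    foreach($href as $v1)
    {
        switch($v1)
        {
            case '.':
            case '':
            break;
            case '..':
            array_pop($dir);
            break;
            default:
            $dir[]=$v1;
        }
    }
    // Return the bare URL; any markup such as '<br>' belongs to the caller.
    return $info['scheme'].'://'.$info['host'].$path.implode('/',$dir);
}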
?>
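calculateHref() appends every href to the base path, so a base that ends in a file name, or an href that starts with '/' or climbs out of the base directory with '..', may not resolve the way a browser would. The variant below is a minimal sketch of one way to handle those cases. The name resolveHref is hypothetical and not part of the original script. It is not a full RFC 3986 resolver: trailing slashes are dropped, and protocol-relative hrefs ('//host/path') and query strings that contain slashes are not handled.

```php
<?php
// Hypothetical variant of calculateHref(); not part of the original script.
function resolveHref($base, $href)
{
    // Absolute hrefs (those with a scheme) are returned unchanged.
    $scheme = parse_url($href, PHP_URL_SCHEME);
    if (is_string($scheme) && $scheme !== '') {
        return $href;
    }

    $info = parse_url($base);
    // parse_url() omits 'path' when the base is only a scheme and a host.
    $basePath = isset($info['path']) ? $info['path'] : '/';

    if ($href !== '' && $href[0] === '/') {
        // Root-relative href: ignore the base path entirely.
        $segments = explode('/', $href);
    } else {
        // Drop the last base segment (a file name, or '' after a trailing
        // slash), then append the href segments.
        $baseSegments = explode('/', $basePath);
        array_pop($baseSegments);
        $segments = array_merge($baseSegments, explode('/', $href));
    }

    // Collapse '.', '..' and empty segments across base and href together,
    // so '..' can climb into the base path.
    $dir = array();
    foreach ($segments as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;
        }
        if ($segment === '..') {
            array_pop($dir);
            continue;
        }
        $dir[] = $segment;
    }

    $authority = $info['host'] . (isset($info['port']) ? ':' . $info['port'] : '');
    return $info['scheme'] . '://' . $authority . '/' . implode('/', $dir);
}

// Usage: the base ends in a file name and the href climbs one directory.
echo resolveHref('http://www.domain.com/a/b/index.html', '../c'), "\n";
// prints http://www.domain.com/a/c
```

Seeding the segment list with the base path is the design choice that lets '..' remove base directories, which the original loop cannot do because it only ever sees the href segments.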

Shared by:

regin

All user contributed content is available under the unless specified otherwise.
Remaining copyrights Regin Gaarsmand 2006-2008
About www.SourceRally.net