Google 2 RSS
On ThreadWatch yesterday there was a thread about rank checkers, and I couldn’t believe that some SEOs don’t use them. We use our own heavy-duty SERP scraper to fully analyse any industry we are working in. Anyway, Graywolf mentioned how he would love a Google RSS or XML feed. I have been waiting for this for a long time: their SERPs are so dirty that a feed would make things a bit easier. And offering only 10 results per page in their API is shocking! Come on Goo, catch up with MSN + Yahoo.
Anyway, I got a bit bored today and knocked up a quick Google2RSS PHP script for those who are without (it being the Xmas season).
Warning - this is very quick, dirty + crude code (in other words - not the best)
echo google2rss("spam", 10);
function google2rss($query, $numres)
{
    // Build the search URL; urlencode() stops spaces and symbols in the query from breaking it.
    $url = "http://www.google.com/search?q=".urlencode($query)."&num=".(int)$numres."&hl=en&safe=off";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $html = curl_exec($ch);
    curl_close($ch);
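    // Flatten the page, then put each result (<p class=g>) on a line of its own,
    // because "." in the pattern below does not match newlines.
    $html = str_replace(array("\r\n", "\n"), " ", $html);
    $html = str_replace("<p class=g>", "\n<p class=g>", $html);
    $html = str_replace("</div>", "</div>\n", $html);
    $html = str_replace("View as HTML</a>", "", $html);
    // Groups: 2 = result URL, 4 = title markup, 5 = snippet markup.
    preg_match_all("/<p class=g>(.*)<a class=l href=\"(.*)\" (.*)\">(.*)<\/a>(.*)<br><font/i", $html, $matches);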
    $items = array();
    $items['url'] = $matches[2];
    for ($i=0; $i < count($items['url']); $i++)
    {
        // Strip the markup, then the leftovers of the "Translate this page" link.
        $items['title'][$i] = strip_tags($matches[4][$i]);
        $items['title'][$i] = str_replace(" - [ Translate this page", "", $items['title'][$i]);
        $items['desc'][$i] = strip_tags($matches[5][$i]);
        $items['desc'][$i] = preg_replace("/^ ]/i", "", $items['desc'][$i]);
    }
    // Escape the query once so characters such as & or < cannot break the XML.
    $q = htmlspecialchars($query);
    $rss = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    $rss .= "<rss version=\"2.0\">\n";
    $rss .= "<channel>\n";
    $rss .= "\t<title>".$q." - Google Search</title>\n";
    // Ampersands in the URL must be written as &amp; inside XML.
    $rss .= "\t<link>http://www.google.com/search?q=".urlencode($query)."&amp;num=".(int)$numres."&amp;hl=en&amp;safe=off</link>\n";
    $rss .= "\t<description>".$q." - Google RSS search results</description>\n";
    $rss .= "\t<pubDate>".date(DATE_RFC822)."</pubDate>\n";
    $rss .= "\t<generator>Mike Nott - http://www.nott.org</generator>\n";
    $rss .= "\t<language>en</language>\n";
    for ($i=0; $i < count($items['url']); $i++)
    {
        $rss .= "\t<item>\n";
        $rss .= "\t\t<title>".htmlspecialchars($items['title'][$i])."</title>\n";
        // Result URLs often contain &, so the link needs escaping too.
        $rss .= "\t\t<link>".htmlspecialchars($items['url'][$i])."</link>\n";
        $rss .= "\t\t<description>".htmlspecialchars($items['desc'][$i])."</description>\n";
        $rss .= "\t\t<pubDate>".date(DATE_RFC822)."</pubDate>\n";
        $rss .= "\t</item>\n";
    }
    $rss .= "</channel>\n";
    $rss .= "</rss>";
return $rss;
}
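Anything scraped from a results page can contain & or <, and either one will make a feed reader choke if it lands in the XML unescaped. Here is a rough sketch of building a single item with every text node escaped. The rss_item() helper and the example.com values are made up for illustration and are not part of the script above.

```php
<?php
// Hypothetical helper: builds one <item>, escaping every text node.
function rss_item($title, $link, $desc)
{
    return "\t<item>\n"
         . "\t\t<title>" . htmlspecialchars($title) . "</title>\n"
         . "\t\t<link>" . htmlspecialchars($link) . "</link>\n"
         . "\t\t<description>" . htmlspecialchars($desc) . "</description>\n"
         . "\t</item>\n";
}

// Made-up values containing the characters that usually cause trouble.
echo rss_item("Fish & chips", "http://example.com/?a=1&b=2", "Salt <b>and</b> vinegar");
```

The & in the link comes out as &amp; and the <b> tags in the description come out as &lt;b&gt;, so the item stays well-formed.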
If anyone who is actually good at RegEx would like to improve the code, please do.
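As one possible starting point, here is a sketch of a stricter pattern that uses lazy quantifiers and negated character classes, so each group stops at the first delimiter it meets rather than the last one on the line. The parse_result_line() helper and the sample markup are hypothetical, loosely modelled on what the script expects, and it has not been run against live Google pages.

```php
<?php
// Hypothetical helper: extracts URL, title and snippet from one result line.
function parse_result_line($line)
{
    // [^"]* cannot run past the closing quote of href, [^>]* cannot run past
    // the end of the <a> tag, and .*? stops at the first </a> or <br><font.
    $pattern = '/<p class=g>.*?<a class=l href="([^"]*)"[^>]*>(.*?)<\/a>(.*?)<br><font/i';
    if (!preg_match($pattern, $line, $m)) {
        return null; // the line did not look like a result
    }
    return array(
        'url'   => $m[1],
        'title' => strip_tags($m[2]),
        'desc'  => trim(strip_tags($m[3])),
    );
}

// Made-up sample line, not real Google output.
$sample = '<p class=g><a class=l href="http://example.com/page" onmousedown="x">'
        . 'Example <b>spam</b> page</a><table><tr><td class=j><font size=-1>'
        . 'A short snippet.<br><font color=#008000>example.com</font></td></tr></table>';

print_r(parse_result_line($sample));
```

Because the title group now ends at the first closing anchor, any "Translate this page" link would fall into the snippet group instead of the title, so the clean-up steps in the original script would need adjusting to match.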