<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mike Nott &#187; Code</title>
	<atom:link href="http://www.nott.org/blog/category/code/feed" rel="self" type="application/rss+xml" />
	<link>http://www.nott.org</link>
	<description>SEO, Music, Photography &#38; Other Stuff</description>
	<lastBuildDate>Thu, 26 Jan 2012 09:18:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>PHP Whois Script</title>
		<link>http://www.nott.org/blog/php-whois-script.html</link>
		<comments>http://www.nott.org/blog/php-whois-script.html#comments</comments>
		<pubDate>Wed, 25 Jan 2006 21:43:04 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Web/Net]]></category>

		<guid isPermaLink="false">http://www2.nott.org/blog/php-whois-script.html</guid>
		<description><![CDATA[Glenn over at SSEO asked for a script to do mass whois lookups. Use this function: [code lang="php"] function getwhois($domain, $tld) { require_once("whois.class.php"); $whois = new Whois(); if( !$whois->ValidDomain($domain.'.'.$tld) ){ return 'Sorry, the domain is not valid or not supported.'; } if( $whois->Lookup($domain.'.'.$tld) ) { return $whois->GetData(1); }else{ return 'Sorry, an error occurred.'; } } [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.glenn.ca/">Glenn</a> over at <a href="http://its-a-secret">SSEO</a> asked for a script to do mass whois lookups. </p>
<p>Use this function:</p>
<p>[code lang="php"]</p>
<p>    function getwhois($domain, $tld)<br />
    {<br />
        require_once("whois.class.php");</p>
<p>        $whois = new Whois();</p>
<p>	    if( !$whois->ValidDomain($domain.'.'.$tld) ){<br />
		    return 'Sorry, the domain is not valid or not supported.';<br />
	    }</p>
<p>        if( $whois->Lookup($domain.'.'.$tld) )<br />
        {<br />
            return $whois->GetData(1);<br />
        }else{<br />
            return 'Sorry, an error occurred.';<br />
        }<br />
    }</p>
<p>	$domain = trim($_REQUEST['domain']);</p>
<p>	$dot = strpos($domain, '.');<br />
	$sld = substr($domain, 0, $dot);<br />
	$tld = substr($domain, $dot+1);                     </p>
<p>	$whois = getwhois($sld, $tld);</p>
<p>	echo "<pre>";<br />
	echo $whois;<br />
	echo "</pre>";</p>
<p>[/code]</p>
<p>The function relies on this <a href="/uploads/whois.class.php.txt">class</a>; save it as whois.class.php next to the script.</p>
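<p>Since the request was for mass lookups, the same splitting logic can be looped over a list of domains. A rough sketch, assuming one domain per array entry: split_domain() is a hypothetical helper (not part of the class), the example.* names are placeholders, and the getwhois() call is left commented out because it needs whois.class.php.</p>
<p>[code lang="php"]</p>
<p>    // hypothetical helper: split "example.co.uk" into "example" and "co.uk"<br />
    function split_domain($domain)<br />
    {<br />
        $domain = trim($domain);<br />
        $dot = strpos($domain, '.');</p>
<p>        if ($dot === false || $dot === 0)<br />
        {<br />
            return false;   // no usable second-level part<br />
        }</p>
<p>        return array(substr($domain, 0, $dot), substr($domain, $dot+1));<br />
    }</p>
<p>    $domains = array("example.com", "example.co.uk", "nodot");</p>
<p>    foreach ($domains as $domain)<br />
    {<br />
        $parts = split_domain($domain);</p>
<p>        if ($parts === false)<br />
        {<br />
            echo $domain.": skipped\n";<br />
            continue;<br />
        }</p>
<p>        echo $domain.": sld=".$parts[0]." tld=".$parts[1]."\n";<br />
        // echo getwhois($parts[0], $parts[1]);   // needs whois.class.php<br />
    }</p>
<p>[/code]</p>
<p>Splitting at the first dot matches the single-domain code above, so a name like example.co.uk is passed on with co.uk as its tld.</p>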
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/php-whois-script.html/feed</wfw:commentRss>
		<slash:comments>33</slash:comments>
		</item>
		<item>
		<title>Google Datacenters</title>
		<link>http://www.nott.org/blog/google-datacenters.html</link>
		<comments>http://www.nott.org/blog/google-datacenters.html#comments</comments>
		<pubDate>Thu, 05 Jan 2006 11:57:29 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=50</guid>
		<description><![CDATA[For all those wanting to check the rollout of the BigDaddy update, here&#8217;s a list of 116 Google Datacenters: 64.233.161.98 64.233.161.99 64.233.161.104 64.233.161.105 64.233.161.106 64.233.161.107 64.233.161.147 64.233.163.99 64.233.163.104 64.233.167.99 64.233.167.104 64.233.167.106 64.233.167.107 64.233.167.147 64.233.171.99 64.233.171.103 64.233.171.104 64.233.171.105 64.233.171.106 64.233.171.107 64.233.171.147 64.233.179.99 64.233.179.104 64.233.179.106 64.233.179.107 64.233.183.99 64.233.183.104 64.233.183.107 64.233.185.99 64.233.185.104 64.233.185.106 64.233.185.107 64.233.185.147 64.233.187.99 64.233.187.104 64.233.187.106 [...]]]></description>
			<content:encoded><![CDATA[<p>For all those wanting to check the rollout of the <a href="http://www.mattcutts.com/blog/bigdaddy-on-the-move/">BigDaddy</a> update, here&#8217;s a list of 116 Google Datacenters:</p>
<blockquote><p>
<a href="http://64.233.161.98/">64.233.161.98</a> <a href="http://64.233.161.99/">64.233.161.99</a> <a href="http://64.233.161.104/">64.233.161.104</a> <a href="http://64.233.161.105/">64.233.161.105</a> <a href="http://64.233.161.106/">64.233.161.106</a> <a href="http://64.233.161.107/">64.233.161.107</a> <a href="http://64.233.161.147/">64.233.161.147</a> <a href="http://64.233.163.99/">64.233.163.99</a> <a href="http://64.233.163.104/">64.233.163.104</a> <a href="http://64.233.167.99/">64.233.167.99</a> <a href="http://64.233.167.104/">64.233.167.104</a> <a href="http://64.233.167.106/">64.233.167.106</a> <a href="http://64.233.167.107/">64.233.167.107</a> <a href="http://64.233.167.147/">64.233.167.147</a> <a href="http://64.233.171.99/">64.233.171.99</a> <a href="http://64.233.171.103/">64.233.171.103</a> <a href="http://64.233.171.104/">64.233.171.104</a> <a href="http://64.233.171.105/">64.233.171.105</a> <a href="http://64.233.171.106/">64.233.171.106</a> <a href="http://64.233.171.107/">64.233.171.107</a> <a href="http://64.233.171.147/">64.233.171.147</a> <a href="http://64.233.179.99/">64.233.179.99</a> <a href="http://64.233.179.104/">64.233.179.104</a> <a href="http://64.233.179.106/">64.233.179.106</a> <a href="http://64.233.179.107/">64.233.179.107</a> <a href="http://64.233.183.99/">64.233.183.99</a> <a href="http://64.233.183.104/">64.233.183.104</a> <a href="http://64.233.183.107/">64.233.183.107</a> <a href="http://64.233.185.99/">64.233.185.99</a> <a href="http://64.233.185.104/">64.233.185.104</a> <a href="http://64.233.185.106/">64.233.185.106</a> <a href="http://64.233.185.107/">64.233.185.107</a> <a href="http://64.233.185.147/">64.233.185.147</a> <a href="http://64.233.187.99/">64.233.187.99</a> <a href="http://64.233.187.104/">64.233.187.104</a> <a href="http://64.233.187.106/">64.233.187.106</a> <a href="http://64.233.187.107/">64.233.187.107</a> <a href="http://64.233.189.104/">64.233.189.104</a> <a 
href="http://66.102.7.98/">66.102.7.98</a> <a href="http://66.102.7.99/">66.102.7.99</a> <a href="http://66.102.7.104/">66.102.7.104</a> <a href="http://66.102.7.105/">66.102.7.105</a> <a href="http://66.102.7.106/">66.102.7.106</a> <a href="http://66.102.7.107/">66.102.7.107</a> <a href="http://66.102.7.147/">66.102.7.147</a> <a href="http://66.102.9.99/">66.102.9.99</a> <a href="http://66.102.9.104/">66.102.9.104</a> <a href="http://66.102.9.106/">66.102.9.106</a> <a href="http://66.102.9.107/">66.102.9.107</a> <a href="http://66.102.9.147/">66.102.9.147</a> <a href="http://66.102.11.99/">66.102.11.99</a> <a href="http://66.102.11.104/">66.102.11.104</a> <a href="http://66.102.11.106/">66.102.11.106</a> <a href="http://66.102.11.107/">66.102.11.107</a> <a href="http://66.249.81.99/">66.249.81.99</a> <a href="http://66.249.81.104/">66.249.81.104</a> <a href="http://66.249.81.106/">66.249.81.106</a> <a href="http://66.249.81.107/">66.249.81.107</a> <a href="http://66.249.83.99/">66.249.83.99</a> <a href="http://66.249.83.104/">66.249.83.104</a> <a href="http://66.249.83.106/">66.249.83.106</a> <a href="http://66.249.83.107/">66.249.83.107</a> <a href="http://66.249.85.99/">66.249.85.99</a> <a href="http://66.249.85.104/">66.249.85.104</a> <a href="http://66.249.85.106/">66.249.85.106</a> <a href="http://66.249.85.107/">66.249.85.107</a> <a href="http://66.249.87.99/">66.249.87.99</a> <a href="http://66.249.87.104/">66.249.87.104</a> <a href="http://66.249.89.99/">66.249.89.99</a> <a href="http://66.249.89.104/">66.249.89.104</a> <a href="http://66.249.89.106/">66.249.89.106</a> <a href="http://66.249.89.107/">66.249.89.107</a> <a href="http://66.249.93.99/">66.249.93.99</a> <a href="http://66.249.93.104/">66.249.93.104</a> <a href="http://66.249.93.106/">66.249.93.106</a> <a href="http://66.249.93.107/">66.249.93.107</a> <a href="http://72.14.203.99/">72.14.203.99</a> <a href="http://72.14.203.104/">72.14.203.104</a> <a 
href="http://72.14.203.106/">72.14.203.106</a> <a href="http://72.14.203.107/">72.14.203.107</a> <a href="http://72.14.205.99/">72.14.205.99</a> <a href="http://72.14.205.104/">72.14.205.104</a> <a href="http://72.14.205.106/">72.14.205.106</a> <a href="http://72.14.205.107/">72.14.205.107</a> <a href="http://72.14.207.99/">72.14.207.99</a> <a href="http://72.14.207.104/">72.14.207.104</a> <a href="http://72.14.207.106/">72.14.207.106</a> <a href="http://72.14.207.107/">72.14.207.107</a> <a href="http://216.239.37.98/">216.239.37.98</a> <a href="http://216.239.37.99/">216.239.37.99</a> <a href="http://216.239.37.104/">216.239.37.104</a> <a href="http://216.239.37.105/">216.239.37.105</a> <a href="http://216.239.37.106/">216.239.37.106</a> <a href="http://216.239.37.107/">216.239.37.107</a> <a href="http://216.239.37.147/">216.239.37.147</a> <a href="http://216.239.39.98/">216.239.39.98</a> <a href="http://216.239.39.99/">216.239.39.99</a> <a href="http://216.239.39.104/">216.239.39.104</a> <a href="http://216.239.39.105/">216.239.39.105</a> <a href="http://216.239.39.106/">216.239.39.106</a> <a href="http://216.239.39.107/">216.239.39.107</a> <a href="http://216.239.53.98/">216.239.53.98</a> <a href="http://216.239.53.99/">216.239.53.99</a> <a href="http://216.239.53.104/">216.239.53.104</a> <a href="http://216.239.53.105/">216.239.53.105</a> <a href="http://216.239.53.106/">216.239.53.106</a> <a href="http://216.239.53.107/">216.239.53.107</a> <a href="http://216.239.57.98/">216.239.57.98</a> <a href="http://216.239.57.99/">216.239.57.99</a> <a href="http://216.239.57.103/">216.239.57.103</a> <a href="http://216.239.57.104/">216.239.57.104</a> <a href="http://216.239.57.105/">216.239.57.105</a> <a href="http://216.239.57.106/">216.239.57.106</a> <a href="http://216.239.57.107/">216.239.57.107</a> <a href="http://216.239.57.147/">216.239.57.147</a> <a href="http://216.239.59.98/">216.239.59.98</a> <a href="http://216.239.59.99/">216.239.59.99</a> <a 
href="http://216.239.59.103/">216.239.59.103</a> <a href="http://216.239.59.104/">216.239.59.104</a> <a href="http://216.239.59.105/">216.239.59.105</a> <a href="http://216.239.59.106/">216.239.59.106</a> <a href="http://216.239.59.107/">216.239.59.107</a> <a href="http://216.239.59.147/">216.239.59.147</a> <a href="http://216.239.63.99/">216.239.63.99</a> <a href="http://216.239.63.104/">216.239.63.104</a>
</p></blockquote>
<p>Here&#8217;s a quick and dirty script that counts, for a given site, how many datacenters return each ranking position:</p>
<p>[code lang="php"]<br />
    function singlethread_crawl($url)<br />
    {<br />
        $agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";</p>
<p>        $ch = curl_init();</p>
<p>        curl_setopt($ch, CURLOPT_NOSIGNAL, 1);<br />
        curl_setopt($ch, CURLOPT_NOPROGRESS, 1);<br />
        curl_setopt($ch, CURLOPT_FAILONERROR, 1);<br />
        curl_setopt($ch, CURLOPT_URL, $url);<br />
        curl_setopt($ch, CURLOPT_USERAGENT, $agent);<br />
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);<br />
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);<br />
        curl_setopt($ch, CURLOPT_MAXREDIRS, 1);<br />
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);<br />
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);</p>
<p>        $html = curl_exec($ch);</p>
<p>        curl_close ($ch);</p>
<p>        return $html;<br />
    }</p>
<p>    function multithread_crawl($urls, $timeout)<br />
    {<br />
        $agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";</p>
<p>        $mh = curl_multi_init();</p>
<p>        foreach ($urls as $i => $url)<br />
        {<br />
            $conn[$i] = curl_init($url);<br />
            curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);<br />
            curl_setopt($conn[$i], CURLOPT_NOSIGNAL, 1);<br />
            curl_setopt($conn[$i], CURLOPT_NOPROGRESS, 1);<br />
            curl_setopt($conn[$i], CURLOPT_FAILONERROR, 1);<br />
            curl_setopt($conn[$i], CURLOPT_URL, $url);<br />
            curl_setopt($conn[$i], CURLOPT_USERAGENT, $agent);<br />
            curl_setopt($conn[$i], CURLOPT_SSL_VERIFYPEER, 0);<br />
            curl_setopt($conn[$i], CURLOPT_FOLLOWLOCATION, 1);<br />
            curl_setopt($conn[$i], CURLOPT_MAXREDIRS, 1);<br />
            curl_setopt($conn[$i], CURLOPT_TIMEOUT, $timeout);</p>
<p>            curl_multi_add_handle ($mh, $conn[$i]);<br />
        }</p>
<p>        do<br />
        {<br />
            $mrc = curl_multi_exec($mh, $active);<br />
        }<br />
        while ($mrc == CURLM_CALL_MULTI_PERFORM);</p>
<p>        while ($active and $mrc == CURLM_OK)<br />
        {<br />
            // curl_multi_select() can return -1 straight away; pause briefly instead of spinning<br />
            if (curl_multi_select($mh) == -1)<br />
            {<br />
                usleep(100);<br />
            }</p>
<p>            do<br />
            {<br />
                $mrc = curl_multi_exec($mh, $active);<br />
            }<br />
            while ($mrc == CURLM_CALL_MULTI_PERFORM);<br />
        }</p>
<p>        if ($mrc != CURLM_OK)<br />
        {<br />
            print "Curl multi read error $mrc\n";<br />
        }</p>
<p>        $res = array();<br />
        $e = 0;</p>
<p>        foreach ($urls as $i => $url)<br />
        {<br />
            if (($err = curl_error($conn[$i])) == '')<br />
            {<br />
            	$res[$i]=curl_multi_getcontent($conn[$i]);<br />
            }<br />
            else<br />
            {<br />
                echo "error: ".$url." (".$err.")\n";<br />
            }</p>
<p>            curl_multi_remove_handle($mh,$conn[$i]);<br />
            curl_close($conn[$i]);<br />
        }</p>
<p>        curl_multi_close($mh);</p>
<p>        return $res;<br />
    }</p>
<p>	function googleresults($ip, $search, $num){</p>
<p>		$url = "http://".$ip."/ie?q=".urlencode($search)."&#038;num=".$num;</p>
<p>		$html = singlethread_crawl($url);</p>
<p>		preg_match_all("/\" href=(.*)>/iU", $html, $links);</p>
<p>		$urlarray = str_replace("<b>","",$links[1]);<br />
		$urlarray = str_replace("</b>","",$urlarray);<br />
		$urlarray = preg_replace("/>.*$/i","", $urlarray);</p>
<p>		return $urlarray;<br />
	}</p>
<p>	function googleresults_multi($ips, $search, $num){</p>
<p>        for ($i=0; $i < count($ips); $i++)<br />
        {<br />
		    $urls[$i] = "http://".$ips[$i]."/ie?q=".urlencode($search)."&#038;num=".$num;<br />
        }</p>
<p>		$html = multithread_crawl($urls, count($ips)/2);</p>
<p>        for ($i=0; $i < count($ips); $i++)<br />
        {<br />
		    preg_match_all("/\" href=(.*)>/iU", $html[$i], $links[$i]);</p>
<p>		    $urlarray[$i] = str_replace("<b>","",$links[$i][1]);<br />
		    $urlarray[$i] = str_replace("</b>","",$urlarray[$i]);<br />
		    $urlarray[$i] = preg_replace("/>.*$/i","", $urlarray[$i]);<br />
        }</p>
<p>		return $urlarray;<br />
	}</p>
<p>	function removehttp($url){</p>
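<p>		// strip the scheme; "https://" is one character longer than "http://"<br />
		if (substr($url,0,7) == 'http://')<br />
		{<br />
			$url = substr($url,7);<br />
		}<br />
		elseif (substr($url,0,8) == 'https://')<br />
		{<br />
			$url = substr($url,8);<br />
		}</p>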
<p>		$url = trim($url);</p>
<p>		return $url;<br />
	}</p>
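<p>	function rank($arr, $item){</p>
<p>		$result = 0;   // 0 means the item was not found</p>
<p>		for ($i = 0; $i < count($arr); $i++)<br />
		{<br />
			// plain case-insensitive substring match; eregi() is deprecated and treated the dots as wildcards<br />
			if (stripos($arr[$i], $item) !== false)<br />
			{<br />
				$result = $i+1;</p>
<p>				break;<br />
			}<br />
		}</p>
<p>		return($result);<br />
	}</p>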
<p>    $ips = array("64.233.161.99", "64.233.161.104", "64.233.161.105", "64.233.161.106", "64.233.161.107", "64.233.161.147", "64.233.163.99", "64.233.163.104", "64.233.167.99", "64.233.167.104", "64.233.167.106", "64.233.167.107", "64.233.167.147", "64.233.171.99", "64.233.171.104", "64.233.171.105", "64.233.171.106", "64.233.171.107", "64.233.171.147", "64.233.179.99", "64.233.179.104", "64.233.179.106", "64.233.179.107", "64.233.183.99", "64.233.183.104", "64.233.183.107", "64.233.185.99", "64.233.185.104", "64.233.185.106", "64.233.185.107", "64.233.187.99", "64.233.187.104", "64.233.187.106", "64.233.187.107", "64.233.189.104", "66.102.7.99", "66.102.7.104", "66.102.7.105", "66.102.7.106", "66.102.7.107", "66.102.7.147", "66.102.9.99", "66.102.9.104", "66.102.9.106", "66.102.9.107", "66.102.9.147", "66.102.11.99", "66.102.11.104", "66.102.11.106", "66.102.11.107", "66.249.81.99", "66.249.81.104", "66.249.81.106", "66.249.81.107", "66.249.83.99", "66.249.83.104", "66.249.83.106", "66.249.83.107", "66.249.85.99", "66.249.85.104", "66.249.85.106", "66.249.85.107", "66.249.87.99", "66.249.87.104", "66.249.89.99", "66.249.89.104", "66.249.89.106", "66.249.89.107", "66.249.93.99", "66.249.93.104", "66.249.93.106", "66.249.93.107", "72.14.203.99", "72.14.203.104", "72.14.203.106", "72.14.203.107", "72.14.205.99", "72.14.205.104", "72.14.205.106", "72.14.205.107", "72.14.207.99", "72.14.207.104", "72.14.207.106", "72.14.207.107", "216.239.37.99", "216.239.37.104", "216.239.37.105", "216.239.37.106", "216.239.37.107", "216.239.37.147", "216.239.39.99", "216.239.39.104", "216.239.39.106", "216.239.39.107", "216.239.53.99", "216.239.53.104", "216.239.53.106", "216.239.53.107", "216.239.57.98", "216.239.57.99", "216.239.57.103", "216.239.57.104", "216.239.57.105", "216.239.57.106", "216.239.57.107", "216.239.57.147", "216.239.59.98", "216.239.59.99", "216.239.59.103", "216.239.59.104", "216.239.59.105", "216.239.59.106", "216.239.59.107", "216.239.59.147", 
"216.239.63.99", "216.239.63.104");</p>
<p>    $query = array("porn", "pills", "casino");</p>
<p>    $num = "100";</p>
<p>    for ($i=0; $i < count($query); $i++)<br />
    {<br />
        echo $query[$i]."\n";</p>
<p>        $numrank = array();</p>
<p>//        $serps = googleresults_multi($ips, $query[$i], $num);	  //for multithreaded</p>
<p>        for ($j=0; $j < count($ips); $j++)<br />
        {<br />
            $serps[$j] = googleresults($ips[$j], $query[$i], $num);	  //for single threaded</p>
<p>            $pos = rank($serps[$j], "www.mattcutts.com");</p>
<p>            if (empty($pos)){$pos = 0;}</p>
<p>            if (empty($numrank[$pos]))<br />
            {<br />
                $numrank[$pos] = 1;<br />
            }else{<br />
                $numrank[$pos] = $numrank[$pos]+1;<br />
            }<br />
        }</p>
<p>        ksort($numrank);</p>
<p>        // use $count here, not $num, which must keep the result count for the next query<br />
        foreach ($numrank as $key => $count)<br />
        {<br />
            echo $key." - ".$count."\n";<br />
        }<br />
    }</p>
<p>[/code]</p>
<p>The single-threaded version (the default) is very slow. If you comment out the googleresults() call and uncomment the googleresults_multi() line shown above, it will be much faster, but be prepared to get your IP temporarily banned by Google.</p>
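<p>The counting step works the same whichever fetch is used. A stand-alone sketch of just that tally, run on made-up results in place of the googleresults_multi() return value; find_rank() and tally_ranks() are hypothetical names, not functions from the script above.</p>
<p>[code lang="php"]</p>
<p>    // hypothetical helper: 1-based position of the first result containing $needle, 0 if absent<br />
    function find_rank($results, $needle)<br />
    {<br />
        foreach ($results as $i => $result)<br />
        {<br />
            if (stripos($result, $needle) !== false)<br />
            {<br />
                return $i+1;<br />
            }<br />
        }</p>
<p>        return 0;<br />
    }</p>
<p>    // hypothetical helper: position => number of datacenters reporting it<br />
    function tally_ranks($serps, $needle)<br />
    {<br />
        $numrank = array();</p>
<p>        foreach ($serps as $results)<br />
        {<br />
            $pos = find_rank($results, $needle);<br />
            $numrank[$pos] = isset($numrank[$pos]) ? $numrank[$pos]+1 : 1;<br />
        }</p>
<p>        ksort($numrank);</p>
<p>        return $numrank;<br />
    }</p>
<p>    // made-up results, one inner array per datacenter<br />
    $serps = array(<br />
        array("http://a.example/", "http://www.mattcutts.com/blog/"),<br />
        array("http://www.mattcutts.com/blog/", "http://a.example/"),<br />
        array("http://a.example/", "http://www.mattcutts.com/"),<br />
        array("http://a.example/"),<br />
    );</p>
<p>    foreach (tally_ranks($serps, "www.mattcutts.com") as $pos => $count)<br />
    {<br />
        echo $pos." - ".$count."\n";<br />
    }</p>
<p>[/code]</p>
<p>Position 0 collects the datacenters where the site was not found at all, the same convention the script uses.</p>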
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/google-datacenters.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ColorCode &#8211; WordPress Plugin to Highlight Code</title>
		<link>http://www.nott.org/blog/colorcode-wordpress-plugin-to-highlight-code.html</link>
		<comments>http://www.nott.org/blog/colorcode-wordpress-plugin-to-highlight-code.html#comments</comments>
		<pubDate>Mon, 02 Jan 2006 23:36:53 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=47</guid>
		<description><![CDATA[Spent a bit of time yesterday trying to get my code examples highlighted using the GeSHi class. However using the WordPress plugin code, I found that though the code was coming up fine it was removing all paragraphs and line breaks from any surrounding text. So I then wrote my own plugin still calling the [...]]]></description>
			<content:encoded><![CDATA[<p>Spent a bit of time yesterday trying to get my code examples highlighted using the <a href="http://qbnz.com/highlighter/">GeSHi</a> class. However using the <a href="http://dev.wp-plugins.org/wiki/GeshiSyntaxColorer">WordPress plugin</a> code, I found that though the code was coming up fine it was removing all paragraphs and line breaks from any surrounding text. So I then wrote my own plugin still calling the GeSHi class file, but where it displays both code and text properly.</p>
<p>It is still not perfect: it would be better to wrap the code in a pre element instead of relying on all the spaces, and I also need to switch from inline styles to CSS before <a href="http://www.w3.org/TR/xhtml2/">XHTML 2.0</a> arrives. I will post an update once that is done.</p>
<p>Here is the plugin code:</p>
<p>[code lang="php"]</p>
<p>	/*<br />
	Plugin Name: ColorCode<br />
	Plugin URI: http://www.nott.org/colorcode.html<br />
	Description: A filter that highlights code using the GeSHi class for over 20 languages.<br />
	Version: 1.0<br />
	Author: Mike Nott<br />
	Author URI: http://www.nott.org<br />
	*/</p>
<p>	include(ABSPATH.'/wp-content/plugins/geshi.php');</p>
<p>	function cc_callback($code)<br />
	{<br />
		$geshi = new GeSHi($code[2], $code[1], ABSPATH.'/wp-content/plugins/geshi/');</p>
<p>		$geshi->set_header_type(GESHI_HEADER_DIV);</p>
<p>		$geshi->set_url_for_keyword_group(3, '');</p>
<p>		$newcode = $geshi->parse_code();</p>
<p>		return $newcode;<br />
	}</p>
<p>	function colorcode($content)<br />
	{<br />
		return preg_replace_callback("|<code lang=['\"]([a-zA-Z0-9_-]+)['\"]>(.*)< /code>|imsU", "cc_callback", $content);<br />
	}</p>
<p>	remove_filter('the_content', 'wptexturize');</p>
<p>	add_filter('the_content', 'colorcode', '1');<br />
	add_filter('the_excerpt', 'colorcode', '1');<br />
	add_filter('comment_text', 'colorcode', '1');</p>
<p>[/code]</p>
<p>[note: be sure to remove the space before the /code in the preg replace above before using]</p>
<p>Then just save this file as colorcode.php in your plugin folder, along with the <a href="http://dev.wp-plugins.org/wiki/GeshiSyntaxColorer">GeSHi files</a>.</p>
<p>Usage: </p>
<p>< code lang = " php "><br />
code goes here<br />
< / code ></p>
<p>(but again remove spaces) <img src='http://www.nott.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/colorcode-wordpress-plugin-to-highlight-code.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PHP Crawler</title>
		<link>http://www.nott.org/blog/php-crawler.html</link>
		<comments>http://www.nott.org/blog/php-crawler.html#comments</comments>
		<pubDate>Sat, 31 Dec 2005 16:39:20 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=46</guid>
		<description><![CDATA[For crawling in PHP I have always used the fantastic cURL. My curl single-threaded function: [code lang="php"] function singlethread_crawl($url) { $agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"; $ch = curl_init(); curl_setopt($ch, CURLOPT_NOSIGNAL, 1); curl_setopt($ch, CURLOPT_NOPROGRESS, 1); curl_setopt($ch, CURLOPT_FAILONERROR, 1); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_USERAGENT, $agent); curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); curl_setopt($ch, CURLOPT_MAXREDIRS, [...]]]></description>
			<content:encoded><![CDATA[<p>For crawling in <a href="http://www.php.net">PHP</a> I have always used the fantastic <a href="http://www.php.net/manual/en/ref.curl.php">cURL</a>.</p>
<p>My curl single-threaded function:</p>
<p>[code lang="php"]</p>
<p>    function singlethread_crawl($url)<br />
    {<br />
        $agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";</p>
<p>        $ch = curl_init();</p>
<p>        curl_setopt($ch, CURLOPT_NOSIGNAL, 1);<br />
        curl_setopt($ch, CURLOPT_NOPROGRESS, 1);<br />
        curl_setopt($ch, CURLOPT_FAILONERROR, 1);<br />
        curl_setopt($ch, CURLOPT_URL, $url);<br />
        curl_setopt($ch, CURLOPT_USERAGENT, $agent);<br />
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);<br />
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);<br />
        curl_setopt($ch, CURLOPT_MAXREDIRS, 1);<br />
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);<br />
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);</p>
<p>        $html = curl_exec($ch);</p>
<p>        curl_close ($ch);</p>
<p>        return $html;<br />
    }</p>
<p>[/code]</p>
<p>My curl multi-threaded function:</p>
<p>[code lang="php"]</p>
<p>    function multithread_crawl($urls, $timeout, $verbose)<br />
    {<br />
        $agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";</p>
<p>        $mh = curl_multi_init();</p>
<p>        foreach ($urls as $i => $url)<br />
        {<br />
            $conn[$i] = curl_init($url);<br />
            curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);<br />
            curl_setopt($conn[$i], CURLOPT_NOSIGNAL, 1);<br />
            curl_setopt($conn[$i], CURLOPT_NOPROGRESS, 1);<br />
            curl_setopt($conn[$i], CURLOPT_FAILONERROR, 1);<br />
            curl_setopt($conn[$i], CURLOPT_URL, $url);<br />
            curl_setopt($conn[$i], CURLOPT_USERAGENT, $agent);<br />
            curl_setopt($conn[$i], CURLOPT_SSL_VERIFYPEER, 0);<br />
            curl_setopt($conn[$i], CURLOPT_FOLLOWLOCATION, 1);<br />
            curl_setopt($conn[$i], CURLOPT_MAXREDIRS, 1);<br />
            curl_setopt($conn[$i], CURLOPT_TIMEOUT, $timeout);</p>
<p>            curl_multi_add_handle ($mh, $conn[$i]);<br />
        }</p>
<p>        do<br />
        {<br />
            $mrc = curl_multi_exec($mh, $active);<br />
        }<br />
        while ($mrc == CURLM_CALL_MULTI_PERFORM);</p>
<p>        while ($active and $mrc == CURLM_OK)<br />
        {<br />
            // curl_multi_select() can return -1 straight away; pause briefly instead of spinning<br />
            if (curl_multi_select($mh) == -1)<br />
            {<br />
                usleep(100);<br />
            }</p>
<p>            do<br />
            {<br />
                $mrc = curl_multi_exec($mh, $active);<br />
            }<br />
            while ($mrc == CURLM_CALL_MULTI_PERFORM);<br />
        }</p>
<p>        if ($mrc != CURLM_OK)<br />
        {<br />
            print "Curl multi read error $mrc\n";<br />
        }</p>
<p>        $res = array();<br />
        $e = 0;</p>
<p>        foreach ($urls as $i => $url)<br />
        {<br />
            if (($err = curl_error($conn[$i])) == '')<br />
            {<br />
            	$res[$i]=curl_multi_getcontent($conn[$i]);<br />
            }<br />
            else<br />
            {<br />
                if ($verbose == "yes"){<br />
                    echo "error: ".$url." (".$err.")\n";<br />
                }else{<br />
                    $e++;<br />
                }<br />
            }</p>
<p>            curl_multi_remove_handle($mh,$conn[$i]);<br />
            curl_close($conn[$i]);<br />
        }</p>
<p>        curl_multi_close($mh);</p>
<p>        $s = count($urls)-$e;</p>
<p>        if ($verbose == "no"){<br />
            echo "errors ".$e." | success ".$s."\n";<br />
        }</p>
<p>        return $res;<br />
    }</p>
<p>[/code]</p>
<p>However there are some annoyances in curl &#8211; the main one for me being that you can&#8217;t pass variables to the write_function, </p>
<p>[code lang="php"]<br />
curl_setopt($conn[$i], CURLOPT_WRITEFUNCTION, 'myfunction');   // callback is passed by name, as a string<br />
[/code]</p>
<p>which makes it useless for updating rows etc. in a db (you can use <a href="http://www.php.net/curl_getinfo">curl_getinfo</a> to get the URL and then do a lookup &#8211; but that is pretty backwards). This means that the crawling is not even close to being truly multithreaded, as you have to wait for all URLs to finish before working with the data.</p>
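<p>One possible way around this on newer PHP versions (5.3 and later, so beyond the plain PHP5 the rest of this post assumes) is an anonymous function that carries the extra variables with it. A minimal sketch, not tested against a live crawl: make_writer() and the row id 42 are made up for illustration, and the callback is called by hand here to stand in for cURL delivering chunks.</p>
<p>[code lang="php"]</p>
<p>    // hypothetical helper: builds a write callback bound to one row id<br />
    function make_writer($rowId, ArrayObject $store)<br />
    {<br />
        return function ($ch, $chunk) use ($rowId, $store)<br />
        {<br />
            $sofar = isset($store[$rowId]) ? $store[$rowId] : '';<br />
            $store[$rowId] = $sofar.$chunk;</p>
<p>            return strlen($chunk);   // cURL expects the number of bytes handled<br />
        };<br />
    }</p>
<p>    $pages = new ArrayObject();<br />
    $writer = make_writer(42, $pages);</p>
<p>    // how it would be attached to a handle:<br />
    // curl_setopt($conn[$i], CURLOPT_WRITEFUNCTION, $writer);</p>
<p>    // simulate cURL handing over two chunks<br />
    $writer(null, "first chunk, ");<br />
    $writer(null, "second chunk");</p>
<p>    echo $pages[42]."\n";</p>
<p>[/code]</p>
<p>Each handle gets its own callback, so the data can be tied to a row as it arrives rather than looked up afterwards by URL.</p>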
<p>So I thought I&#8217;d have a go at writing the raw crawler myself using <a href="http://www.php.net/fsockopen">fsockopen</a>. It is not perfect, as the multithreaded function relies on the single-threaded one to follow any redirects.</p>
<p>My own single-threaded function:</p>
<p>[code lang="php"]</p>
<p>    function mycrawler_single($url, $timeout=10, $maxredirs=1)<br />
    {<br />
        $urlinfo = parse_url($url);</p>
<p>        if (empty($urlinfo['scheme'])) {$urlinfo = parse_url('http://'.$url);}<br />
        if (empty($urlinfo["path"])) {$urlinfo["path"]="/";}</p>
<p>        if (empty($urlinfo['port']))<br />
        {<br />
			switch($urlinfo['scheme'])<br />
			{<br />
				case "http":<br />
					$urlinfo['port'] = 80;<br />
                    break;<br />
				case "https":<br />
					$urlinfo['port'] = 443;<br />
                    break;<br />
			}<br />
        }</p>
<p>        if (isset($urlinfo["query"]))<br />
        {<br />
            $request = "GET ".$urlinfo["path"]."?".$urlinfo["query"]." ";<br />
        } else {<br />
            $request = "GET ".$urlinfo["path"]." ";<br />
        }</p>
<p>        $request .= "HTTP/1.0\r\n";<br />
        $request .= "Host: ".$urlinfo['host']."\r\n";<br />
        $request .= "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\r\n";<br />
        $request .= "Connection: close\r\n\r\n";</p>
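<p>        // fsockopen() speaks raw TCP, so https needs the ssl:// transport prefix (requires OpenSSL support)<br />
        $transport = ($urlinfo['scheme'] == 'https') ? 'ssl://' : '';<br />
        $fp = fsockopen($transport.$urlinfo['host'], $urlinfo['port'], $errno, $errstr, $timeout);</p>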
<p>        if (!$fp)<br />
		{<br />
			echo "(".$errno.")".$errstr."\n";<br />
		}<br />
		else<br />
		{<br />
            fwrite($fp, $request);</p>
<p>            $data = '';   // start with an empty buffer so the first append is well defined</p>
<p>            while (!feof($fp))<br />
            {<br />
                $data .= fgets($fp, 4096);<br />
            }</p>
<p>            fclose($fp);   </p>
<p>            $tmp = explode("\r\n\r\n", $data, 2);</p>
<p>            $urlinfo['header'] = $tmp[0];<br />
            $urlinfo['html'] = $tmp[1]; </p>
<p>            if ((stripos($urlinfo['header'], "location:")) &#038;&#038; ($maxredirs > 0))<br />
            {<br />
                preg_match("/\r\nlocation:(.*)/i", $urlinfo['header'], $match);</p>
<p>                if ($match)<br />
                {<br />
                    $redirect = trim($match[1]);</p>
<p>                    echo "Redirecting to ".$redirect."\n";</p>
<p>                    $maxredirs--;                         </p>
<p>                    return mycrawler_single($redirect, $timeout, $maxredirs);<br />
                }<br />
            }        </p>
<p>            return $urlinfo;<br />
		}<br />
    }</p>
<p>[/code]</p>
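<p>The response handling in the function above boils down to two steps: split the raw response at the first blank line, then look for a Location header. A minimal sketch of just that part, runnable without a network connection; split_response() and find_redirect() are hypothetical names, and the sample response is made up for illustration.</p>
<p>[code lang="php"]</p>
<p>    // hypothetical helper: separate the headers from the body<br />
    function split_response($raw)<br />
    {<br />
        $parts = explode("\r\n\r\n", $raw, 2);</p>
<p>        return array(<br />
            'header' => $parts[0],<br />
            'html'   => isset($parts[1]) ? $parts[1] : '',<br />
        );<br />
    }</p>
<p>    // hypothetical helper: redirect target from the headers, or null if there is none<br />
    function find_redirect($header)<br />
    {<br />
        if (preg_match("/\r\nlocation:(.*)/i", $header, $match))<br />
        {<br />
            return trim($match[1]);<br />
        }</p>
<p>        return null;<br />
    }</p>
<p>    $raw = "HTTP/1.0 301 Moved Permanently\r\nLocation: http://www.example.com/\r\nConnection: close\r\n\r\nmoved";</p>
<p>    $response = split_response($raw);</p>
<p>    echo find_redirect($response['header'])."\n";<br />
    echo $response['html']."\n";</p>
<p>[/code]</p>
<p>The pattern only picks up absolute redirect targets as they appear in the header; a relative Location value would still need to be resolved against the original URL.</p>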
<p>My own multi-threaded function:</p>
<p>[code lang="php"]</p>
<p>    function mycrawler_multi($urls, $timeout=10, $maxredirects=1)<br />
    {</p>
<p>        for ($i=0; $i<count($urls); $i++)<br />
        {<br />
            $urlinfo[$i] = parse_url($urls[$i]);<br />
            $maxredirs[$i] = $maxredirects;</p>
<p>            if (empty($urlinfo[$i]['scheme'])) {$urlinfo[$i] = parse_url('http://'.$urls[$i]);}<br />
            if (empty($urlinfo[$i]["path"])) {$urlinfo[$i]["path"]="/";}</p>
<p>            if (empty($urlinfo[$i]['port']))<br />
            {<br />
			    switch($urlinfo[$i]['scheme'])<br />
			    {<br />
				    case "http":<br />
					    $urlinfo[$i]['port'] = 80;<br />
                        break;<br />
				    case "https":<br />
					    $urlinfo[$i]['port'] = 443;<br />
                        break;<br />
			    }<br />
            }</p>
<p>            if (isset($urlinfo[$i]["query"]))<br />
            {<br />
                $request[$i] = "GET ".$urlinfo[$i]["path"]."?".$urlinfo[$i]["query"]." ";<br />
            } else {<br />
                $request[$i] = "GET ".$urlinfo[$i]["path"]." ";<br />
            }</p>
<p>            $request[$i] .= "HTTP/1.0\r\n";<br />
            $request[$i] .= "Host: ".$urlinfo[$i]['host']."\r\n";<br />
            $request[$i] .= "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\r\n";<br />
            $request[$i] .= "Connection: close\r\n\r\n";</p>
<p>            $fp[$i] = fsockopen($urlinfo[$i]['host'], $urlinfo[$i]['port'], $urlinfo[$i]['errno'], $urlinfo[$i]['errstr'], $timeout);</p>
<p>            if (!$fp[$i])<br />
		    {<br />
			    echo "(".$urlinfo[$i]['errno'].")".$urlinfo[$i]['errstr']."\n";<br />
		    }<br />
		    else<br />
		    {<br />
                stream_set_blocking($fp[$i], false); // only valid on a socket that actually opened<br />
                fwrite($fp[$i], $request[$i]);<br />
            }<br />
        }</p>
<p>        $done = false;<br />
        $numdone = array();</p>
<p>        while (!$done)<br />
        {<br />
            for ($i=0; $i<count($urls); $i++)<br />
            {<br />
                if (is_resource($fp[$i]) &#038;&#038; !feof($fp[$i])) // skip sockets that failed to open<br />
                {<br />
                    $data[$i] .= fgets($fp[$i], 4096);<br />
                }<br />
                elseif (empty($numdone[$i]))<br />
                {<br />
                    $numdone[$i] = 1;</p>
<p>                    $tmp[$i] = explode("\r\n\r\n", $data[$i], 2);</p>
<p>                    $urlinfo[$i]['header'] = $tmp[$i][0];<br />
                    $urlinfo[$i]['html'] = $tmp[$i][1]; </p>
<p>                    if ((stripos($urlinfo[$i]['header'], "location:")) &#038;&#038; ($maxredirs[$i] > 0))<br />
                    {<br />
                        preg_match("/\r\nlocation:(.*)/i", $urlinfo[$i]['header'], $match[$i]);</p>
<p>                        if ($match[$i])<br />
                        {<br />
                            $redirect[$i] = trim($match[$i][1]);</p>
<p>                            echo "Redirecting to ".$redirect[$i]."\n";</p>
<p>                            $maxredirs[$i]--;                         </p>
<p>                            $urlinfo[$i] = mycrawler_single($redirect[$i], $timeout, $maxredirs[$i]);<br />
                        }<br />
                    }<br />
                }<br />
            }</p>
<p>            $done = (array_sum($numdone) == count($urls));<br />
        }       </p>
<p>        for ($i=0; $i<count($urls); $i++)<br />
        {<br />
            if (is_resource($fp[$i])) {fclose($fp[$i]);}<br />
        }</p>
<p>        return $urlinfo;<br />
    }</p>
<p>[/code]</p>
<p>All require PHP5.</p>
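<p>Both functions split the raw response at the first blank line and then look for a Location: header. Here is a rough sketch of just that step, which can be tried without opening a socket. The helper name parse_response and the sample response are made up for this example and are not part of the scripts above:</p>
<p>[code lang="php"]<br />
// Hypothetical helper: split a raw HTTP response and pull out any redirect target.<br />
function parse_response($raw)<br />
{<br />
    // Headers and body are separated by the first empty line.<br />
    $parts = explode("\r\n\r\n", $raw, 2);<br />
    $info = array(<br />
        'header'   => $parts[0],<br />
        'html'     => isset($parts[1]) ? $parts[1] : '',<br />
        'redirect' => null,<br />
    );<br />
    // Case-insensitive match on a header line that starts with "Location:".<br />
    if (preg_match("/\r\nlocation:([^\r\n]*)/i", $info['header'], $match)) {<br />
        $info['redirect'] = trim($match[1]);<br />
    }<br />
    return $info;<br />
}<br />
// Sample response, invented for the example.<br />
$raw = "HTTP/1.0 301 Moved Permanently\r\nLocation: http://www.example.com/\r\nConnection: close\r\n\r\nmoved";<br />
$info = parse_response($raw);<br />
echo $info['redirect']."\n"; // http://www.example.com/<br />
[/code]</p>
<p>The pattern uses [^\r\n]* so that the captured value ends before the line break that closes the header line.</p>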
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/php-crawler.html/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Google 2 RSS</title>
		<link>http://www.nott.org/blog/google-2-rss.html</link>
		<comments>http://www.nott.org/blog/google-2-rss.html#comments</comments>
		<pubDate>Thu, 29 Dec 2005 16:06:11 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=44</guid>
		<description><![CDATA[On ThreadWatch yesterday there was a thread about rank checkers, and I couldn&#8217;t believe that some SEOs don&#8217;t use them. We use our own heavy duty mega serp scraper to fully analyse any industry we are working in. Anyway, Graywolf mentioned how he would love a Google RSS or XML feed &#8211; I have been [...]]]></description>
			<content:encoded><![CDATA[<p>On <a href="http://www.threadwatch.org/">ThreadWatch</a> yesterday there was a <a href="http://www.threadwatch.org/node/5140">thread</a> about rank checkers, and I couldn&#8217;t believe that some SEOs don&#8217;t use them. We use our own heavy duty mega serp scraper to fully analyse any industry we are working in. Anyway, <a href="http://www.wolf-howl.com/">Graywolf</a> mentioned how he would love a Google RSS or XML feed &#8211; I have been waiting for this for a long time, as their SERPs are so dirty it would make things a bit easier. And to only offer 10 results per page in their <a href="http://www.google.com/apis/">API</a> is shocking!! Come on Goo, catch up with MSN + Yahoo.</p>
<p>Anyway, I got a bit bored today and knocked up a quick Google2RSS PHP script for those who are without (it being the xmas season).</p>
<p>Warning &#8211; this is very quick, dirty + crude code (in other words &#8211; not the best) <img src='http://www.nott.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>[requires <a href="http://www.php.net/">PHP5</a> + <a href="http://www.php.net/curl">cURL</a>]</p>
<p>[code lang="php"]</p>
<p>	header("Content-type: text/xml\n");</p>
<p>	echo google2rss("spam", 10);</p>
<p>	function google2rss($query, $numres)<br />
	{<br />
        $ch = curl_init();</p>
<p>        curl_setopt($ch, CURLOPT_URL, "http://www.google.com/search?q=".$query."&#038;num=".$numres."&#038;hl=en&#038;safe=off");<br />
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");<br />
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);</p>
<p>        $html = curl_exec($ch);</p>
<p>        curl_close ($ch);</p>
<p>		$html = str_replace("\r\n", " ", $html);<br />
		$html = str_replace("<p class=g>", "\n<p class=g>", $html);<br />
		$html = str_replace("</div>", "</div>\n", $html);<br />
		$html = str_replace("View as HTML</a>", "", $html);</p>
<p>		preg_match_all("/<p class=g>(.*)<a class=l href=\"(.*)\" (.*)\">(.*)<\/a>(.*)<br /><font/i", $html, $matches);</p>
<p>		$items['url'] = $matches[2];</p>
<p>		for ($i=0; $i < count($items['url']); $i++)<br />
		{<br />
			$items['title'][$i] = strip_tags($matches[4][$i]);<br />
			$items['title'][$i] = str_replace(" - [ Translate this page", "", $items['title'][$i]);<br />
			$items['desc'][$i] = strip_tags($matches[5][$i]);<br />
			$items['desc'][$i] = preg_replace("/^ ]/i", "", $items['desc'][$i]);<br />
		}</p>
<p>		$rss = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";<br />
		$rss .= "<rss version=\"2.0\">\n";<br />
		$rss .= "<channel>\n";<br />
		$rss .= "\t<title>".$query."</title>\n"; // channel title (assumed: the search query)<br />
		$rss .= "\t<link>http://www.google.com/search?q=".$query."&amp;num=".$numres."&amp;hl=en&amp;safe=off</link>\n";<br />
		$rss .= "\t<description>".$query." - Google RSS search results</description>\n";<br />
		$rss .= "\t<pubDate>".date(DATE_RFC822)."</pubDate>\n";<br />
		$rss .= "\t<generator>Mike Nott - http://www.nott.org</generator>\n";<br />
		$rss .= "\t<language>en</language>\n";</p>
<p>		for ($i=0; $i < count($items['url']); $i++)<br />
		{<br />
			$rss .= "\t<item>\n";<br />
			$rss .= "\t\t<title>".htmlspecialchars($items['title'][$i])."</title>\n";<br />
			$rss .= "\t\t<link>".$items['url'][$i]."</link>\n";<br />
			$rss .= "\t\t<description>".htmlspecialchars($items['desc'][$i])."</description>\n";<br />
			$rss .= "\t\t<pubDate>".date(DATE_RFC822)."</pubDate>\n";<br />
			$rss .= "\t</item>\n";<br />
		}</p>
<p>		$rss .= "</channel>\n";<br />
		$rss .= "</rss>";</p>
<p>		return $rss;<br />
	}</p>
<p>[/code]</p>
<p>If anyone who is actually good at RegEx would like to improve the code, please do. <img src='http://www.nott.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
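<p>Queries with spaces or other reserved characters need URL-encoding before they go into the search URL. One way to build it, sketched with http_build_query(); the helper name google_search_url is invented for this example and the parameters are simply the ones used in the script:</p>
<p>[code lang="php"]<br />
// Hypothetical helper: build the search URL with every parameter URL-encoded.<br />
function google_search_url($query, $numres)<br />
{<br />
    $params = array(<br />
        'q'    => $query,<br />
        'num'  => (int) $numres,<br />
        'hl'   => 'en',<br />
        'safe' => 'off',<br />
    );<br />
    // http_build_query() encodes each value and joins the pairs into a query string.<br />
    return "http://www.google.com/search?".http_build_query($params);<br />
}<br />
echo google_search_url("rank checker", 10)."\n";<br />
[/code]</p>
<p>The returned string can be passed to CURLOPT_URL as it is.</p>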
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/google-2-rss.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>View Adsense Source Code</title>
		<link>http://www.nott.org/blog/view-adsense-source-code.html</link>
		<comments>http://www.nott.org/blog/view-adsense-source-code.html#comments</comments>
		<pubDate>Sat, 10 Dec 2005 19:20:44 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Web/Net]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=39</guid>
		<description><![CDATA[I can be quite slow sometimes. I only just realised that you could view the source code (html) of the Adsense javascript include in Firefox. Just right-click on the ads, choose &#8216;This Frame&#8217; then &#8216;View Frame Source&#8217; (obviously I knew this for normal frames):]]></description>
			<content:encoded><![CDATA[<p>I can be quite slow sometimes <img src='http://www.nott.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I only just realised that you could view the source code (<a href="http://www.w3.org/MarkUp/">html</a>) of the <a href="https://www.google.com/adsense/">Adsense</a> javascript include in <a href="http://www.mozilla.com/firefox/">Firefox</a>.</p>
<p>Just right-click on the ads, choose &#8216;This Frame&#8217; then &#8216;View Frame Source&#8217; (obviously I knew this for normal frames <img src='http://www.nott.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> ):</p>
<p><img src='/uploads/adsensesource.jpg' alt='' /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/view-adsense-source-code.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>IP 2 Country</title>
		<link>http://www.nott.org/blog/ip-2-country.html</link>
		<comments>http://www.nott.org/blog/ip-2-country.html#comments</comments>
		<pubDate>Wed, 07 Dec 2005 08:31:47 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Web/Net]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=30</guid>
		<description><![CDATA[Glenn asked about this in our private SEO forum, but thought I&#8217;d share it here too. I wrote this dirty shell script a couple of years ago for Wotbox, but I think it should still work. Just set it to a cron job once a day and it will give you a nice IP to [...]]]></description>
			<content:encoded><![CDATA[<p>Glenn asked about this in our private SEO forum, but thought I&#8217;d share it here too. I wrote this dirty shell script a couple of years ago for <a href="http://www.wotbox.com">Wotbox</a>, but I think it should still work. Just set it to a cron job once a day and it will give you a nice IP to country database.</p>
<p>[code lang="bash"]<br />
thedate=`date --date=yesterday +%Y%m%d`<br />
workdir=/scripts/ipdb<br />
savfile=$workdir/ipdb.$thedate</p>
<p>dbserver=127.0.0.1<br />
database=ipdb<br />
dbuserpass="--user=user --password=pass"</p>
<p>cd $workdir</p>
<p>echo "Downloading ARIN IP database"<br />
wget -q "ftp://ftp.arin.net/pub/stats/arin/delegated-arin-$thedate"<br />
echo "Downloading APNIC IP database"<br />
wget -q "ftp://ftp.arin.net/pub/stats/apnic/delegated-apnic-$thedate"<br />
echo "Downloading LACNIC IP database"<br />
wget -q "ftp://ftp.arin.net/pub/stats/lacnic/delegated-lacnic-$thedate"<br />
echo "Downloading RIPE IP database"<br />
wget -q "ftp://ftp.arin.net/pub/stats/ripencc/delegated-ripencc-$thedate"</p>
<p>echo "Cleaning downloaded IP databases"<br />
tail -100000 $workdir/delegated-arin-$thedate | grep "ipv4" | grep -v "*" | grep -v "-" | sed 's/|assigned//' | sed 's/|allocated//' | sed 's/|ipv4//' | sed 's/|........$//' > $savfile<br />
tail -100000 $workdir/delegated-apnic-$thedate | grep "ipv4" | grep -v "#" | grep -v "*" | grep -v "+" | sed 's/|assigned//' | sed 's/|allocated//' | sed 's/|ipv4//' | sed 's/|........$//' >> $savfile<br />
tail -100000 $workdir/delegated-lacnic-$thedate | grep "ipv4" | grep -v "*" | grep -v "-" | sed 's/|assigned//' | sed 's/|allocated//' | sed 's/|ipv4//' | sed 's/|........$//' >> $savfile<br />
tail -100000 $workdir/delegated-ripencc-$thedate | grep "ipv4" | grep -v "*" | grep -v "+" | sed 's/|assigned//' | sed 's/|allocated//' | sed 's/|ipv4//' | sed 's/|........$//' >> $savfile</p>
<p>echo "Removing downloaded IP databases"<br />
rm delegated-arin-$thedate<br />
rm delegated-apnic-$thedate<br />
rm delegated-lacnic-$thedate<br />
rm delegated-ripencc-$thedate</p>
<p>query1="CREATE DATABASE IF NOT EXISTS $database"<br />
query2="DROP TABLE IF EXISTS tblips2;"<br />
query3="CREATE TABLE tblips2 (ID int(5) unsigned NOT NULL auto_increment,NIC varchar(7) default NULL,Country char(2) default NULL,StartIPreal varchar(15) default NULL,StartIPint int(20) unsigned default NULL,Subnetreal varchar(15) default NULL,Subnetint int(20) unsigned default NULL,EndIPreal varchar(15) default NULL,EndIPint int(20) unsigned default NULL,PRIMARY KEY  (ID)) ENGINE=MyISAM;"<br />
query4="LOAD DATA LOCAL INFILE '$savfile' INTO TABLE $database.tblips2 FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' (NIC,Country,StartIPreal,Subnetint);"<br />
query5="DROP TABLE IF EXISTS tblips;"<br />
query6="CREATE TABLE tblips (ID int(5) unsigned NOT NULL auto_increment,NIC varchar(7) default NULL,Country char(2) default NULL,StartIPreal varchar(15) default NULL,StartIPint int(20) unsigned default NULL,Subnetreal varchar(15) default NULL,Subnetint int(20) unsigned default NULL,EndIPreal varchar(15) default NULL,EndIPint int(20) unsigned default NULL,PRIMARY KEY  (ID)) ENGINE=MyISAM;"<br />
query7="INSERT INTO tblips (NIC,Country,StartIPReal,StartIPInt,SubnetReal,SubnetInt,EndIPInt,EndIPReal) SELECT tblips2.NIC,tblips2.Country,tblips2.StartIPReal,inet_aton(tblips2.StartIPReal) as StartIPInt,inet_ntoa(tblips2.SubnetInt) as SubnetReal,tblips2.SubnetInt,inet_aton(tblips2.StartIPReal)+tblips2.SubnetInt-1 as EndIPInt,inet_ntoa(inet_aton(tblips2.StartIPReal)+tblips2.SubnetInt-1) as EndIPReal FROM tblips2;"<br />
query8="DROP TABLE IF EXISTS tblips2;"</p>
<p>echo "Importing IP databases to MySql"</p>
<p>echo "MySql Query 1"<br />
mysql --host=$dbserver $dbuserpass -e "$query1"<br />
echo "MySql Query 2"<br />
mysql --host=$dbserver $dbuserpass $database -e "$query2"<br />
echo "MySql Query 3"<br />
mysql --host=$dbserver $dbuserpass $database -e "$query3"<br />
echo "MySql Query 4"<br />
mysql --host=$dbserver $dbuserpass $database -e "$query4"<br />
echo "MySql Query 5"<br />
mysql --host=$dbserver $dbuserpass $database -e "$query5"<br />
echo "MySql Query 6"<br />
mysql --host=$dbserver $dbuserpass $database -e "$query6"<br />
echo "MySql Query 7"<br />
mysql --host=$dbserver $dbuserpass $database -e "$query7"<br />
echo "MySql Query 8"<br />
mysql --host=$dbserver $dbuserpass $database -e "$query8"</p>
<p>rm $savfile</p>
<p>echo "Process completed"<br />
[/code]</p>
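<p>Once the table is loaded, a lookup is a range check on the integer form of the address. A small sketch of that check in PHP, run against an in-memory array rather than MySQL. The function name, the sample rows and the assumption that each end value is the last address of its block are mine, not part of the script:</p>
<p>[code lang="php"]<br />
// Hypothetical lookup: return the country whose range contains the address.<br />
function ip2country($ip, $ranges)<br />
{<br />
    // sprintf('%u') keeps the value unsigned on 32-bit builds of PHP.<br />
    $ipint = (float) sprintf('%u', ip2long($ip));<br />
    foreach ($ranges as $range) {<br />
        if ($ipint >= $range['start'] and $range['end'] >= $ipint) {<br />
            return $range['country'];<br />
        }<br />
    }<br />
    return null; // address is not in any known range<br />
}<br />
// Made-up rows in the shape of the StartIPint / EndIPint / Country columns.<br />
$ranges = array(<br />
    array('start' => 167772160.0,  'end' => 184549375.0,  'country' => 'AA'),<br />
    array('start' => 3221225984.0, 'end' => 3221226239.0, 'country' => 'ZZ'),<br />
);<br />
echo ip2country('192.0.2.10', $ranges)."\n"; // ZZ<br />
[/code]</p>
<p>In MySQL the same test would be a WHERE clause comparing inet_aton() of the address with StartIPint and EndIPint.</p>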
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/ip-2-country.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mobile SEO &amp; Google Finally Validates</title>
		<link>http://www.nott.org/blog/mobile-seo-google-finally-validates.html</link>
		<comments>http://www.nott.org/blog/mobile-seo-google-finally-validates.html#comments</comments>
		<pubDate>Sun, 04 Dec 2005 21:50:49 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=28</guid>
		<description><![CDATA[Google, who for some reason choose NOT to bother making their site and SERPS fully standards compliant, finally have a results servlet that does validate &#8211; http://www.google.com/xhtml From the DocType it can be seen that this is intended for mobile platforms: [code lang="html"] [/code] Also selecting Mobile Web (Beta) shows some interesting results. Could be [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.google.com">Google</a>, who for some reason choose <a href="http://validator.w3.org/check?uri=http%3A%2F%2Fwww.google.com%2Fsearch%3Fnum%3D100%26q%3Dgoogle%2Bdoesn%25E2%2580%2599t%2Bvalidate">NOT</a> to bother making their site and SERPS fully standards compliant, finally have a results servlet that <a href="http://www.google.com/xhtml?q=finally">does validate</a> &#8211; <a href="http://www.google.com/xhtml">http://www.google.com/xhtml</a></p>
<p>From the DocType it can be seen that this is intended for mobile platforms:</p>
<p>[code lang="html"]<br />
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd"><br />
[/code]</p>
<p>Also selecting <a href="http://www.google.com/xhtml/help">Mobile Web (Beta)</a> shows some interesting results. Could be time to start thinking about <a href="http://www.mobilesearchmarketing.com/">Mobile SEO</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/mobile-seo-google-finally-validates.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WordPress Install</title>
		<link>http://www.nott.org/blog/wordpress-install.html</link>
		<comments>http://www.nott.org/blog/wordpress-install.html#comments</comments>
		<pubDate>Thu, 24 Nov 2005 17:28:32 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=15</guid>
		<description><![CDATA[I decided to use WordPress for this blog as it seems to be the current &#8216;standard&#8217;. It was nice to see that it works nicely out of the box after the short install. However, being as awkward as I am, I decided that I wanted to tweak it a bit: Firstly I didn&#8217;t want [...]]]></description>
			<content:encoded><![CDATA[<p>I decided to use <a href="http://wordpress.org">WordPress</a> for this blog as it seems to be the current &#8216;standard&#8217;. It was nice to see that it works nicely out of the box after the short install. However, being as awkward as I am, I decided that I wanted to tweak it a bit:</p>
<p>Firstly I didn&#8217;t want to have a dynamic url system, so opted to use WP&#8217;s &#8216;permalinks&#8217; setup. But again I didn&#8217;t want to use their standard folder architecture that 1,000s of others also use e.g. blog.com/archive/2005/11/post-title.html. So I decided to use the following structure:</p>
<ul>
<li>www.nott.org
<ul>
<li>/home-page.html</li>
<li>/content-page-title.html</li>
<li>/blog/</li>
<li>/blog/post-title.html</li>
<li>/blog/category/</li>
<li>/blog/2005/11/     (archive dates)</li>
<li>/blog/feed/feed-format.xml</li>
<li>/blog/feed/category/feed-format.xml  (category feed)</li>
<li>/blog/feed/comments/post-title.xml  (comments feed)</li>
</ul>
</li>
</ul>
<p>To do this took a few mod_rewrite lines in Apache:</p>
<p>[code lang="apache"]<br />
Options +FollowSymLinks<br />
RewriteEngine On<br />
RewriteRule ^/blog/$ / [R=301]<br />
RewriteCond %{REQUEST_URI} !^/index\.html$<br />
RewriteCond %{REQUEST_URI} !^/index\.php$<br />
RewriteRule ^/([A-Za-z0-9_-]+)\.html$ /index.php?pagename=$1 [QSA]<br />
RewriteRule ^/blog/([A-Za-z0-9_-]+)\.html$ /index.php?name=$1 [QSA]<br />
RewriteRule ^/blog/([0-9]+)/([0-9]+)/$ /index.php?m=$1$2 [QSA]<br />
RewriteRule ^/blog/feed/(rdf|rss|rss2|atom)\.xml$ /wp-feed.php?feed=$1 [QSA]<br />
RewriteRule ^/blog/feed/comments/([A-Za-z0-9_-]+)\.xml$ /wp-feed.php?feed=rss2&#038;name=$1 [QSA]<br />
RewriteRule ^/blog/([A-Za-z0-9_-]+)/$ /index.php?category_name=$1 [QSA]<br />
RewriteRule ^/blog/feed/([A-Za-z0-9_-]+)/(rdf|rss|rss2|atom)\.xml$ /wp-feed.php?category_name=$1&#038;feed=$2 [QSA]<br />
[/code]</p>
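<p>To check what each pretty URL ends up as, the patterns can be replayed in PHP. This is only a sketch for testing the mapping by hand: the function name rewrite_target is invented, the redirect and the two RewriteCond lines are left out, only some of the rules are copied, and the literal dots are escaped:</p>
<p>[code lang="php"]<br />
// Hypothetical test helper: return the internal URL for a requested path.<br />
function rewrite_target($path)<br />
{<br />
    // Patterns copied from the RewriteRule lines, in the same order.<br />
    $rules = array(<br />
        '#^/([A-Za-z0-9_-]+)\.html$#'             => '/index.php?pagename=$1',<br />
        '#^/blog/([A-Za-z0-9_-]+)\.html$#'        => '/index.php?name=$1',<br />
        '#^/blog/([0-9]+)/([0-9]+)/$#'            => '/index.php?m=$1$2',<br />
        '#^/blog/feed/(rdf|rss|rss2|atom)\.xml$#' => '/wp-feed.php?feed=$1',<br />
        '#^/blog/([A-Za-z0-9_-]+)/$#'             => '/index.php?category_name=$1',<br />
    );<br />
    foreach ($rules as $pattern => $target) {<br />
        // The first pattern that matches decides the target.<br />
        if (preg_match($pattern, $path)) {<br />
            return preg_replace($pattern, $target, $path);<br />
        }<br />
    }<br />
    return $path; // no rule matched, the path is left alone<br />
}<br />
echo rewrite_target('/blog/2005/11/')."\n";         // /index.php?m=200511<br />
echo rewrite_target('/blog/php-crawler.html')."\n"; // /index.php?name=php-crawler<br />
[/code]</p>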
<p>Then some small changes to some of the WordPress PHP functions (I know this will make upgrades tricky, but if I know what changes I&#8217;ve made, it should be possible):</p>
<p><strong>/wp-includes/feed-functions.php</strong></p>
<p>[code lang="php"]<br />
function comments_rss($commentsrssfilename = '') {<br />
	global $id;</p>
<p>	if ('' != get_settings('permalink_structure'))<br />
		$url = str_replace(".html",".xml",str_replace("/blog/", "/blog/feed/comments/", get_permalink()));<br />
	else<br />
		$url = get_settings('home') . "/$commentsrssfilename?feed=rss2&amp;p=$id";</p>
<p>	return apply_filters('post_comments_feed_link', $url);<br />
}<br />
[/code]</p>
<p><strong>/wp-includes/template-functions-links.php</strong></p>
<p>[code lang="php"]<br />
function get_feed_link($feed='rss2') {<br />
	global $wp_rewrite;<br />
	$do_perma = 0;<br />
	$feed_url = get_settings('siteurl');<br />
	$comment_feed_url = $feed_url;</p>
<p>	$permalink = $wp_rewrite->get_feed_permastruct();<br />
	if ('' != $permalink) {<br />
		if ( false !== strpos($feed, 'comments_') ) {<br />
			$feed = str_replace('comments_', '', $feed);<br />
			$permalink = $wp_rewrite->get_comment_feed_permastruct();<br />
		}</p>
<p>		$permalink = str_replace('%feed%', $feed, $permalink);<br />
		$permalink = preg_replace('#/+#', '/', "/$permalink/");<br />
		$output = get_settings('home') . "/blog/feed/" . $feed . ".xml";<br />
	} else {<br />
		if ( false !== strpos($feed, 'comments_') )<br />
			$feed = str_replace('comments_', 'comments-', $feed);</p>
<p>		$output = get_settings('home') . "/?feed={$feed}";<br />
	}</p>
<p>	return apply_filters('feed_link', $output, $feed);<br />
}<br />
[/code]</p>
<p><strong>template-functions-post.php</strong></p>
<p>[code lang="php"]<br />
function wp_list_pages($args = '') {<br />
	parse_str($args, $r);<br />
	if ( !isset($r['depth']) ) $r['depth'] = 0;<br />
	if ( !isset($r['show_date']) ) $r['show_date'] = '';<br />
	if ( !isset($r['child_of']) ) $r['child_of'] = 0;<br />
	if ( !isset($r['title_li']) ) $r['title_li'] = __('Pages');<br />
	if ( !isset($r['echo']) ) $r['echo'] = 1;</p>
<p>	$output = '';</p>
<p>	// Query pages.<br />
	$pages = &#038; get_pages($args);<br />
	if ( $pages ) :</p>
<p>	if ( $r['title_li'] )<br />
		$output .= '<li class="pagenav">' . $r['title_li'] . '<ul>';<br />
	// Now loop over all pages that were selected<br />
	$page_tree = Array();<br />
	foreach($pages as $page) {<br />
		// set the title for the current page<br />
		$page_tree[$page->ID]['title'] = $page->post_title;<br />
		$page_tree[$page->ID]['name'] = $page->post_name;</p>
<p>		// set the selected date for the current page<br />
		// depending on the query arguments this is either<br />
		// the creation date or the modification date<br />
		// as a unix timestamp. It will also always be in the<br />
		// ts field.<br />
		if (! empty($r['show_date'])) {<br />
			if ('modified' == $r['show_date'])<br />
				$page_tree[$page->ID]['ts'] = $page->post_modified;<br />
			else<br />
				$page_tree[$page->ID]['ts'] = $page->post_date;<br />
		}</p>
<p>		// The tricky bit!!<br />
		// Using the parent ID of the current page as the<br />
		// array index we set the current page as a child of that page.<br />
		// We can now start looping over the $page_tree array<br />
		// with any ID which will output the page links from that ID downwards.<br />
		if ( $page->post_parent != $page->ID)<br />
			$page_tree[$page->post_parent]['children'][] = $page->ID;<br />
	}<br />
	// Output of the pages starting with child_of as the root ID.<br />
	// child_of defaults to 0 if not supplied in the query.<br />
	$output .= _page_level_out($r['child_of'],$page_tree, $r, 0, false);<br />
	if ( $r['title_li'] )<br />
		$output .= '</ul></li>';<br />
	endif;</p>
<p>	$output = apply_filters('wp_list_pages', $output);</p>
<p>	if ( $r['echo'] )<br />
		echo str_replace('/"','.html"',$output);<br />
	else<br />
		return $output;<br />
}<br />
[/code]</p>
<p>and</p>
<p>[code lang="php"]<br />
function _page_level_out($parent, $page_tree, $args, $depth = 0, $echo = true) {<br />
//	global $wp_query;</p>
<p>//	$queried_obj = $wp_query->get_queried_object();</p>
<p>	$output = '';</p>
<p>	if($depth)<br />
		$indent = str_repeat("\t", $depth);<br />
	//$indent = join('', array_fill(0,$depth,"\t"));</p>
<p>	foreach($page_tree[$parent]['children'] as $page_id) {<br />
		$cur_page = $page_tree[$page_id];<br />
		$title = $cur_page['title'];</p>
<p>		$css_class = 'page_item';<br />
		if( $page_id == $queried_obj->ID) {<br />
			$css_class .= ' current_page_item';<br />
		}</p>
<p>		$output .= $indent . '<li class="' . $css_class . '"><a href="' . get_page_link($page_id) . '" title="' . wp_specialchars($title) . '">' . $title . '</a>';</p>
<p>		if(isset($cur_page['ts'])) {<br />
			$format = get_settings('date_format');<br />
			if(isset($args['date_format']))<br />
				$format = $args['date_format'];<br />
			$output .= " " . mysql2date($format, $cur_page['ts']);<br />
		}<br />
		echo "\n";</p>
<p>		if(isset($cur_page['children']) &#038;&#038; is_array($cur_page['children'])) {<br />
			$new_depth = $depth + 1;</p>
<p>			if(!$args['depth'] || $depth < ($args['depth']-1)) {<br />
				$output .= "$indent<ul>\n";<br />
				$output .= _page_level_out($page_id, $page_tree, $args, $new_depth, false);<br />
				$output .= "$indent</ul>\n";<br />
			}<br />
		}<br />
		$output .= "$indent</li>\n";<br />
	}<br />
	if ( $echo )<br />
		echo $output;<br />
	else<br />
		return $output;<br />
}<br />
[/code]</p>
<p><strong>template-functions-general.php</strong> (to put blog description in homepage title)</p>
<p>[code lang="php"]<br />
function wp_title($sep = '&raquo;', $display = true) {<br />
    global $wpdb;<br />
    global $m, $year, $monthnum, $day, $category_name, $month, $posts;</p>
<p>		$cat = get_query_var('cat');<br />
		$p = get_query_var('p');<br />
		$name = get_query_var('name');<br />
		$category_name = get_query_var('category_name');</p>
<p>    // If there's a category<br />
    if(!empty($cat)) {<br />
        if (!stristr($cat,'-')) { // category excluded<br />
            $title = get_the_category_by_ID($cat);<br />
        }<br />
    }<br />
    if (!empty($category_name)) {<br />
        if (stristr($category_name,'/')) {<br />
            $category_name = explode('/',$category_name);<br />
            if ($category_name[count($category_name)-1]) {<br />
                $category_name = $category_name[count($category_name)-1]; // no trailing slash<br />
            } else {<br />
                $category_name = $category_name[count($category_name)-2]; // there was a trailing slash<br />
            }<br />
        }<br />
        $title = $wpdb->get_var("SELECT cat_name FROM $wpdb->categories WHERE category_nicename = '$category_name'");<br />
    }</p>
<p>    // If there's a month<br />
    if(!empty($m)) {<br />
        $my_year = substr($m, 0, 4);<br />
        $my_month = $month[substr($m, 4, 2)];<br />
        $title = "$my_year $sep $my_month";</p>
<p>    }<br />
    if (!empty($year)) {<br />
        $title = $year;<br />
        if (!empty($monthnum)) {<br />
            $title .= " $sep ".$month[zeroise($monthnum, 2)];<br />
        }<br />
        if (!empty($day)) {<br />
            $title .= " $sep ".zeroise($day, 2);<br />
        }<br />
    }</p>
<p>    // If there's a post<br />
    if (is_single() || is_page()) {<br />
        $title = strip_tags($posts[0]->post_title);<br />
        $title = apply_filters('single_post_title', $title);<br />
    }</p>
<p>    // Send it out<br />
    if ($display &#038;&#038; isset($title)) {<br />
        echo " $sep $title";<br />
    } elseif (!$display &#038;&#038; isset($title)) {<br />
        return " $sep $title";<br />
    } elseif ($display){<br />
    	echo " $sep ";<br />
    	bloginfo('description');<br />
    }<br />
}<br />
[/code]</p>
<p>I still haven&#8217;t gotten round to putting correct titles in the feeds; I will post the function changes when done.</p>
<p>The excellent template I used as a base is <a href="http://www.perun.net/">Red Train 1.0</a> by <a href="http://www.vlad-design.de/">Vladimir Simovic</a> (so a big thanks to Vlad). </p>
<p>I then changed it a bit, here are my blog templates:</p>
<p><a href="http://www.nott.org/wp-content/themes/nott/style.css">style.css</a><br />
<a href="http://www.nott.org/wp-content/themes/nott/index.php.txt">index.php</a><br />
<a href="http://www.nott.org/wp-content/themes/nott/page.php.txt">page.php</a><br />
<a href="http://www.nott.org/wp-content/themes/nott/comments.php.txt">comments.php</a></p>
<p>I also used <a href="http://www.asymptomatic.net">Owen Winkler</a>&#8216;s <a href="http://redalt.com/downloads/wp/codefilter.zip">Code Filter</a> plugin to properly display code, but had to slightly change it to:</p>
<p>[code lang="php"]<br />
function cf_callback($stuff)<br />
{<br />
	return "<blockquote{$stuff[1]}>".htmlspecialchars(clean_pre($stuff[2]), ENT_NOQUOTES)."</blockquote >";<br />
}</p>
<p>function cf_encode($content)<br />
{<br />
	return preg_replace_callback('|<blockquote([^>]*)>(.*)</blockquote >|imsU', 'cf_callback', $content);<br />
}</p>
<p>add_filter('the_content', 'cf_encode', '1');<br />
[/code]</p>
<p>Hope this saves someone the couple of days it has taken me to get <a href="http://www.wordpress.org">WordPress</a> set up properly to my liking <img src='http://www.nott.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/wordpress-install.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

