<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mike Nott &#187; PHP</title>
	<atom:link href="http://www.nott.org/blog/category/php/feed" rel="self" type="application/rss+xml" />
	<link>http://www.nott.org</link>
	<description>SEO, Music, Photography &#38; Other Stuff</description>
	<lastBuildDate>Thu, 26 Jan 2012 09:18:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>PHP Whois Script</title>
		<link>http://www.nott.org/blog/php-whois-script.html</link>
		<comments>http://www.nott.org/blog/php-whois-script.html#comments</comments>
		<pubDate>Wed, 25 Jan 2006 21:43:04 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Web/Net]]></category>

		<guid isPermaLink="false">http://www2.nott.org/blog/php-whois-script.html</guid>
		<description><![CDATA[Glenn over at SSEO asked for a script to do mass whois lookups. Use this function: [code lang="php"] function getwhois($domain, $tld) { require_once("whois.class.php"); $whois = new Whois(); if( !$whois->ValidDomain($domain.'.'.$tld) ){ return 'Sorry, the domain is not valid or not supported.'; } if( $whois->Lookup($domain.'.'.$tld) ) { return $whois->GetData(1); }else{ return 'Sorry, an error occurred.'; } } [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.glenn.ca/">Glenn</a> over at <a href="http://its-a-secret">SSEO</a> asked for a script to do mass whois lookups. </p>
<p>Use this function:</p>
<p>[code lang="php"]</p>
<p>    function getwhois($domain, $tld)<br />
    {<br />
        require_once("whois.class.php");</p>
<p>        $whois = new Whois();</p>
<p>	    if( !$whois->ValidDomain($domain.'.'.$tld) ){<br />
		    return 'Sorry, the domain is not valid or not supported.';<br />
	    }</p>
<p>        if( $whois->Lookup($domain.'.'.$tld) )<br />
        {<br />
            return $whois->GetData(1);<br />
        }else{<br />
            return 'Sorry, an error occurred.';<br />
        }<br />
    }</p>
<p>	$domain = trim($_REQUEST['domain']);</p>
<p>	$dot = strpos($domain, '.');<br />
	$sld = substr($domain, 0, $dot);<br />
	$tld = substr($domain, $dot+1);                     </p>
<p>	$whois = getwhois($sld, $tld);</p>
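<p>	// escape the whois text so any markup in it is shown rather than rendered<br />
	echo "<pre>";<br />
	echo htmlspecialchars($whois);<br />
	echo "</pre>";</p>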
<p>[/code]</p>
<p>The function depends on this <a href="/uploads/whois.class.php.txt">whois class</a>; save it as whois.class.php next to the script.</p>
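<p>The split at the first dot can be tried on its own. A rough sketch, assuming a helper called split_domain (hypothetical, not part of whois.class.php) that also rejects input without a usable dot:</p>
<p>[code lang="php"]</p>
<p>    // hypothetical helper, not part of whois.class.php<br />
    function split_domain($domain)<br />
    {<br />
        $domain = trim($domain);<br />
        $dot = strpos($domain, '.');</p>
<p>        // no dot, or a leading dot, leaves nothing to look up<br />
        if ($dot === false || $dot === 0)<br />
        {<br />
            return false;<br />
        }</p>
<p>        // everything before the first dot is the name, the rest is the tld<br />
        return array(substr($domain, 0, $dot), substr($domain, $dot+1));<br />
    }</p>
<p>    $parts = split_domain('example.co.uk');<br />
    echo $parts[0]." / ".$parts[1]."\n";   // example / co.uk</p>
<p>[/code]</p>
<p>Splitting at the first dot keeps multi-part endings such as co.uk together, which matches what the script above passes to getwhois().</p>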
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/php-whois-script.html/feed</wfw:commentRss>
		<slash:comments>33</slash:comments>
		</item>
		<item>
		<title>Google Datacenters</title>
		<link>http://www.nott.org/blog/google-datacenters.html</link>
		<comments>http://www.nott.org/blog/google-datacenters.html#comments</comments>
		<pubDate>Thu, 05 Jan 2006 11:57:29 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=50</guid>
		<description><![CDATA[For all those wanting to check the rollout of the BigDaddy update, here&#8217;s a list of 116 Google Datacenters: 64.233.161.98 64.233.161.99 64.233.161.104 64.233.161.105 64.233.161.106 64.233.161.107 64.233.161.147 64.233.163.99 64.233.163.104 64.233.167.99 64.233.167.104 64.233.167.106 64.233.167.107 64.233.167.147 64.233.171.99 64.233.171.103 64.233.171.104 64.233.171.105 64.233.171.106 64.233.171.107 64.233.171.147 64.233.179.99 64.233.179.104 64.233.179.106 64.233.179.107 64.233.183.99 64.233.183.104 64.233.183.107 64.233.185.99 64.233.185.104 64.233.185.106 64.233.185.107 64.233.185.147 64.233.187.99 64.233.187.104 64.233.187.106 [...]]]></description>
			<content:encoded><![CDATA[<p>For all those wanting to check the rollout of the <a href="http://www.mattcutts.com/blog/bigdaddy-on-the-move/">BigDaddy</a> update, here&#8217;s a list of 116 Google Datacenters:</p>
<blockquote><p>
<a href="http://64.233.161.98/">64.233.161.98</a> <a href="http://64.233.161.99/">64.233.161.99</a> <a href="http://64.233.161.104/">64.233.161.104</a> <a href="http://64.233.161.105/">64.233.161.105</a> <a href="http://64.233.161.106/">64.233.161.106</a> <a href="http://64.233.161.107/">64.233.161.107</a> <a href="http://64.233.161.147/">64.233.161.147</a> <a href="http://64.233.163.99/">64.233.163.99</a> <a href="http://64.233.163.104/">64.233.163.104</a> <a href="http://64.233.167.99/">64.233.167.99</a> <a href="http://64.233.167.104/">64.233.167.104</a> <a href="http://64.233.167.106/">64.233.167.106</a> <a href="http://64.233.167.107/">64.233.167.107</a> <a href="http://64.233.167.147/">64.233.167.147</a> <a href="http://64.233.171.99/">64.233.171.99</a> <a href="http://64.233.171.103/">64.233.171.103</a> <a href="http://64.233.171.104/">64.233.171.104</a> <a href="http://64.233.171.105/">64.233.171.105</a> <a href="http://64.233.171.106/">64.233.171.106</a> <a href="http://64.233.171.107/">64.233.171.107</a> <a href="http://64.233.171.147/">64.233.171.147</a> <a href="http://64.233.179.99/">64.233.179.99</a> <a href="http://64.233.179.104/">64.233.179.104</a> <a href="http://64.233.179.106/">64.233.179.106</a> <a href="http://64.233.179.107/">64.233.179.107</a> <a href="http://64.233.183.99/">64.233.183.99</a> <a href="http://64.233.183.104/">64.233.183.104</a> <a href="http://64.233.183.107/">64.233.183.107</a> <a href="http://64.233.185.99/">64.233.185.99</a> <a href="http://64.233.185.104/">64.233.185.104</a> <a href="http://64.233.185.106/">64.233.185.106</a> <a href="http://64.233.185.107/">64.233.185.107</a> <a href="http://64.233.185.147/">64.233.185.147</a> <a href="http://64.233.187.99/">64.233.187.99</a> <a href="http://64.233.187.104/">64.233.187.104</a> <a href="http://64.233.187.106/">64.233.187.106</a> <a href="http://64.233.187.107/">64.233.187.107</a> <a href="http://64.233.189.104/">64.233.189.104</a> <a 
href="http://66.102.7.98/">66.102.7.98</a> <a href="http://66.102.7.99/">66.102.7.99</a> <a href="http://66.102.7.104/">66.102.7.104</a> <a href="http://66.102.7.105/">66.102.7.105</a> <a href="http://66.102.7.106/">66.102.7.106</a> <a href="http://66.102.7.107/">66.102.7.107</a> <a href="http://66.102.7.147/">66.102.7.147</a> <a href="http://66.102.9.99/">66.102.9.99</a> <a href="http://66.102.9.104/">66.102.9.104</a> <a href="http://66.102.9.106/">66.102.9.106</a> <a href="http://66.102.9.107/">66.102.9.107</a> <a href="http://66.102.9.147/">66.102.9.147</a> <a href="http://66.102.11.99/">66.102.11.99</a> <a href="http://66.102.11.104/">66.102.11.104</a> <a href="http://66.102.11.106/">66.102.11.106</a> <a href="http://66.102.11.107/">66.102.11.107</a> <a href="http://66.249.81.99/">66.249.81.99</a> <a href="http://66.249.81.104/">66.249.81.104</a> <a href="http://66.249.81.106/">66.249.81.106</a> <a href="http://66.249.81.107/">66.249.81.107</a> <a href="http://66.249.83.99/">66.249.83.99</a> <a href="http://66.249.83.104/">66.249.83.104</a> <a href="http://66.249.83.106/">66.249.83.106</a> <a href="http://66.249.83.107/">66.249.83.107</a> <a href="http://66.249.85.99/">66.249.85.99</a> <a href="http://66.249.85.104/">66.249.85.104</a> <a href="http://66.249.85.106/">66.249.85.106</a> <a href="http://66.249.85.107/">66.249.85.107</a> <a href="http://66.249.87.99/">66.249.87.99</a> <a href="http://66.249.87.104/">66.249.87.104</a> <a href="http://66.249.89.99/">66.249.89.99</a> <a href="http://66.249.89.104/">66.249.89.104</a> <a href="http://66.249.89.106/">66.249.89.106</a> <a href="http://66.249.89.107/">66.249.89.107</a> <a href="http://66.249.93.99/">66.249.93.99</a> <a href="http://66.249.93.104/">66.249.93.104</a> <a href="http://66.249.93.106/">66.249.93.106</a> <a href="http://66.249.93.107/">66.249.93.107</a> <a href="http://72.14.203.99/">72.14.203.99</a> <a href="http://72.14.203.104/">72.14.203.104</a> <a 
href="http://72.14.203.106/">72.14.203.106</a> <a href="http://72.14.203.107/">72.14.203.107</a> <a href="http://72.14.205.99/">72.14.205.99</a> <a href="http://72.14.205.104/">72.14.205.104</a> <a href="http://72.14.205.106/">72.14.205.106</a> <a href="http://72.14.205.107/">72.14.205.107</a> <a href="http://72.14.207.99/">72.14.207.99</a> <a href="http://72.14.207.104/">72.14.207.104</a> <a href="http://72.14.207.106/">72.14.207.106</a> <a href="http://72.14.207.107/">72.14.207.107</a> <a href="http://216.239.37.98/">216.239.37.98</a> <a href="http://216.239.37.99/">216.239.37.99</a> <a href="http://216.239.37.104/">216.239.37.104</a> <a href="http://216.239.37.105/">216.239.37.105</a> <a href="http://216.239.37.106/">216.239.37.106</a> <a href="http://216.239.37.107/">216.239.37.107</a> <a href="http://216.239.37.147/">216.239.37.147</a> <a href="http://216.239.39.98/">216.239.39.98</a> <a href="http://216.239.39.99/">216.239.39.99</a> <a href="http://216.239.39.104/">216.239.39.104</a> <a href="http://216.239.39.105/">216.239.39.105</a> <a href="http://216.239.39.106/">216.239.39.106</a> <a href="http://216.239.39.107/">216.239.39.107</a> <a href="http://216.239.53.98/">216.239.53.98</a> <a href="http://216.239.53.99/">216.239.53.99</a> <a href="http://216.239.53.104/">216.239.53.104</a> <a href="http://216.239.53.105/">216.239.53.105</a> <a href="http://216.239.53.106/">216.239.53.106</a> <a href="http://216.239.53.107/">216.239.53.107</a> <a href="http://216.239.57.98/">216.239.57.98</a> <a href="http://216.239.57.99/">216.239.57.99</a> <a href="http://216.239.57.103/">216.239.57.103</a> <a href="http://216.239.57.104/">216.239.57.104</a> <a href="http://216.239.57.105/">216.239.57.105</a> <a href="http://216.239.57.106/">216.239.57.106</a> <a href="http://216.239.57.107/">216.239.57.107</a> <a href="http://216.239.57.147/">216.239.57.147</a> <a href="http://216.239.59.98/">216.239.59.98</a> <a href="http://216.239.59.99/">216.239.59.99</a> <a 
href="http://216.239.59.103/">216.239.59.103</a> <a href="http://216.239.59.104/">216.239.59.104</a> <a href="http://216.239.59.105/">216.239.59.105</a> <a href="http://216.239.59.106/">216.239.59.106</a> <a href="http://216.239.59.107/">216.239.59.107</a> <a href="http://216.239.59.147/">216.239.59.147</a> <a href="http://216.239.63.99/">216.239.63.99</a> <a href="http://216.239.63.104/">216.239.63.104</a>
</p></blockquote>
<p>Here&#8217;s a quick and dirty script that counts, for a given site, how many datacenters return it at each rank:</p>
<p>[code lang="php"]<br />
    function singlethread_crawl($url)<br />
    {<br />
        $agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";</p>
<p>        $ch = curl_init();</p>
<p>        curl_setopt($ch, CURLOPT_NOSIGNAL, 1);<br />
        curl_setopt($ch, CURLOPT_NOPROGRESS, 1);<br />
        curl_setopt($ch, CURLOPT_FAILONERROR, 1);<br />
        curl_setopt($ch, CURLOPT_URL, $url);<br />
        curl_setopt($ch, CURLOPT_USERAGENT, $agent);<br />
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);<br />
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);<br />
        curl_setopt($ch, CURLOPT_MAXREDIRS, 1);<br />
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);<br />
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);</p>
<p>        $html = curl_exec($ch);</p>
<p>        curl_close ($ch);</p>
<p>        return $html;<br />
    }</p>
<p>    function multithread_crawl($urls, $timeout)<br />
    {<br />
        $agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";</p>
<p>        $mh = curl_multi_init();</p>
<p>        foreach ($urls as $i => $url)<br />
        {<br />
            $conn[$i] = curl_init($url);<br />
            curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);<br />
            curl_setopt($conn[$i], CURLOPT_NOSIGNAL, 1);<br />
            curl_setopt($conn[$i], CURLOPT_NOPROGRESS, 1);<br />
            curl_setopt($conn[$i], CURLOPT_FAILONERROR, 1);<br />
            curl_setopt($conn[$i], CURLOPT_URL, $url);<br />
            curl_setopt($conn[$i], CURLOPT_USERAGENT, $agent);<br />
            curl_setopt($conn[$i], CURLOPT_SSL_VERIFYPEER, 0);<br />
            curl_setopt($conn[$i], CURLOPT_FOLLOWLOCATION, 1);<br />
            curl_setopt($conn[$i], CURLOPT_MAXREDIRS, 1);<br />
            curl_setopt($conn[$i], CURLOPT_TIMEOUT, $timeout);</p>
<p>            curl_multi_add_handle ($mh, $conn[$i]);<br />
        }</p>
<p>        do<br />
        {<br />
            $mrc = curl_multi_exec($mh, $active);<br />
        }<br />
        while ($mrc == CURLM_CALL_MULTI_PERFORM);</p>
<p>        while ($active and $mrc == CURLM_OK)<br />
        {<br />
            if (curl_multi_select($mh) != -1)<br />
            {<br />
                do<br />
                {<br />
                    $mrc = curl_multi_exec($mh, $active);<br />
                }<br />
                while ($mrc == CURLM_CALL_MULTI_PERFORM);<br />
            }<br />
        }</p>
<p>        if ($mrc != CURLM_OK)<br />
        {<br />
            print "Curl multi read error $mrc\n";<br />
        }</p>
<p>        $res = array();<br />
        $e = 0;</p>
<p>        foreach ($urls as $i => $url)<br />
        {<br />
            if (($err = curl_error($conn[$i])) == '')<br />
            {<br />
            	$res[$i]=curl_multi_getcontent($conn[$i]);<br />
            }<br />
            else<br />
            {<br />
                echo "error: ".$url." (".$err.")\n";<br />
            }</p>
<p>            curl_multi_remove_handle($mh,$conn[$i]);<br />
            curl_close($conn[$i]);<br />
        }</p>
<p>        curl_multi_close($mh);</p>
<p>        return $res;<br />
    }</p>
<p>	function googleresults($ip, $search, $num){</p>
<p>		$url = "http://".$ip."/ie?q=".urlencode($search)."&#038;num=".$num;</p>
<p>		$html = singlethread_crawl($url);</p>
<p>		preg_match_all("/\" href=(.*)>/iU", $html, $links);</p>
<p>		$urlarray = str_replace("<b>","",$links[1]);<br />
		$urlarray = str_replace("</b>","",$urlarray);<br />
		$urlarray = preg_replace("/>.*$/i","", $urlarray);</p>
<p>		return $urlarray;<br />
	}</p>
<p>	function googleresults_multi($ips, $search, $num){</p>
<p>        for ($i=0; $i < count($ips); $i++)<br />
        {<br />
		    $urls[$i] = "http://".$ips[$i]."/ie?q=".urlencode($search)."&#038;num=".$num;<br />
        }</p>
<p>		$html = multithread_crawl($urls, count($ips)/2);</p>
<p>        for ($i=0; $i < count($ips); $i++)<br />
        {<br />
		    preg_match_all("/\" href=(.*)>/iU", $html[$i], $links[$i]);</p>
<p>		    $urlarray[$i] = str_replace("<b>","",$links[$i][1]);<br />
		    $urlarray[$i] = str_replace("</b>","",$urlarray[$i]);<br />
		    $urlarray[$i] = preg_replace("/>.*$/i","", $urlarray[$i]);<br />
        }</p>
<p>		return $urlarray;<br />
	}</p>
<p>	function removehttp($url){</p>
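<p>		// strip whichever scheme prefix is present; the two prefixes differ in length<br />
		if (substr($url,0,7) == 'http://')<br />
		{<br />
			$url = substr($url,7);<br />
		}<br />
		elseif (substr($url,0,8) == 'https://')<br />
		{<br />
			$url = substr($url,8);<br />
		}</p>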
<p>		$url = trim($url);</p>
<p>		return $url;<br />
	}</p>
<p>	function rank($arr, $item){</p>
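<p>		$result = 0; // 0 means the item was not found<br />
		for ($i = 0; $i < count($arr); $i++)<br />
		{<br />
			// case-insensitive substring match; eregi() is deprecated and treated the dots as wildcards<br />
			if (stripos($arr[$i], $item) !== false)<br />
			{<br />
				$result = $i+1;</p>
<p>				break;<br />
			}<br />
		}</p>
<p>		return($result);<br />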
	}</p>
<p>    $ips = array("64.233.161.99", "64.233.161.104", "64.233.161.105", "64.233.161.106", "64.233.161.107", "64.233.161.147", "64.233.163.99", "64.233.163.104", "64.233.167.99", "64.233.167.104", "64.233.167.106", "64.233.167.107", "64.233.167.147", "64.233.171.99", "64.233.171.104", "64.233.171.105", "64.233.171.106", "64.233.171.107", "64.233.171.147", "64.233.179.99", "64.233.179.104", "64.233.179.106", "64.233.179.107", "64.233.183.99", "64.233.183.104", "64.233.183.107", "64.233.185.99", "64.233.185.104", "64.233.185.106", "64.233.185.107", "64.233.187.99", "64.233.187.104", "64.233.187.106", "64.233.187.107", "64.233.189.104", "66.102.7.99", "66.102.7.104", "66.102.7.105", "66.102.7.106", "66.102.7.107", "66.102.7.147", "66.102.9.99", "66.102.9.104", "66.102.9.106", "66.102.9.107", "66.102.9.147", "66.102.11.99", "66.102.11.104", "66.102.11.106", "66.102.11.107", "66.249.81.99", "66.249.81.104", "66.249.81.106", "66.249.81.107", "66.249.83.99", "66.249.83.104", "66.249.83.106", "66.249.83.107", "66.249.85.99", "66.249.85.104", "66.249.85.106", "66.249.85.107", "66.249.87.99", "66.249.87.104", "66.249.89.99", "66.249.89.104", "66.249.89.106", "66.249.89.107", "66.249.93.99", "66.249.93.104", "66.249.93.106", "66.249.93.107", "72.14.203.99", "72.14.203.104", "72.14.203.106", "72.14.203.107", "72.14.205.99", "72.14.205.104", "72.14.205.106", "72.14.205.107", "72.14.207.99", "72.14.207.104", "72.14.207.106", "72.14.207.107", "216.239.37.99", "216.239.37.104", "216.239.37.105", "216.239.37.106", "216.239.37.107", "216.239.37.147", "216.239.39.99", "216.239.39.104", "216.239.39.106", "216.239.39.107", "216.239.53.99", "216.239.53.104", "216.239.53.106", "216.239.53.107", "216.239.57.98", "216.239.57.99", "216.239.57.103", "216.239.57.104", "216.239.57.105", "216.239.57.106", "216.239.57.107", "216.239.57.147", "216.239.59.98", "216.239.59.99", "216.239.59.103", "216.239.59.104", "216.239.59.105", "216.239.59.106", "216.239.59.107", "216.239.59.147", 
"216.239.63.99", "216.239.63.104");</p>
<p>    $query = array("porn", "pills", "casino");</p>
<p>    $num = "100";</p>
<p>    for ($i=0; $i < count($query); $i++)<br />
    {<br />
        echo $query[$i]."\n";</p>
<p>        $numrank = array();</p>
<p>//        $serps = googleresults_multi($ips, $query[$i], $num);	  //for multithreaded</p>
<p>        for ($j=0; $j < count($ips); $j++)<br />
        {<br />
            $serps[$j] = googleresults($ips[$j], $query[$i], $num);	  //for single threaded</p>
<p>            $pos = rank($serps[$j], "www.mattcutts.com");</p>
<p>            if (empty($pos)){$pos = 0;}</p>
<p>            if (empty($numrank[$pos]))<br />
            {<br />
                $numrank[$pos] = 1;<br />
            }else{<br />
                $numrank[$pos] = $numrank[$pos]+1;<br />
            }<br />
        }</p>
<p>        ksort($numrank);</p>
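<p>        // use $count here: reusing $num would overwrite the number of results requested for the next query<br />
        foreach ($numrank as $key => $count)<br />
        {<br />
            echo $key." - ".$count."\n";<br />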
        }<br />
    }</p>
<p>[/code]</p>
<p>The single-threaded version (the default) is very slow. To switch to the multithreaded version, uncomment the googleresults_multi() line and comment out the googleresults() line inside the loop. It is much faster, but be prepared to get your IP temporarily banned by Google.</p>
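<p>The tally at the end of the script is just a histogram of positions. A minimal sketch of the same idea using array_count_values; the positions below are made up for illustration, not real results:</p>
<p>[code lang="php"]</p>
<p>    // made-up positions, one per datacenter; 0 means the site was not found<br />
    $positions = array(3, 0, 3, 5, 3);</p>
<p>    // count how many datacenters reported each position, then sort by position<br />
    $numrank = array_count_values($positions);<br />
    ksort($numrank);</p>
<p>    foreach ($numrank as $pos => $count)<br />
    {<br />
        echo $pos." - ".$count."\n";<br />
    }<br />
    // prints:<br />
    // 0 - 1<br />
    // 3 - 3<br />
    // 5 - 1</p>
<p>[/code]</p>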
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/google-datacenters.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ColorCode &#8211; WordPress Plugin to Highlight Code</title>
		<link>http://www.nott.org/blog/colorcode-wordpress-plugin-to-highlight-code.html</link>
		<comments>http://www.nott.org/blog/colorcode-wordpress-plugin-to-highlight-code.html#comments</comments>
		<pubDate>Mon, 02 Jan 2006 23:36:53 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=47</guid>
		<description><![CDATA[Spent a bit of time yesterday trying to get my code examples highlighted using the GeSHi class. However using the WordPress plugin code, I found that though the code was coming up fine it was removing all paragraphs and line breaks from any surrounding text. So I then wrote my own plugin still calling the [...]]]></description>
			<content:encoded><![CDATA[<p>Spent a bit of time yesterday trying to get my code examples highlighted using the <a href="http://qbnz.com/highlighter/">GeSHi</a> class. However using the <a href="http://dev.wp-plugins.org/wiki/GeshiSyntaxColorer">WordPress plugin</a> code, I found that though the code was coming up fine it was removing all paragraphs and line breaks from any surrounding text. So I then wrote my own plugin still calling the GeSHi class file, but where it displays both code and text properly.</p>
<p>It is still not perfect though, as it would be better to use pre for the code instead of all the spaces etc. Also need to switch to css instead of styles before <a href="http://www.w3.org/TR/xhtml2/">XHTML 2.0</a> arrives,  so will post an update once done.</p>
<p>Here is the plugin code:</p>
<p>[code lang="php"]</p>
<p>	/*<br />
	Plugin Name: ColorCode<br />
	Plugin URI: http://www.nott.org/colorcode.html<br />
	Description: A filter that highlights code using the GeSHi class for over 20 languages.<br />
	Version: 1.0<br />
	Author: Mike Nott<br />
	Author URI: http://www.nott.org<br />
	*/</p>
<p>	include(ABSPATH.'/wp-content/plugins/geshi.php');</p>
<p>	function cc_callback($code)<br />
	{<br />
		$geshi = new GeSHi($code[2], $code[1], ABSPATH.'/wp-content/plugins/geshi/');</p>
<p>		$geshi->set_header_type(GESHI_HEADER_DIV);</p>
<p>		$geshi->set_url_for_keyword_group(3, '');</p>
<p>		$newcode = $geshi->parse_code();</p>
<p>		return $newcode;<br />
	}</p>
<p>	function colorcode($content)<br />
	{<br />
		return preg_replace_callback("|<code lang=['\"]([a-zA-Z0-9_-]+)['\"]>(.*)< /code>|imsU", "cc_callback", $content);<br />
	}</p>
<p>	remove_filter('the_content', 'wptexturize');</p>
<p>	add_filter('the_content', 'colorcode', '1');<br />
	add_filter('the_excerpt', 'colorcode', '1');<br />
	add_filter('comment_text', 'colorcode', '1');</p>
<p>[/code]</p>
<p>[Note: be sure to remove the space before /code in the preg_replace_callback pattern above before using it.]</p>
<p>Then just save this file as colorcode.php in your plugin folder, along with the <a href="http://dev.wp-plugins.org/wiki/GeshiSyntaxColorer">GeSHi files</a>.</p>
<p>Usage: </p>
<p>< code lang = " php "><br />
code goes here<br />
< / code ></p>
<p>(but again remove spaces) <img src='http://www.nott.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/colorcode-wordpress-plugin-to-highlight-code.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PHP Crawler</title>
		<link>http://www.nott.org/blog/php-crawler.html</link>
		<comments>http://www.nott.org/blog/php-crawler.html#comments</comments>
		<pubDate>Sat, 31 Dec 2005 16:39:20 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=46</guid>
		<description><![CDATA[For crawling in PHP I have always used the fantastic cURL. My curl single-threaded function: [code lang="php"] function singlethread_crawl($url) { $agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"; $ch = curl_init(); curl_setopt($ch, CURLOPT_NOSIGNAL, 1); curl_setopt($ch, CURLOPT_NOPROGRESS, 1); curl_setopt($ch, CURLOPT_FAILONERROR, 1); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_USERAGENT, $agent); curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0); curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); curl_setopt($ch, CURLOPT_MAXREDIRS, [...]]]></description>
			<content:encoded><![CDATA[<p>For crawling in <a href="http://www.php.net">PHP</a> I have always used the fantastic <a href="http://www.php.net/manual/en/ref.curl.php">cURL</a>.</p>
<p>My curl single-threaded function:</p>
<p>[code lang="php"]</p>
<p>    function singlethread_crawl($url)<br />
    {<br />
        $agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";</p>
<p>        $ch = curl_init();</p>
<p>        curl_setopt($ch, CURLOPT_NOSIGNAL, 1);<br />
        curl_setopt($ch, CURLOPT_NOPROGRESS, 1);<br />
        curl_setopt($ch, CURLOPT_FAILONERROR, 1);<br />
        curl_setopt($ch, CURLOPT_URL, $url);<br />
        curl_setopt($ch, CURLOPT_USERAGENT, $agent);<br />
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);<br />
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);<br />
        curl_setopt($ch, CURLOPT_MAXREDIRS, 1);<br />
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);<br />
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);</p>
<p>        $html = curl_exec($ch);</p>
<p>        curl_close ($ch);</p>
<p>        return $html;<br />
    }</p>
<p>[/code]</p>
<p>My curl multi-threaded function:</p>
<p>[code lang="php"]</p>
<p>    function multithread_crawl($urls, $timeout, $verbose)<br />
    {<br />
        $agent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)";</p>
<p>        $mh = curl_multi_init();</p>
<p>        foreach ($urls as $i => $url)<br />
        {<br />
            $conn[$i] = curl_init($url);<br />
            curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, 1);<br />
            curl_setopt($conn[$i], CURLOPT_NOSIGNAL, 1);<br />
            curl_setopt($conn[$i], CURLOPT_NOPROGRESS, 1);<br />
            curl_setopt($conn[$i], CURLOPT_FAILONERROR, 1);<br />
            curl_setopt($conn[$i], CURLOPT_URL, $url);<br />
            curl_setopt($conn[$i], CURLOPT_USERAGENT, $agent);<br />
            curl_setopt($conn[$i], CURLOPT_SSL_VERIFYPEER, 0);<br />
            curl_setopt($conn[$i], CURLOPT_FOLLOWLOCATION, 1);<br />
            curl_setopt($conn[$i], CURLOPT_MAXREDIRS, 1);<br />
            curl_setopt($conn[$i], CURLOPT_TIMEOUT, $timeout);</p>
<p>            curl_multi_add_handle ($mh, $conn[$i]);<br />
        }</p>
<p>        do<br />
        {<br />
            $mrc = curl_multi_exec($mh, $active);<br />
        }<br />
        while ($mrc == CURLM_CALL_MULTI_PERFORM);</p>
<p>        while ($active and $mrc == CURLM_OK)<br />
        {<br />
            if (curl_multi_select($mh) != -1)<br />
            {<br />
                do<br />
                {<br />
                    $mrc = curl_multi_exec($mh, $active);<br />
                }<br />
                while ($mrc == CURLM_CALL_MULTI_PERFORM);<br />
            }<br />
        }</p>
<p>        if ($mrc != CURLM_OK)<br />
        {<br />
            print "Curl multi read error $mrc\n";<br />
        }</p>
<p>        $res = array();<br />
        $e = 0;</p>
<p>        foreach ($urls as $i => $url)<br />
        {<br />
            if (($err = curl_error($conn[$i])) == '')<br />
            {<br />
            	$res[$i]=curl_multi_getcontent($conn[$i]);<br />
            }<br />
            else<br />
            {<br />
                if ($verbose == "yes"){<br />
                    echo "error: ".$url." (".$err.")\n";<br />
                }else{<br />
                    $e++;<br />
                }<br />
            }</p>
<p>            curl_multi_remove_handle($mh,$conn[$i]);<br />
            curl_close($conn[$i]);<br />
        }</p>
<p>        curl_multi_close($mh);</p>
<p>        $s = count($urls)-$e;</p>
<p>        if ($verbose == "no"){<br />
            echo "errors ".$e." | success ".$s."\n";<br />
        }</p>
<p>        return $res;<br />
    }</p>
<p>[/code]</p>
<p>However, cURL has some annoyances. The main one for me is that you can&#8217;t pass your own variables to the write function:</p>
<p>[code lang="php"]<br />
curl_setopt($conn[$i], CURLOPT_WRITEFUNCTION, 'myfunction');<br />
[/code]</p>
<p>That makes it useless for updating rows in a database as each response arrives (you can call <a href="http://www.php.net/curl_getinfo">curl_getinfo</a> to get the URL and then look the row up, but that is pretty backwards). This means the crawling is not even close to being truly multithreaded, because you have to wait for all URLs to finish before working with the data.</p>
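<p>On newer PHP a closure can carry extra variables into the write callback. A minimal sketch, assuming PHP 5.3 or later (newer than the PHP 5 these functions were written for) and a helper I have called make_writer (hypothetical). The callback is called by hand here so it runs without a network; the commented line shows where it would attach to a real handle:</p>
<p>[code lang="php"]</p>
<p>    // hypothetical helper: builds a write callback that remembers which row it belongs to<br />
    function make_writer($rowid, array &$store)<br />
    {<br />
        $store[$rowid] = '';</p>
<p>        return function ($ch, $chunk) use ($rowid, &$store) {<br />
            $store[$rowid] .= $chunk;<br />
            return strlen($chunk);   // cURL expects the number of bytes handled<br />
        };<br />
    }</p>
<p>    $store = array();<br />
    $writer = make_writer(42, $store);</p>
<p>    // with a real handle: curl_setopt($conn[$i], CURLOPT_WRITEFUNCTION, $writer);<br />
    $writer(null, "abc");<br />
    $writer(null, "def");</p>
<p>    echo $store[42]."\n";   // abcdef</p>
<p>[/code]</p>
<p>The row id travels with the callback, so each chunk can be matched to its row without a curl_getinfo lookup.</p>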
<p>So I thought I&#8217;d have a go at writing the raw crawler myself using <a href="http://www.php.net/fsockopen">fsockopen</a>. It is not perfect, as the multi-threaded function relies on the single-threaded one to follow any redirects.</p>
<p>My own single-threaded function:</p>
<p>[code lang="php"]</p>
<p>    function mycrawler_single($url, $timeout=10, $maxredirs=1)<br />
    {<br />
        $urlinfo = parse_url($url);</p>
<p>        if (empty($urlinfo['scheme'])) {$urlinfo = parse_url('http://'.$url);}<br />
        if (empty($urlinfo["path"])) {$urlinfo["path"]="/";}</p>
<p>        if (empty($urlinfo['port']))<br />
        {<br />
			switch($urlinfo['scheme'])<br />
			{<br />
				case "http":<br />
					$urlinfo['port'] = 80;<br />
                    break;<br />
				case "https":<br />
					$urlinfo['port'] = 443;<br />
                    break;<br />
			}<br />
        }</p>
<p>        if (isset($urlinfo["query"]))<br />
        {<br />
            $request = "GET ".$urlinfo["path"]."?".$urlinfo["query"]." ";<br />
        } else {<br />
            $request = "GET ".$urlinfo["path"]." ";<br />
        }</p>
<p>        $request .= "HTTP/1.0\r\n";<br />
        $request .= "Host: ".$urlinfo['host']."\r\n";<br />
        $request .= "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\r\n";<br />
        $request .= "Connection: close\r\n\r\n";</p>
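<p>        // fsockopen() needs the ssl:// prefix (and PHP built with OpenSSL) to talk to an https host<br />
        $host = ($urlinfo['scheme'] == 'https') ? 'ssl://'.$urlinfo['host'] : $urlinfo['host'];<br />
        $fp = fsockopen($host, $urlinfo['port'], $errno, $errstr, $timeout);</p>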
<p>        if (!$fp)<br />
		{<br />
			echo "(".$errno.")".$errstr."\n";<br />
		}<br />
		else<br />
		{<br />
            fwrite($fp, $request);</p>
<p>            $data = '';   // start with an empty buffer so the first .= does not raise a notice<br />
            while (!feof($fp))<br />
            {<br />
                $data .= fgets($fp, 4096);<br />
            }</p>
<p>            fclose($fp);   </p>
<p>            $tmp = explode("\r\n\r\n", $data, 2);</p>
<p>            $urlinfo['header'] = $tmp[0];<br />
            $urlinfo['html'] = $tmp[1]; </p>
<p>            if ((stripos($urlinfo['header'], "location:") !== false) &#038;&#038; ($maxredirs > 0))<br />
            {<br />
                preg_match("/\r\nlocation:(.*)/i", $urlinfo['header'], $match);</p>
<p>                if ($match)<br />
                {<br />
                    $redirect = trim($match[1]);</p>
<p>                    echo "Redirecting to ".$redirect."\n";</p>
<p>                    $maxredirs--;                         </p>
<p>                    return mycrawler_single($redirect, $timeout, $maxredirs);<br />
                }<br />
            }        </p>
<p>            return $urlinfo;<br />
		}<br />
    }</p>
<p>[/code]</p>
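<p>The header/body split and the redirect check used above can be tried without a network connection. A rough sketch on a canned response; split_response and the example.com response are made up for illustration:</p>
<p>[code lang="php"]</p>
<p>    // hypothetical helper: same split and Location match as mycrawler_single<br />
    function split_response($data)<br />
    {<br />
        // headers and body are separated by the first blank line<br />
        $parts = array_pad(explode("\r\n\r\n", $data, 2), 2, '');</p>
<p>        $location = null;<br />
        if (preg_match("/\r\nlocation:(.*)/i", $parts[0], $match))<br />
        {<br />
            $location = trim($match[1]);<br />
        }</p>
<p>        return array('header' => $parts[0], 'html' => $parts[1], 'location' => $location);<br />
    }</p>
<p>    $raw = "HTTP/1.0 301 Moved Permanently\r\nLocation: http://www.example.com/\r\nConnection: close\r\n\r\n<html></html>";<br />
    $r = split_response($raw);</p>
<p>    echo $r['location']."\n";   // http://www.example.com/</p>
<p>[/code]</p>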
<p>My own multi-threaded function:</p>
<p>[code lang="php"]</p>
<p>    function mycrawler_multi($urls, $timeout=10, $maxredirects=1)<br />
    {</p>
<p>        for ($i=0; $i<count($urls); $i++)<br />
        {<br />
            $urlinfo[$i] = parse_url($urls[$i]);<br />
            $maxredirs[$i] = $maxredirects;</p>
<p>            if (empty($urlinfo[$i]['scheme'])) {$urlinfo[$i] = parse_url('http://'.$urls[$i]);}<br />
            if (empty($urlinfo[$i]["path"])) {$urlinfo[$i]["path"]="/";}</p>
<p>            if (empty($urlinfo[$i]['port']))<br />
            {<br />
			    switch($urlinfo[$i]['scheme'])<br />
			    {<br />
				    case "http":<br />
					    $urlinfo[$i]['port'] = 80;<br />
                        break;<br />
				    case "https":<br />
					    $urlinfo[$i]['port'] = 443;<br />
                        break;<br />
			    }<br />
            }</p>
<p>            if (isset($urlinfo[$i]["query"]))<br />
            {<br />
                $request[$i] = "GET ".$urlinfo[$i]["path"]."?".$urlinfo[$i]["query"]." ";<br />
            } else {<br />
                $request[$i] = "GET ".$urlinfo[$i]["path"]." ";<br />
            }</p>
<p>            $request[$i] .= "HTTP/1.0\r\n";<br />
            $request[$i] .= "Host: ".$urlinfo[$i]['host']."\r\n";<br />
            $request[$i] .= "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\r\n";<br />
            $request[$i] .= "Connection: close\r\n\r\n";</p>
<p>            $fp[$i] = fsockopen($urlinfo[$i]['host'], $urlinfo[$i]['port'], $urlinfo[$i]['errno'], $urlinfo[$i]['errstr'], $timeout);</p>
<p>            if (!$fp[$i])<br />
		    {<br />
			    echo "(".$urlinfo[$i]['errno'].")".$urlinfo[$i]['errstr']."\n";<br />
		    }<br />
		    else<br />
		    {<br />
                // only switch to non-blocking mode once we know the socket opened<br />
                socket_set_blocking($fp[$i], false);<br />
                fwrite($fp[$i], $request[$i]);<br />
            }<br />
        }</p>
<p>        $done = false;<br />
        $numdone = array();</p>
<p>        while (!$done)<br />
        {<br />
            for ($i=0; $i<count($urls); $i++)<br />
            {<br />
                if ($fp[$i] &#038;&#038; !feof($fp[$i]))<br />
                {<br />
                    $data[$i] .= fgets($fp[$i], 4096);<br />
                }<br />
                elseif (empty($numdone[$i]))<br />
                {<br />
                    $numdone[$i] = 1;</p>
<p>                    $tmp[$i] = explode("\r\n\r\n", $data[$i], 2);</p>
<p>                    $urlinfo[$i]['header'] = $tmp[$i][0];<br />
                    $urlinfo[$i]['html'] = $tmp[$i][1]; </p>
<p>                    if ((stripos($urlinfo[$i]['header'], "location:")) &#038;&#038; ($maxredirs[$i] > 0))<br />
                    {<br />
                        preg_match("/\r\nlocation:(.*)/i", $urlinfo[$i]['header'], $match[$i]);</p>
<p>                        if ($match[$i])<br />
                        {<br />
                            $redirect[$i] = trim($match[$i][1]);</p>
<p>                            echo "Redirecting to ".$redirect[$i]."\n";</p>
<p>                            $maxredirs[$i]--;                         </p>
<p>                            $urlinfo[$i] = mycrawler_single($redirect[$i], $timeout, $maxredirs[$i]);<br />
                        }<br />
                    }<br />
                }<br />
            }</p>
<p>            $done = (array_sum($numdone) == count($urls));<br />
        }       </p>
<p>        for ($i=0; $i<count($urls); $i++)<br />
        {<br />
            fclose($fp[$i]);<br />
        }</p>
<p>        return $urlinfo;<br />
    }</p>
<p>[/code]</p>
<p>All of these require PHP 5.</p>
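<p>If you want to see what actually gets sent without opening a socket, here is a rough sketch of the request-building step pulled out on its own. build_get_request() is a made-up helper name, not part of the functions above, and it leaves out the User-Agent header for brevity.</p>
<p>[code lang="php"]</p>
<p>    function build_get_request($url)<br />
    {<br />
        $info = parse_url($url);</p>
<p>        // no scheme given: assume plain http, as the crawler functions do<br />
        if (empty($info['scheme'])) {$info = parse_url('http://'.$url);}<br />
        if (empty($info['path'])) {$info['path'] = '/';}</p>
<p>        // path plus optional query string<br />
        $target = $info['path'];<br />
        if (isset($info['query'])) {$target .= '?'.$info['query'];}</p>
<p>        $request = "GET ".$target." HTTP/1.0\r\n";<br />
        $request .= "Host: ".$info['host']."\r\n";<br />
        $request .= "Connection: close\r\n\r\n";</p>
<p>        return $request;<br />
    }</p>
<p>    // first line printed: GET /blog/?cat=3 HTTP/1.0<br />
    echo build_get_request('www.nott.org/blog/?cat=3');</p>
<p>[/code]</p>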
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/php-crawler.html/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Google 2 RSS</title>
		<link>http://www.nott.org/blog/google-2-rss.html</link>
		<comments>http://www.nott.org/blog/google-2-rss.html#comments</comments>
		<pubDate>Thu, 29 Dec 2005 16:06:11 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=44</guid>
		<description><![CDATA[On ThreadWatch yesterday there was a thread about rank checkers, and I couldn&#8217;t believe that some SEOs don&#8217;t use them. We use our own heavy duty mega serp scraper to fully analyse any industry we are working in. Anyway, Graywolf mentioned how he would love a Google RSS or XML feed &#8211; I have been [...]]]></description>
			<content:encoded><![CDATA[<p>On <a href="http://www.threadwatch.org/">ThreadWatch</a> yesterday there was a <a href="http://www.threadwatch.org/node/5140">thread</a> about rank checkers, and I couldn&#8217;t believe that some SEOs don&#8217;t use them. We use our own heavy duty mega serp scraper to fully analyse any industry we are working in. Anyway, <a href="http://www.wolf-howl.com/">Graywolf</a> mentioned how he would love a Google RSS or XML feed &#8211; I have been waiting for this for a long time, as their SERPs are so dirty it would make things a bit easier. And to only offer 10 results per page in their <a href="http://www.google.com/apis/">API</a> is shocking!! Come on Goo, catch up with MSN + Yahoo.</p>
<p>Anyway, I got a bit bored today and knocked up a quick Google2RSS php script for those who are without (being xmas season)</p>
<p>Warning &#8211; this is very quick, dirty +  crude code (in other words &#8211; not the best) <img src='http://www.nott.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>[requires <a href="http://www.php.net/">PHP5</a> + <a href="http://www.php.net/curl">cURL</a>]</p>
<p>[code lang="php"]</p>
<p>	header("Content-type: text/xml\n");</p>
<p>	echo google2rss("spam", 10);</p>
<p>	function google2rss($query, $numres)<br />
	{<br />
        $ch = curl_init();</p>
<p>        curl_setopt($ch, CURLOPT_URL, "http://www.google.com/search?q=".urlencode($query)."&#038;num=".$numres."&#038;hl=en&#038;safe=off");<br />
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");<br />
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);</p>
<p>        $html = curl_exec($ch);</p>
<p>        curl_close ($ch);</p>
<p>		$html = str_replace("\r\n", " ", $html);<br />
		$html = str_replace("<p class=g>", "\n<p class=g>", $html);<br />
		$html = str_replace("</div>
<p>", "</p></div>
<p>\n", $html);<br />
		$html = str_replace("View as HTML</a>", "", $html);</p>
<p>		preg_match_all("/<p class=g>(.*)<a class=l href=\"(.*)\" (.*)\">(.*)<\/a>(.*)<br /><font/i", $html, $matches);</p>
<p>		$items['url'] = $matches[2];</p>
<p>		for ($i=0; $i < count($items['url']); $i++)<br />
		{<br />
			$items['title'][$i] = strip_tags($matches[4][$i]);<br />
			$items['title'][$i] = str_replace(" - [ Translate this page", "", $items['title'][$i]);<br />
			$items['desc'][$i] = strip_tags($matches[5][$i]);<br />
			$items['desc'][$i] = preg_replace("/^ ]/i", "", $items['desc'][$i]);<br />
		}</p>
<p>		$rss = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";<br />
		$rss .= "<rss version=\"2.0\">\n";<br />
		$rss .= "<channel>\n";<br />
		$rss .= "\t\n";<br />
		$rss .= "\t<link>http://www.google.com/search?q=".$query."&amp;num=".$numres."&amp;hl=en&amp;safe=off</link>\n";<br />
		$rss .= "\t<description>".$query." - Google RSS search results</description>\n";<br />
		$rss .= "\t<pubDate>".date(DATE_RFC822)."</pubDate>\n";<br />
		$rss .= "\t<generator>Mike Nott - http://www.nott.org</generator>\n";<br />
		$rss .= "\t<language>en</language>\n";</p>
<p>		for ($i=0; $i < count($items['url']); $i++)<br />
		{<br />
			$rss .= "\t<item>\n";<br />
			$rss .= "\t\t\n";<br />
			$rss .= "\t\t<link>".$items['url'][$i]."</link>\n";<br />
			$rss .= "\t\t<description>".htmlspecialchars($items['desc'][$i])."</description>\n";<br />
			$rss .= "\t\t<pubDate>".date(DATE_RFC822)."</pubDate>\n";<br />
			$rss .= "\t</item>\n";<br />
		}</p>
<p>		$rss .= "</channel>\n";<br />
		$rss .= "</rss>";</p>
<p>		return $rss;<br />
	}</p>
<p>[/code]</p>
<p>If anyone who is actually good at RegEx would like to improve the code, please do. <img src='http://www.nott.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/google-2-rss.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Easy Google RSS Scraping</title>
		<link>http://www.nott.org/blog/easy-google-rss-scraping.html</link>
		<comments>http://www.nott.org/blog/easy-google-rss-scraping.html#comments</comments>
		<pubDate>Sun, 04 Dec 2005 11:49:09 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=27</guid>
		<description><![CDATA[So, for all those of you that use GYM to scrape for URLs to ::ahem:: study, and are forced to use their normal messy SERPs because their API only allows 10 results per query (or even those that use /ie or /palm). I just noticed that on their Blog Search, Google allow you to grab [...]]]></description>
			<content:encoded><![CDATA[<p>So, for all those of you that use <a href="http://www.google.com">G</a><a href="http://search.yahoo.com">Y</a><a href="http://search.msn.com">M</a> to scrape for URLs to ::ahem:: study, and are forced to use their normal messy SERPs because their <a href="http://www.google.com/apis/">API</a> only allows 10 results per query (or even those that use <a href="http://www.google.com/ie">/ie</a> or <a href="http://www.google.com/palm">/palm</a>). </p>
<p>I just noticed that on their <a href="http://blogsearch.google.com/">Blog Search</a>, Google allow you to grab an <a href="http://blogsearch.google.com/blogsearch_feeds?q=stealing+content&#038;num=100&#038;output=rss">RSS feed of the results</a>.</p>
<p>So now a simple little:</p>
<p>[code lang="php"]<br />
preg_match_all('/<link>(.*)<\/link>/i', $url, $arrayoflinks);<br />
[/code]</p>
<p>will get a whole load of URLs very simply.</p>
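<p>One thing to watch: if the feed comes back with several items on one line, a greedy (.*) will swallow everything between the first opening tag and the last closing tag. A rough sketch with a non-greedy match, run against a made-up two-item string ($feed and the example.com URLs are placeholders, not real Blog Search output):</p>
<p>[code lang="php"]<br />
// made-up sample; in practice $feed would hold the fetched feed body<br />
$feed = "<item><link>http://example.com/a</link></item>"<br />
      . "<item><link>http://example.com/b</link></item>";</p>
<p>// (.*?) is non-greedy, so each link is captured separately<br />
preg_match_all('/<link>(.*?)<\/link>/i', $feed, $arrayoflinks);</p>
<p>// $arrayoflinks[1] now holds http://example.com/a and http://example.com/b<br />
print_r($arrayoflinks[1]);<br />
[/code]</p>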
<p>Blogs also tend to have more content than web pages, which is even better for ::ahem:: studying.  <img src='http://www.nott.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>But of course, it would help if Google were to have an XML feed of their normal SERPs like <a href="http://api.search.yahoo.com/WebSearchService/V1/webSearch?appid=YahooDemo&#038;query=stealing+content">Yahoo</a> &#038; <a href="http://search.msn.com/results.aspx?q=stealing+content&#038;count=100&#038;format=rss">MSN</a> do.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/easy-google-rss-scraping.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>WordPress Install</title>
		<link>http://www.nott.org/blog/wordpress-install.html</link>
		<comments>http://www.nott.org/blog/wordpress-install.html#comments</comments>
		<pubDate>Thu, 24 Nov 2005 17:28:32 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://www.nott.org/?p=15</guid>
		<description><![CDATA[I decided to use WordPress for this blog as it seems to be the current &#8216;standard&#8217;. It was nice to see that it works nicely out of the box after the short install. However, being as awkward as I am, I decided that I wanted to tweak it a bit: Firstly I didn&#8217;t want [...]]]></description>
			<content:encoded><![CDATA[<p>I decided to use <a href="http://wordpress.org">WordPress</a> for this blog as it seems to be the current &#8216;standard&#8217;. It was nice to see that it works nicely out of the box after the short install. However, being as awkward as I am, I decided that I wanted to tweak it a bit:</p>
<p>Firstly I didn&#8217;t want to have a dynamic url system, so opted to use WP&#8217;s &#8216;permalinks&#8217; setup. But again I didn&#8217;t want to use their standard folder architecture that 1,000s of others also use e.g. blog.com/archive/2005/11/post-title.html. So I decided to use the following structure:</p>
<ul>
<li>www.nott.org
<ul>
<li>/home-page.html</li>
<li>/content-page-title.html</li>
<li>/blog/</li>
<li>/blog/post-title.html</li>
<li>/blog/category/</li>
<li>/blog/2005/11/     (archive dates)</li>
<li>/blog/feed/feed-format.xml</li>
<li>/blog/feed/category/feed-format.xml  (category feed)</li>
<li>/blog/feed/comments/post-title.xml  (comments feed)</li>
</ul>
</li>
</ul>
<p>To do this took a few mod_rewrite lines in Apache:</p>
<p>[code lang="apache"]<br />
Options +FollowSymLinks<br />
RewriteEngine On<br />
RewriteRule ^/blog/$ / [R=301]<br />
RewriteCond %{REQUEST_URI} !^/index\.html$<br />
RewriteCond %{REQUEST_URI} !^/index\.php$<br />
RewriteRule ^/([A-Za-z0-9-_]+)\.html$ /index.php?pagename=$1 [QSA]<br />
RewriteRule ^/blog/([A-Za-z0-9-_]+)\.html$ /index.php?name=$1 [QSA]<br />
RewriteRule ^/blog/([0-9]+)/([0-9]+)/$ /index.php?m=$1$2 [QSA]<br />
RewriteRule ^/blog/feed/(rdf|rss|rss2|atom)\.xml$ /wp-feed.php?feed=$1 [QSA]<br />
RewriteRule ^/blog/feed/comments/([A-Za-z0-9-_]+)\.xml$ /wp-feed.php?feed=rss2&#038;name=$1 [QSA]<br />
RewriteRule ^/blog/([A-Za-z0-9-_]+)/$ /index.php?category_name=$1 [QSA]<br />
RewriteRule ^/blog/feed/([A-Za-z0-9-_]+)/(rdf|rss|rss2|atom)\.xml$ /wp-feed.php?category_name=$1&#038;feed=$2 [QSA]<br />
[/code]</p>
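<p>To check which URL ends up where without touching Apache, a few of the rules can be mimicked in PHP. This is only a rough sketch: rewrite_sketch() is a made-up helper, it covers just three of the rules, and it stops at the first match, which is a simplification of how mod_rewrite chains rules.</p>
<p>[code lang="php"]<br />
function rewrite_sketch($path)<br />
{<br />
	// pattern => target, copied from three of the RewriteRule lines<br />
	$rules = array(<br />
		'#^/blog/([A-Za-z0-9_-]+)\.html$#' => '/index.php?name=$1',<br />
		'#^/blog/([0-9]+)/([0-9]+)/$#' => '/index.php?m=$1$2',<br />
		'#^/blog/feed/(rdf|rss|rss2|atom)\.xml$#' => '/wp-feed.php?feed=$1',<br />
	);</p>
<p>	foreach ($rules as $pattern => $target) {<br />
		if (preg_match($pattern, $path)) {<br />
			return preg_replace($pattern, $target, $path);<br />
		}<br />
	}</p>
<p>	// nothing matched: the path is left alone<br />
	return $path;<br />
}</p>
<p>echo rewrite_sketch('/blog/2005/11/'); // /index.php?m=200511<br />
[/code]</p>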
<p>Then some small changes to some of the WordPress PHP functions (I know this will make upgrades tricky, but if I know what changes I&#8217;ve made, it should be possible):</p>
<p><strong>/wp-includes/feed-functions.php</strong></p>
<p>[code lang="php"]<br />
function comments_rss($commentsrssfilename = '') {<br />
	global $id;</p>
<p>	if ('' != get_settings('permalink_structure'))<br />
		$url = str_replace(".html",".xml",str_replace("/blog/", "/blog/feed/comments/", get_permalink()));<br />
	else<br />
		$url = get_settings('home') . "/$commentsrssfilename?feed=rss2&amp;p=$id";</p>
<p>	return apply_filters('post_comments_feed_link', $url);<br />
}<br />
[/code]</p>
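<p>As a quick sanity check, this is roughly what the nested str_replace() does to a post permalink. The permalink below is a made-up example; inside WordPress, get_permalink() supplies the real one.</p>
<p>[code lang="php"]<br />
// hypothetical permalink standing in for get_permalink()<br />
$permalink = "http://www.nott.org/blog/post-title.html";</p>
<p>// inner call moves the post under /blog/feed/comments/, outer call swaps the extension<br />
$url = str_replace(".html", ".xml", str_replace("/blog/", "/blog/feed/comments/", $permalink));</p>
<p>echo $url; // http://www.nott.org/blog/feed/comments/post-title.xml<br />
[/code]</p>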
<p><strong>/wp-includes/template-functions-links.php</strong></p>
<p>[code lang="php"]<br />
function get_feed_link($feed='rss2') {<br />
	global $wp_rewrite;<br />
	$do_perma = 0;<br />
	$feed_url = get_settings('siteurl');<br />
	$comment_feed_url = $feed_url;</p>
<p>	$permalink = $wp_rewrite->get_feed_permastruct();<br />
	if ('' != $permalink) {<br />
		if ( false !== strpos($feed, 'comments_') ) {<br />
			$feed = str_replace('comments_', '', $feed);<br />
			$permalink = $wp_rewrite->get_comment_feed_permastruct();<br />
		}</p>
<p>		$permalink = str_replace('%feed%', $feed, $permalink);<br />
		$permalink = preg_replace('#/+#', '/', "/$permalink/");<br />
		$output = get_settings('home') . "/blog/feed/" . $feed . ".xml";<br />
	} else {<br />
		if ( false !== strpos($feed, 'comments_') )<br />
			$feed = str_replace('comments_', 'comments-', $feed);</p>
<p>		$output = get_settings('home') . "/?feed={$feed}";<br />
	}</p>
<p>	return apply_filters('feed_link', $output, $feed);<br />
}<br />
[/code]</p>
<p><strong>template-functions-post.php</strong></p>
<p>[code lang="php"]<br />
function wp_list_pages($args = '') {<br />
	parse_str($args, $r);<br />
	if ( !isset($r['depth']) ) $r['depth'] = 0;<br />
	if ( !isset($r['show_date']) ) $r['show_date'] = '';<br />
	if ( !isset($r['child_of']) ) $r['child_of'] = 0;<br />
	if ( !isset($r['title_li']) ) $r['title_li'] = __('Pages');<br />
	if ( !isset($r['echo']) ) $r['echo'] = 1;</p>
<p>	$output = '';</p>
<p>	// Query pages.<br />
	$pages = &#038; get_pages($args);<br />
	if ( $pages ) :</p>
<p>	if ( $r['title_li'] )<br />
		$output .= '<li class="pagenav">' . $r['title_li'] . '<ul>';<br />
	// Now loop over all pages that were selected<br />
	$page_tree = Array();<br />
	foreach($pages as $page) {<br />
		// set the title for the current page<br />
		$page_tree[$page->ID]['title'] = $page->post_title;<br />
		$page_tree[$page->ID]['name'] = $page->post_name;</p>
<p>		// set the selected date for the current page<br />
		// depending on the query arguments this is either<br />
		// the creation date or the modification date<br />
		// as a unix timestamp. It will also always be in the<br />
		// ts field.<br />
		if (! empty($r['show_date'])) {<br />
			if ('modified' == $r['show_date'])<br />
				$page_tree[$page->ID]['ts'] = $page->post_modified;<br />
			else<br />
				$page_tree[$page->ID]['ts'] = $page->post_date;<br />
		}</p>
<p>		// The tricky bit!!<br />
		// Using the parent ID of the current page as the<br />
		// array index we set the current page as a child of that page.<br />
		// We can now start looping over the $page_tree array<br />
		// with any ID which will output the page links from that ID downwards.<br />
		if ( $page->post_parent != $page->ID)<br />
			$page_tree[$page->post_parent]['children'][] = $page->ID;<br />
	}<br />
	// Output of the pages starting with child_of as the root ID.<br />
	// child_of defaults to 0 if not supplied in the query.<br />
	$output .= _page_level_out($r['child_of'],$page_tree, $r, 0, false);<br />
	if ( $r['title_li'] )<br />
		$output .= '</ul></li>';<br />
	endif;</p>
<p>	$output = apply_filters('wp_list_pages', $output);</p>
<p>	if ( $r['echo'] )<br />
		echo str_replace('/"','.html"',$output);<br />
	else<br />
		return $output;<br />
}<br />
[/code]</p>
<p>and</p>
<p>[code lang="php"]<br />
function _page_level_out($parent, $page_tree, $args, $depth = 0, $echo = true) {<br />
//	global $wp_query;</p>
<p>//	$queried_obj = $wp_query->get_queried_object();</p>
<p>	$output = '';</p>
<p>	if($depth)<br />
		$indent = str_repeat("\t", $depth);<br />
	//$indent = join('', array_fill(0,$depth,"\t"));</p>
<p>	foreach($page_tree[$parent]['children'] as $page_id) {<br />
		$cur_page = $page_tree[$page_id];<br />
		$title = $cur_page['title'];</p>
<p>		$css_class = 'page_item';<br />
		if( $page_id == $queried_obj->ID) {<br />
			$css_class .= ' current_page_item';<br />
		}</p>
<p>		$output .= $indent . '<li class="' . $css_class . '"><a href="' . get_page_link($page_id) . '" title="' . wp_specialchars($title) . '">' . $title . '</a>';</p>
<p>		if(isset($cur_page['ts'])) {<br />
			$format = get_settings('date_format');<br />
			if(isset($args['date_format']))<br />
				$format = $args['date_format'];<br />
			$output .= " " . mysql2date($format, $cur_page['ts']);<br />
		}<br />
		echo "\n";</p>
<p>		if(isset($cur_page['children']) &#038;&#038; is_array($cur_page['children'])) {<br />
			$new_depth = $depth + 1;</p>
<p>			if(!$args['depth'] || $depth < ($args['depth']-1)) {<br />
				$output .= "$indent<ul>\n";<br />
				$output .= _page_level_out($page_id, $page_tree, $args, $new_depth, false);<br />
				$output .= "$indent</ul>\n";<br />
			}<br />
		}<br />
		$output .= "$indent</li>\n";<br />
	}<br />
	if ( $echo )<br />
		echo $output;<br />
	else<br />
		return $output;<br />
}<br />
[/code]</p>
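<p>The parent/children trick in those two functions is easier to follow on a toy example. This is only a sketch: the page rows are made up, and tree_out() is a cut-down stand-in for _page_level_out(), not WordPress code.</p>
<p>[code lang="php"]<br />
// made-up rows standing in for what get_pages() returns<br />
$pages = array(<br />
	array('ID' => 1, 'parent' => 0, 'title' => 'Home'),<br />
	array('ID' => 2, 'parent' => 1, 'title' => 'About'),<br />
	array('ID' => 3, 'parent' => 0, 'title' => 'Contact'),<br />
);</p>
<p>// index every page by ID and hang it off its parent's 'children' list<br />
$tree = array();<br />
foreach ($pages as $page) {<br />
	$tree[$page['ID']]['title'] = $page['title'];<br />
	$tree[$page['parent']]['children'][] = $page['ID'];<br />
}</p>
<p>// walk down from a given ID, indenting one tab per level<br />
function tree_out($tree, $parent, $depth = 0)<br />
{<br />
	$out = '';<br />
	if (empty($tree[$parent]['children'])) { return $out; }</p>
<p>	foreach ($tree[$parent]['children'] as $id) {<br />
		$out .= str_repeat("\t", $depth) . $tree[$id]['title'] . "\n";<br />
		$out .= tree_out($tree, $id, $depth + 1);<br />
	}<br />
	return $out;<br />
}</p>
<p>// prints Home, then About indented by one tab, then Contact<br />
echo tree_out($tree, 0);<br />
[/code]</p>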
<p><strong>template-functions-general.php</strong> (to put blog description in homepage title)</p>
<p>[code lang="php"]<br />
function wp_title($sep = '&raquo;', $display = true) {<br />
    global $wpdb;<br />
    global $m, $year, $monthnum, $day, $category_name, $month, $posts;</p>
<p>		$cat = get_query_var('cat');<br />
		$p = get_query_var('p');<br />
		$name = get_query_var('name');<br />
		$category_name = get_query_var('category_name');</p>
<p>    // If there's a category<br />
    if(!empty($cat)) {<br />
        if (!stristr($cat,'-')) { // category excluded<br />
            $title = get_the_category_by_ID($cat);<br />
        }<br />
    }<br />
    if (!empty($category_name)) {<br />
        if (stristr($category_name,'/')) {<br />
            $category_name = explode('/',$category_name);<br />
            if ($category_name[count($category_name)-1]) {<br />
                $category_name = $category_name[count($category_name)-1]; // no trailing slash<br />
            } else {<br />
                $category_name = $category_name[count($category_name)-2]; // there was a trailing slash<br />
            }<br />
        }<br />
        $title = $wpdb->get_var("SELECT cat_name FROM $wpdb->categories WHERE category_nicename = '$category_name'");<br />
    }</p>
<p>    // If there's a month<br />
    if(!empty($m)) {<br />
        $my_year = substr($m, 0, 4);<br />
        $my_month = $month[substr($m, 4, 2)];<br />
        $title = "$my_year $sep $my_month";</p>
<p>    }<br />
    if (!empty($year)) {<br />
        $title = $year;<br />
        if (!empty($monthnum)) {<br />
            $title .= " $sep ".$month[zeroise($monthnum, 2)];<br />
        }<br />
        if (!empty($day)) {<br />
            $title .= " $sep ".zeroise($day, 2);<br />
        }<br />
    }</p>
<p>    // If there's a post<br />
    if (is_single() || is_page()) {<br />
        $title = strip_tags($posts[0]->post_title);<br />
        $title = apply_filters('single_post_title', $title);<br />
    }</p>
<p>    // Send it out<br />
    if ($display &#038;&#038; isset($title)) {<br />
        echo " $sep $title";<br />
    } elseif (!$display &#038;&#038; isset($title)) {<br />
        return " $sep $title";<br />
    } elseif ($display){<br />
    	echo " $sep ";<br />
    	bloginfo('description');<br />
    }<br />
}<br />
[/code]</p>
<p>I still haven&#8217;t gotten round to putting correct titles in the feeds; I will post the function changes when done.</p>
<p>The excellent template I used as a base is <a href="http://www.perun.net/">Red Train 1.0</a> by <a href="http://www.vlad-design.de/">Vladimir Simovic</a> (so a big thanks to Vlad). </p>
<p>I then changed it a bit, here are my blog templates:</p>
<p><a href="http://www.nott.org/wp-content/themes/nott/style.css">style.css</a><br />
<a href="http://www.nott.org/wp-content/themes/nott/index.php.txt">index.php</a><br />
<a href="http://www.nott.org/wp-content/themes/nott/page.php.txt">page.php</a><br />
<a href="http://www.nott.org/wp-content/themes/nott/comments.php.txt">comments.php</a></p>
<p>I also used <a href="http://www.asymptomatic.net">Owen Winkler</a>&#8216;s <a href="http://redalt.com/downloads/wp/codefilter.zip">Code Filter</a> plugin to properly display code, but had to slightly change it to:</p>
<p>[code lang="php"]<br />
function cf_callback($stuff)<br />
{<br />
	return "<br />
<blockquote{$stuff[1]}>".htmlspecialchars(clean_pre($stuff[2]), ENT_NOQUOTES)."</blockquote >";<br />
}</p>
<p>function cf_encode($content)<br />
{<br />
	return preg_replace_callback('|<br />
<blockquote([^>]*)>(.*)</blockquote >|imsU', 'cf_callback', $content);<br />
}</p>
<p>add_filter('the_content', 'cf_encode', '1');<br />
[/code]</p>
<p>Hope this saves someone the couple of days it has taken me to get <a href="http://www.wordpress.org">WordPress</a> set up properly to my liking <img src='http://www.nott.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.nott.org/blog/wordpress-install.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

