
Comments

1
Written by:Jeff Beck
Posted on:January 18, 2006 at 7:24 pm

I’ve been trying to do much the same thing on a site I have been working on. The problem I keep running up against is fsockopen(): it waits until the connection to the server is established before returning. As a result, when you loop through an array of URLs, connecting and retrieving the data can take maybe twice as long as with cURL multi. The fastest solution I have seen uses fsockopen() to connect to the local machine (the so-called PHP multithreading HACK) and then lets each PHPlet file do its crawling independently. This cuts even the cURL multi time in half, but it is not very practical in terms of server load (one page means 30+ httpd processes running). Speed-wise, have you compared your PHP function with the cURL function?

2
Written by:HM2K
Posted on:June 1, 2006 at 2:26 pm

You never once actually said what this function does; “PHP Crawler” is VERY generic.

Also, you have not shown any example output.

3
Written by:Mike
Posted on:September 13, 2006 at 9:40 am

If you don’t know what to do with the code, then it isn’t for you :-)

4
Written by:Rishiraj
Posted on:October 15, 2007 at 4:19 pm

I have a free hosting server with PHP 4.7.
Is there any alternative for me if I don’t want to use cURL?

5
Written by:Mike
Posted on:October 15, 2007 at 4:25 pm

Rishiraj: use the ‘mycrawler_single’ function described in the post above. It should need only minimal tweaking to work in PHP 4.

6
Written by:Rays
Posted on:November 11, 2007 at 7:40 am

I’m confused.
Can somebody help me, please?
Maybe I need to create an automatic crawler engine.
Hmm... I need a team to work on this.

7
Written by:David Arakelian
Posted on:February 17, 2008 at 6:05 pm

Thanks for sharing your code on the use of the cURL multi functions. I run a lot of scripts that take a long time to execute (4 - 5 days in some cases) because I am using a single stream. Using your code and some proxies I can probably get this down to a few hours :)

8
Written by:website designing Pakistan Peshawar
Posted on:March 19, 2008 at 8:41 pm

This is a very good post and has saved me a lot of time, thanks. But I am still having trouble matching URLs with a PHP regular expression; I need a regex to match the URLs in a page.
Thanks
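For the regex question above, here is a minimal sketch of one common approach: match the quoted href attribute of each &lt;a&gt; tag rather than trying to recognise every possible URL in free text. It is written in Python for illustration; in PHP, preg_match_all() with an equivalent pattern would play the same role. The helper name extract_hrefs is hypothetical, and the pattern assumes links appear as href="..." or href='...'. Unquoted attributes are not handled, and for messy real-world HTML an HTML parser is the sturdier choice.

```python
import re

# Hypothetical pattern: an <a ...> tag, then a quoted href value.
# Group 1 captures the opening quote so the backreference \1 closes with the same quote.
# [^>]*? keeps the search inside a single tag; unquoted href values are NOT matched.
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def extract_hrefs(html):
    """Return the href values of <a> tags in the order they appear in the page."""
    return [match.group(2) for match in HREF_RE.finditer(html)]


page = """
<p><a href="http://example.com/">Home</a>
<A class="x" HREF='/about.html'>About</A>
<a name="top">no link</a></p>
"""
print(extract_hrefs(page))  # ['http://example.com/', '/about.html']
```

Relative links such as /about.html come back as written; a crawler would still need to resolve them against the page's own URL before fetching them.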

9
Written by:Waleed GadElKareem
Posted on:May 6, 2008 at 3:44 pm

Very useful, thank you.
Please check stream_set_blocking(); I used mode #2 to do what you suggested.