Easy Google RSS Scraping
So, this one is for all of you who use GYM to scrape for URLs to ::ahem:: study, and are forced to use Google's normal messy SERPs (or even the /ie or /palm versions) because their API only allows 10 results per query.
I just noticed that on their Blog Search, Google allows you to grab an RSS feed of the results.
So now a simple little:
will get a whole load of URLs very simply.
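As a rough illustration of the idea (not necessarily the snippet meant above), here is a minimal Python sketch that pulls the URLs out of an RSS 2.0 feed. The function name `extract_links` and the sample feed are made up for the example, and it assumes the feed follows the usual `<item><link>` layout.

```python
import xml.etree.ElementTree as ET


def extract_links(rss_text):
    """Return the <link> URL of every <item> in an RSS 2.0 document.

    Hypothetical helper for illustration; assumes the standard
    <rss><channel><item><link> structure.
    """
    root = ET.fromstring(rss_text)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


# Made-up sample feed standing in for a real search-results feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example results</title>
    <item>
      <title>First post</title>
      <link>http://example.com/post-1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/post-2</link>
    </item>
  </channel>
</rss>"""

for url in extract_links(SAMPLE_FEED):
    print(url)
```

To run it against a live feed you would first download the XML (for example with `urllib.request.urlopen`) and pass the decoded text to `extract_links`.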
Blogs also tend to have more content than web pages, which is even better for ::ahem:: studying.
But of course, it would help if Google were to offer an XML feed of their normal SERPs, like Yahoo & MSN do.
I found a bit of something for Google. It doesn’t cost any money, but they seem to keep it well hidden. If you want to take a look at the code, see my blog: http://jamesfive19.com/blog/?p=30, and scroll down to “Custom RSS feed”. The blog isn’t much yet; I still have a lot to do before it is really up. If you want to see it in action, I have set up a page with a Google custom feed: http://jamesfive19.com/blog/?p=34.
Better Google feeds that are page-specific and take many more parameters can be had for money. I have seen this described on the net somewhere, but this simple search may do for some purposes.
I’m looking for an easy (as in, for non-geeks) way to scrape other people’s pages to generate an RSS feed where there is none. With blogs especially, you can see that an internal list is being generated, and I’m trying to create a feed that gets me somebody else’s list of, say, recent articles. Do you know how to do it?
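One rough way to approach this, if a little Python is acceptable, is to collect the links from the page's HTML and wrap them in a bare-bones RSS 2.0 document. This is only a sketch: `LinkCollector` and `page_to_rss` are names invented here, and it naively treats every `<a href>` on the page as an article, so a real page would need some filtering.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a href="..."> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def page_to_rss(html, feed_title, page_url):
    """Build a minimal RSS 2.0 string from the links found in `html`."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = "Links scraped from " + page_url
    for href, text in collector.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text or href
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


# Made-up HTML standing in for a blog's "recent articles" list.
SAMPLE_HTML = (
    '<ul><li><a href="http://example.com/a">First post</a></li>'
    '<li><a href="http://example.com/b">Second post</a></li></ul>'
)
print(page_to_rss(SAMPLE_HTML, "Example blog", "http://example.com/"))
```

In practice you would probably restrict the collector to the part of the page that holds the article list, since navigation and footer links would otherwise end up in the feed too.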