sirdf.com
Search & Information Retrieval Development Forum

Getting large lists of URLs

sirdf.com Forum Index -> Making a search engine
runarb
Site Admin
Joined: 29 Oct 2006
Posts: 4
Posted: Fri Jan 27, 2006 11:20 am

Where can one get large lists of URLs?

Verisign has their “TLD Zone Access Program”, through which one can get all the .com, .net and .edu domains. See http://www.verisign.com/information-servic...age_001052.html and http://www.verisign.com/information-servic...age_001051.html
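A zone file is a DNS master file, so getting a domain list out of it mostly means reading the owner names of the NS (delegation) records. Below is a minimal sketch, assuming the file follows the usual RFC 1035 master-file layout: one record per line, an optional numeric TTL and class before the record type, `$ORIGIN` set to the TLD, and a blank owner field meaning "same owner as the previous record". `domains_from_zone` is a hypothetical helper, not Verisign tooling, and it does not handle parenthesised multi-line records, `$INCLUDE`, or TTLs with unit suffixes.

```python
from typing import Iterable, List, Optional

# Class tokens that may sit between the owner name and the record type.
_CLASSES = {"IN", "CH", "HS"}


def _record_type(fields: List[str]) -> Optional[str]:
    """Return the RR type, skipping an optional numeric TTL and class."""
    for field in fields[1:]:
        if field.isdigit() or field.upper() in _CLASSES:
            continue
        return field.upper()
    return None


def domains_from_zone(lines: Iterable[str], origin: str = "com") -> List[str]:
    """Collect the unique delegated domains (owners of NS records)."""
    origin = origin.lower().rstrip(".")
    seen = set()
    last_owner: Optional[str] = None
    for raw in lines:
        body = raw.split(";", 1)[0]  # drop comments
        fields = body.split()
        if not fields:
            continue
        if fields[0].upper() == "$ORIGIN" and len(fields) > 1:
            origin = fields[1].lower().rstrip(".")
            continue
        if fields[0].startswith("$"):
            continue  # other directives such as $TTL
        if body[:1] in (" ", "\t"):
            # A blank owner field repeats the previous record's owner.
            if last_owner is None:
                continue
            fields = [last_owner] + fields
        last_owner = fields[0]
        if _record_type(fields) != "NS":
            continue
        owner = fields[0].lower()
        if owner == "@":
            continue  # the zone apex itself, not a delegated domain
        # Fully qualified owners end in "."; relative ones sit under $ORIGIN.
        name = owner.rstrip(".") if owner.endswith(".") else f"{owner}.{origin}"
        seen.add(name)
    return sorted(seen)
```

Because it accepts any iterable of lines, an open file object can be passed straight in, e.g. `domains_from_zone(open("com.zone"))`, where the file name is only a placeholder.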

DMOZ offers an RDF dump of its database, about 5 million URLs, at http://rdf.dmoz.org/
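The dump is one very large file, so it is easier to stream it line by line than to load it whole. The sketch below assumes each listed site appears as an `<ExternalPage about="URL">` element in the content dump; the element and attribute names are an assumption to check against the actual file. It uses a regular expression rather than an XML parser so that a malformed fragment elsewhere in the file does not stop the scan.

```python
import html
import re
from typing import Iterable, Iterator

# Captures the URL in an <ExternalPage about="..."> start tag.
_EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]*)"')


def dmoz_urls(lines: Iterable[str]) -> Iterator[str]:
    """Yield listed URLs from a DMOZ RDF dump without loading it all."""
    for line in lines:
        for match in _EXTERNAL_PAGE.finditer(line):
            url = html.unescape(match.group(1))  # e.g. "&amp;" -> "&"
            if url:
                yield url
```

If the dump is gzip-compressed, it can be read in text mode with `gzip.open(path, "rt", encoding="utf-8")` and passed to `dmoz_urls` directly.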


Does anyone know of any other sources?

_________________
CTO @ Searchdaimon company search.
masidani
Member
Joined: 10 Jan 2006
Posts: 23
Posted: Sat Jan 28, 2006 1:03 pm

Sorry, but DMOZ is the only source I know of, and probably the largest. I once wrote a script to collect URLs from the del.icio.us front page. The page refreshes every few minutes, so in a relatively short time I collected a few thousand URLs, even with sleeps between fetches to be polite.
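The script itself was not posted, but the idea (re-fetch a frequently refreshed page, keep the links it shows, sleep between fetches) can be sketched as follows. The page URL is a parameter, the 300-second default delay is an arbitrary assumption, and dropping links back to the page's own host (its navigation) is also an assumption. `fetch_page` and `sleep` are injectable so the loop can be exercised without network access.

```python
import time
import urllib.request
from html.parser import HTMLParser
from typing import Callable, List, Set
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links that point away from the page's host."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.own_host = urlparse(page_url).netloc
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            absolute = urljoin(self.page_url, value)
            parts = urlparse(absolute)
            # Keep outbound web links only; skip the site's own navigation.
            if parts.scheme in ("http", "https") and parts.netloc != self.own_host:
                self.links.append(absolute)


def fetch(url: str) -> str:
    """Download a page and decode it as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def harvest(page_url: str, rounds: int, delay_seconds: float = 300.0,
            fetch_page: Callable[[str], str] = fetch,
            sleep: Callable[[float], None] = time.sleep) -> Set[str]:
    """Re-fetch a frequently refreshed page and accumulate its links."""
    seen: Set[str] = set()
    for i in range(rounds):
        collector = LinkCollector(page_url)
        collector.feed(fetch_page(page_url))
        seen.update(collector.links)
        if i + 1 < rounds:
            sleep(delay_seconds)  # be polite between fetches
    return seen
```

Collecting into a set means a URL that stays on the front page across several refreshes is counted once.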

Simon
zootreeves
Newbie
Joined: 10 Dec 2005
Posts: 8
Posted: Mon Jan 30, 2006 7:30 pm

What about using random keywords with Google's API to scrape URLs?
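The random-keyword approach boils down to a loop: pick a few words at random from a vocabulary, send them as a query, and pool the distinct result URLs. In this sketch `search` is a hypothetical callable standing in for whatever search API is available; no real Google API call is shown. The vocabulary, the number of queries and the words per query are all assumptions to tune.

```python
import random
from typing import Callable, Iterable, Optional, Sequence, Set


def sample_urls(vocabulary: Sequence[str],
                search: Callable[[str], Iterable[str]],
                queries: int,
                words_per_query: int = 2,
                seed: Optional[int] = None) -> Set[str]:
    """Issue random keyword queries and pool the distinct result URLs."""
    rng = random.Random(seed)  # seeded for reproducible query lists
    urls: Set[str] = set()
    for _ in range(queries):
        # Distinct random terms per query, e.g. "river budget".
        terms = rng.sample(list(vocabulary), words_per_query)
        urls.update(search(" ".join(terms)))
    return urls
```

Keep in mind that the URLs gathered this way reflect what the engine chooses to return for each query, so the sample leans toward pages it ranks highly.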
All times are GMT