Faster and Efficient Web Crawling with Parallel Migrating Web Crawler

Singh, Akansha; Singh, Krishna Kant
May 2010
International Journal of Computer Science Issues (IJCSI);May2010, Vol. 7 Issue 3, p28
Academic Journal
A Web crawler is the module of a search engine that fetches pages from servers across the Web. Crawlers are essential to search engines, yet running one is a challenging task: gathering data from sources around the world is time-consuming, and a single crawling process is limited by the processing power of one machine and the bandwidth of one network connection. This paper presents the design and implementation of a parallel migrating crawler in which the work of a crawler is divided among a number of independent, parallel crawlers that migrate to different machines to improve network efficiency and speed up downloading. The migration and parallel operation of the proposed design were tested experimentally and the results recorded.
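The abstract describes dividing crawl work among independent, parallel crawlers. A minimal sketch of that idea is shown below; the partitioning scheme (hashing each URL's host so that one site always maps to the same crawler instance) and all function names are illustrative assumptions, since the paper's actual assignment mechanism is not given in the abstract.

```python
# Illustrative sketch: splitting crawl work among parallel crawler
# instances. The hash-by-host assignment is an assumption, not the
# scheme from the paper.
import zlib
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

NUM_CRAWLERS = 3  # hypothetical number of migrating crawler instances


def assign_crawler(url: str, n: int = NUM_CRAWLERS) -> int:
    """Map a URL to one of n crawlers by hashing its host, so all
    pages of a site go to the same (possibly migrated) instance."""
    host = urlparse(url).netloc
    return zlib.crc32(host.encode("utf-8")) % n


def crawl_partition(crawler_id: int, urls: list) -> list:
    """Stand-in for one independent crawler: it handles only the
    URLs assigned to it (real fetching is omitted here)."""
    return [u for u in urls if assign_crawler(u) == crawler_id]


def parallel_crawl(urls: list) -> dict:
    """Run the independent crawlers concurrently and collect each
    crawler's share of the work, keyed by crawler id."""
    with ThreadPoolExecutor(max_workers=NUM_CRAWLERS) as pool:
        futures = {i: pool.submit(crawl_partition, i, urls)
                   for i in range(NUM_CRAWLERS)}
        return {i: f.result() for i, f in futures.items()}
```

Hashing by host keeps each site's politeness constraints (request spacing, robots.txt state) local to a single crawler, which is one common reason distributed crawlers partition by host rather than by individual URL.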


Related Articles

  • 404 File Not Found: Citing Unstable Web Sources. Griffin, Frank // Business Communication Quarterly;Jun2003, Vol. 66 Issue 2, p46 

    Researchers, including students, must accommodate to the mutating character of hyperlinks on the World Wide Web. A small study of citations in three volumes of BCQ demonstrates the phenomenon of ‘URL rot,’ the disappearance of sites cited in the sample articles. Digital technology...

  • Search engines can't keep up. Kleiner, Kurt // New Scientist;07/10/99, Vol. 163 Issue 2194, p11 

    Explains how the growth of the World Wide Web has outpaced efforts by search engines to index the new sites. Number of Web sites, and the percentage covered in the best search engines, in 1999 versus 1997; Number of servers; Effectiveness of a metasearch, with almost 50% of the Web covered by...

  • Search Engines With a Soul. McCracken, Harry // PCWorld;Jul2000, Vol. 18 Issue 7, p43 

    Discusses innovative Internet search engines which make use of the human brain. People-driven searching, which allows visitors to ask questions in plain English and post them on a message board so that others can respond; Live representatives which assist in searches; Sites which offer this...

  • Web Switch.  // Network Dictionary;2007, p524 

    A definition of the term "Web Switch" is presented. It functions by routing traffic to the appropriate Web server based on the Uniform Resource Locator (URL) or Internet Protocol (IP) address of the request. Other terms similar to Web switch are URL Switch, Web content switch, content switch,...

  • Bye! (Sigh...) The End of an Era. Quint, Barbara // Information Today;Dec2002, Vol. 19 Issue 11, p8 

    Announces that Barbara Quint, author of the column 'Quint's Online,' is writing her last column in this issue of 'Information Today.' History of her column; Goals established when she started including letting online information industry know what professional searchers thought of their...

  • A Design and Implementation Model for Web Caching Using Server "URL Rewriting". Saleh, Mostafa E.; Nabi, A. Abdel; Mohamed, A. Baith // World Academy of Science, Engineering & Technology;Dec2009, Issue 36, p303 

    In order to make surfing the internet faster, and to save redundant processing load with each request for the same web page, many caching techniques have been developed to reduce latency of retrieving data on World Wide Web. In this paper we will give a quick overview of existing web caching...

  • How big is the net? Barras, Colin // New Scientist;5/2/2009, Vol. 202 Issue 2706, p30 

    The article discusses the sheer scale of the Internet. It has been estimated by Google that the internet contained some 5 million terabytes of data. Recent data suggest that well over 1 billion people rely on computers to access the internet. In July 2008, web surfers were introduced to...

  • A SCALABLE DISTRIBUTED SEARCH ENGINE FOR FRESH INFORMATION RETRIEVAL. Sato, Nobuyoshi; Uehara, Minoru; Sakai, Yoshifumi // Proceedings of the IADIS International Conference on WWW/Interne;Jan2003, p877 

    We have developed a distributed search engine, Cooperative Search Engine (CSE) to retrieve fresh information. In CSE, a local search engine located in each web server makes an index of local pages. And, a Meta search server integrates these local search engines to realize a global search engine....

  • Chapter 11: Finding information on the Internet.  // Exploiting IT in Business;9/1/1999, p99 

    Chapter 11 of the book "Exploiting I.T. in Business" is presented. Governments, academic institutions and individuals produce free information in the Internet which can be useful for businesses. The most common method of organizing and delivering information over the Internet is the World Wide...

