
Efficient Crawling Through Dynamic Priority of Web Page in Sitemap

Informatics Engineering, an International Journal (IEIJ), 02 (02): 01-11 (June 2014)

Abstract

A web crawler, or automatic indexer, downloads updated information from the World Wide Web (WWW) for a search engine. The current size of the Google index is estimated at approximately 8 × 10^9 pages, and a full crawl could cost around 4 million dollars in network costs alone. Thus only the most important pages should be downloaded. To this end, we propose “Efficient crawling through dynamic page priority of web pages in Sitemap”, a query-based approach that informs the web crawler of the most important pages through dynamically assigned page priorities in the sitemap protocol. Using the page priority, a web crawler can find the most important pages of any website and download just those. Experimental results show that our approach performs better than the existing approach.
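
As a rough illustration of the crawler side of this idea, the sketch below reads a sitemap and orders its URLs by the `<priority>` element defined in the sitemap protocol (range 0.0 to 1.0, default 0.5 when absent). It is not the paper's method: how priorities are computed dynamically from queries is not described in the abstract, and the function name `urls_by_priority`, the `min_priority` threshold, and the sample sitemap are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
DEFAULT_PRIORITY = 0.5  # protocol default when <priority> is omitted

# Hypothetical sitemap used only for this example.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/about</loc></url>
  <url><loc>http://example.com/archive</loc><priority>0.1</priority></url>
  <url><loc>http://example.com/news</loc><priority>0.9</priority></url>
</urlset>"""


def urls_by_priority(sitemap_xml, min_priority=0.0):
    """Return (url, priority) pairs, highest priority first.

    Entries below min_priority are skipped, so a crawler could
    download only the pages the site marks as most important.
    """
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        raw = url.findtext("sm:priority", namespaces=SITEMAP_NS)
        priority = float(raw) if raw else DEFAULT_PRIORITY
        if loc and priority >= min_priority:
            entries.append((loc, priority))
    # sorted() is stable, so equal priorities keep document order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


if __name__ == "__main__":
    for loc, priority in urls_by_priority(SAMPLE, min_priority=0.5):
        print(priority, loc)
```

A crawler using this ordering would fetch the high-priority pages first and could stop once its budget is spent; the threshold value itself is a design choice left open here.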
