History demonstrates the evolution of the web. In 1989, Tim Berners-Lee was the first to identify the problem of information management, and so he created the World Wide Web and later made it royalty-free for public use. Since the first website went live in 1991, the amount of web content has grown so large that choosing the right content from trillions of web pages has become increasingly difficult; the web crawler was designed with the aspiration of retrieving highly desirable content.
Statistical analysis reported by Internet World Stats estimates that there were 16 million web users in December 1995, growing to around 3 billion by December 2015.
The LS crawler provides search results more effectively by extracting keywords than by relying on results derived through semantics.
2. It complicates the web crawler's job of identifying the next important and specific link to follow.
2- Focused web crawler:
The focused crawler was introduced to overcome the shortcomings of traditional crawlers, such as high operating cost and small coverage of the web. The rapid growth of the web results in a large index size, which is not conducive to finding the intended focused resources; the focused crawler is therefore indispensable for coping with this problem. Prospective applications of the focused crawler include finding linkages or relationships and locating the most relevant sites, which form a learning basis for humans.
The following section shows the architecture of the focused crawler, which contains these important functional blocks:
[i] Classifier: makes relevance judgments on crawled pages to decide whether to expand the links found in them.
[ii] Distiller: measures the centrality of crawled pages, which can then be used to set visit priorities.
[iii] Crawler: allows dynamically reconfigurable priorities controlled by the classifier and distiller.
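The interplay of these blocks can be sketched in Python. This is a minimal illustration, not the actual architecture: `classify` is a crude keyword-based stand-in for the classifier, the priority queue plays the role of the distiller-driven priorities, and `fetch` is an assumed caller-supplied helper returning a page's text and outgoing links.

```python
import heapq

def classify(text, topic_keywords):
    """Classifier stand-in: fraction of topic keywords present in the page text."""
    words = set(text.lower().split())
    return sum(kw in words for kw in topic_keywords) / len(topic_keywords)

def focused_crawl(seeds, topic_keywords, fetch, max_pages=100):
    """Crawl outward from seed URLs, expanding links only from relevant pages.

    `fetch(url)` is an assumed helper returning {"text": ..., "links": [...]}.
    """
    frontier = [(-1.0, url) for url in seeds]  # max-heap via negated priority
    heapq.heapify(frontier)
    visited, results = set(), []
    while frontier and len(results) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        page = fetch(url)
        score = classify(page["text"], topic_keywords)
        if score > 0:                        # classifier: keep relevant pages only
            results.append((url, round(score, 2)))
            for link in page["links"]:       # expand links found on relevant pages
                if link not in visited:
                    heapq.heappush(frontier, (-score, link))
    return results
```

Note how links found on an irrelevant page are never pushed onto the frontier, which is what keeps the crawl focused rather than exhaustive.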
3- Incremental crawler:
The advantage of using an incremental crawler is that only desired and valuable information and data are provided to the user. This also reduces the required network bandwidth while simultaneously attaining data enrichment.
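The change-detection idea behind an incremental crawler can be sketched as follows; this is a minimal illustration, assuming a caller-supplied `fetch` helper and an `index` mapping each URL to a checksum of its last-seen content (names are hypothetical):

```python
import hashlib

def incremental_update(urls, fetch, index):
    """Revisit known URLs, but re-process a page only when its content changed.

    `index` maps url -> checksum of last-seen content; `fetch(url)` is an
    assumed helper returning the page content as a string.
    Returns the URLs whose content actually changed since the last visit.
    """
    changed = []
    for url in urls:
        content = fetch(url)
        digest = hashlib.sha256(content.encode()).hexdigest()
        if index.get(url) != digest:   # page is new or was modified
            index[url] = digest        # remember the new checksum
            changed.append(url)        # only this page needs re-indexing
    return changed
```

Only changed pages are re-indexed, which is where the bandwidth and processing savings described above come from.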
4- Distributed crawler:
Distributed web crawling makes use of distributed computing techniques. Many crawlers achieve massive coverage of the web by using distributed crawling. Functions such as synchronization and inter-communication are handled by a central server.
A central server is essential because the crawlers are geographically distributed. To obtain efficient and relevant search, page-ranking algorithms are used. The advantage of a distributed web crawler is that it withstands system crashes and similar events, and it can be used in many crawling applications.
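One common way a central coordinator can partition work among distributed crawler machines is to hash each URL's hostname, so that all pages of one site go to the same node and inter-node communication stays low. A minimal sketch, assuming `num_nodes` cooperating machines (the function name and scheme are illustrative, not taken from the source):

```python
import hashlib
from urllib.parse import urlparse

def assign_node(url, num_nodes):
    """Assign a URL to one of num_nodes crawler machines by hashing its
    hostname, keeping each site's pages on a single node."""
    host = urlparse(url).netloc
    digest = hashlib.md5(host.encode()).hexdigest()
    return int(digest, 16) % num_nodes
```

Because the assignment depends only on the hostname, any node (or the central server) can compute it locally, without asking who owns a newly discovered link.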
5- Parallel crawler:
An application or system that requires multiple crawlers should run them in parallel. These cooperating crawlers are referred to as a parallel crawler. This type of crawler needs multiple crawling processes, called C-procs. These processes can run on the network of