Tuesday 22 November 2016

Web Crawling



World Wide Web:


The World Wide Web (abbreviated WWW or the Web) is an information space where documents and other web resources are identified by Uniform Resource Locators (URLs), interlinked by hypertext links, and accessible via the Internet.
English scientist Tim Berners-Lee invented the World Wide Web in 1989. He wrote the first web browser in 1990 while employed at CERN in Switzerland.


Internet:

The Internet is the global system of interconnected computer networks that use the Internet protocol suite (TCP/IP) to link devices worldwide.


Uniform Resource Locator (URL):

A Uniform Resource Locator (URL), informally called a web address (although the two terms are not defined identically)[1], is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it.
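
To make the "location" and "retrieval mechanism" parts of a URL concrete, here is a minimal sketch using Python's standard `urllib.parse` module. The address below is a made-up example, not one taken from this post.

```python
from urllib.parse import urlsplit

# Hypothetical example address, used only for illustration.
url = "https://www.example.com:8080/docs/index.html?lang=en#intro"

parts = urlsplit(url)

print(parts.scheme)    # the retrieval mechanism (protocol)
print(parts.hostname)  # the machine on the network
print(parts.port)      # the port on that machine
print(parts.path)      # the location of the resource on that machine
print(parts.query)     # optional parameters sent with the request
print(parts.fragment)  # optional position inside the document
```

The scheme tells a client how to fetch the resource, while the host, port and path together say where it lives.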

Internet bot:

An Internet bot, also known as a web robot, WWW robot or simply bot, is a software application that runs automated tasks (scripts) over the Internet. The largest use of bots is in web spidering (web crawling), in which an automated script fetches, analyzes and files information from web servers at many times the speed of a human.

Web indexing (or Internet indexing):

Web indexing refers to various methods for indexing the contents of a website or of the Internet as a whole. Individual websites or intranets may use a back-of-the-book index, while search engines usually use keywords and metadata to provide a more useful vocabulary for Internet or onsite searching.
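
One common way to index pages by keyword is an inverted index, which maps each word to the pages that contain it. The sketch below is only an illustration of that idea, not how any particular search engine works. The function names (`tokenize`, `build_index`, `search`) and the sample pages are assumptions made up for this example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase keywords (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: keyword -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages, standing in for downloaded web content.
pages = {
    "http://example.com/a": "Web crawlers index web pages",
    "http://example.com/b": "Search engines rank pages",
}
index = build_index(pages)
print(sorted(search(index, "web pages")))
```

Looking a keyword up in the index is much faster than scanning every page at query time, which is why the indexing work is done in advance.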

Web search engine:

A web search engine is a software system that is designed to search for information on the World Wide Web.
The search results are generally presented as a list of results, often referred to as search engine results pages (SERPs).

 



Web Crawling

Web crawling is the process by which search engines comb through web pages in order to index them properly. These “web crawlers” systematically visit pages and look at the keywords contained on each page, the kind of content, and all the links on the page, and then return that information to the search engine’s server for indexing. They then follow the hyperlinks on the page to reach other pages and websites. When a search engine user enters a query, the search engine goes to its index and returns the most relevant search results based on the keywords in the search term. Web crawling is an automated process and provides quick, up-to-date data.
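
As a rough illustration of what a crawler looks at on a single page, the sketch below collects the words and the outgoing links from a piece of HTML using Python's standard `html.parser` module. The class name `PageParser`, the sample HTML and the URLs are assumptions invented for this example. A real crawler would download the HTML over the network first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageParser(HTMLParser):
    """Collects the words and the outgoing links found on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Every <a href="..."> is a hyperlink the crawler could follow.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Turn relative links into absolute URLs.
                    self.links.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        # Visible text is split into lowercase words (candidate keywords).
        self.words.extend(data.lower().split())

# Hypothetical page content and address.
html = (
    "<html><body><h1>Web Crawling</h1>"
    '<a href="/about">About us</a> '
    '<a href="http://other.example/">Other site</a>'
    "</body></html>"
)
parser = PageParser("http://example.com/index.html")
parser.feed(html)
parser.close()

print(parser.words)
print(parser.links)
```

The words would go to the indexer, and the links would be queued as the next pages to visit.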

Web crawler:
A Web crawler is an Internet bot which systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering).
Web search engines and some other sites use Web crawling or spidering software to update their web content or their indices of other sites' web content. Web crawlers can copy all the pages they visit for later processing by a search engine, which indexes the downloaded pages so that users can search much more efficiently.
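
The "systematic browsing" described above can be sketched as a breadth-first traversal: keep a queue of pages to visit, remember which pages have already been seen, and add newly discovered links to the queue. This is a simplified sketch under stated assumptions, not a production crawler. The function `crawl`, its `fetch_links` parameter and the in-memory `fake_web` are all hypothetical and stand in for real network requests.

```python
from collections import deque

def crawl(seed, fetch_links, max_pages=100):
    """Visit pages breadth-first starting from seed.

    fetch_links is any function that takes a URL and returns the list
    of URLs linked from that page.
    """
    frontier = deque([seed])  # pages waiting to be visited
    visited = set()           # pages already visited
    order = []                # pages in the order they were visited

    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for link in fetch_links(url):
            if link not in visited:
                frontier.append(link)
    return order

# A tiny made-up "web": each page maps to the pages it links to.
fake_web = {
    "http://site.example/a": ["http://site.example/b", "http://site.example/c"],
    "http://site.example/b": ["http://site.example/c", "http://site.example/d"],
    "http://site.example/c": ["http://site.example/a"],
    "http://site.example/d": [],
}

order = crawl("http://site.example/a", lambda url: fake_web.get(url, []))
print(order)
```

The `visited` set is what stops the crawler from looping forever when pages link back to each other, and `max_pages` puts a limit on how far a single crawl goes.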

