Tuesday 22 November 2016

Googlebot

Sitemap

A sitemap is a file where you list the web pages of your site to tell Google and other search engines how your site's content is organized. Search engine web crawlers like Googlebot read this file to crawl your site more intelligently.
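For illustration, a minimal XML sitemap following the sitemaps.org protocol might look like this (the URL and dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- one <url> entry per page you want search engines to know about -->
    <loc>http://www.example.com/</loc>
    <lastmod>2016-11-22</lastmod>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```

The file is usually placed at the root of the site (e.g. www.example.com/sitemap.xml) and can also be submitted through Google Search Console.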

Googlebot (user agent)

Googlebot is Google's web crawling bot (sometimes also called a "spider"). Crawling is the process by which Googlebot discovers new and updated pages to be added to the Google index.

How Googlebot accesses your site

Googlebot is designed to be distributed across several machines to improve performance and scale as the web grows. Therefore, your logs may show visits from several machines at google.com, all with the user-agent Googlebot.

Blocking Googlebot from content on your site

It's almost impossible to keep a web server secret by not publishing links to it. As soon as someone follows a link from your "secret" server to another web server, your "secret" URL may appear in the referrer tag and can be stored and published by the other web server in its referrer log.
If you want to prevent Googlebot from crawling content on your site, you have two main options:
1) Use robots.txt to block access to files and directories on your server (for example, www.example.com/robots.txt).
2) Use the nofollow meta tag; to prevent Googlebot from following an individual link, add the rel="nofollow" attribute to the link itself.
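As a sketch of both options (the directory and URL here are hypothetical), a robots.txt rule blocking Googlebot from a directory looks like:

```
User-agent: Googlebot
Disallow: /private/
```

and a single link that Googlebot should not follow looks like:

```html
<a href="http://www.example.com/page.html" rel="nofollow">Example link</a>
```

Note that robots.txt prevents crawling, not necessarily indexing: a blocked URL can still appear in search results if other sites link to it.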
           

Problems with spammers and other user-agents

The IP addresses used by Googlebot change from time to time, so checking requests against a fixed IP list is unreliable. The best way to identify accesses by Googlebot is to check the user-agent (Googlebot), and then verify that a bot accessing your server really is Googlebot by using a reverse DNS lookup.
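The verification works in two steps: a reverse DNS lookup on the requesting IP should give a hostname in googlebot.com or google.com, and a forward lookup on that hostname should return the original IP. A minimal sketch in Python (the function names are my own, not an official API):

```python
import socket

def is_google_hostname(hostname):
    """Genuine Googlebot hosts resolve into googlebot.com or google.com."""
    return hostname.endswith(".googlebot.com") or hostname.endswith(".google.com")

def verify_googlebot(ip):
    """Reverse DNS on the IP, check the domain, then confirm the
    hostname resolves back to the same IP (forward-confirmed reverse DNS)."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)   # reverse lookup
    except socket.herror:
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        return socket.gethostbyname(hostname) == ip  # forward lookup
    except socket.gaierror:
        return False
```

A spammer can spoof the Googlebot user-agent string, but cannot make an arbitrary IP pass both DNS checks, which is why the forward-confirmed lookup matters.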
