
Monday, January 19, 2015

How Google Crawls and Indexes Web Pages

How your site ranks depends on how Google crawls and indexes your website. If you've never worried about how Google crawls and indexes your website, it's better late than never: you should get an idea of what Google crawling and Google indexing are all about.
Here I explain what Google crawling is, what Google indexing is all about, and what might be the reasons your website is not being crawled or indexed by Google.
First, here are a few terms you should know before moving further.

Google's Index:

Google's index is the list of all the pages that Google has crawled and indexed so far. When someone searches on Google, the results (in the form of pages) are pulled from Google's index. At the time of writing, more than 40 billion web pages are indexed by Google.
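
To make the idea of an index concrete, here is a toy inverted index in Python: it maps each word to the pages that contain it, so a search only has to look words up instead of rereading every page. This is a simplified sketch under my own assumptions (whitespace word splitting, made-up example.com pages, and results that must contain every query word); Google's real index is vastly larger and more sophisticated.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical example pages (placeholder URLs).
pages = {
    "https://example.com/": "google crawls web pages",
    "https://example.com/seo": "how google indexes web pages",
    "https://example.com/about": "about this blog",
}
index = build_index(pages)
print(sorted(search(index, "google pages")))
# ['https://example.com/', 'https://example.com/seo']
```

The search never touches the page text itself, only the prebuilt word lists, which is the basic reason an index makes searching fast.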

Fun Fact: "The Invisible Web", the part of the web that Google has not indexed, is estimated to make up 90% of the entire web, about 450 billion web pages.

Google Crawling:

Crawling is the process by which Google uses programs called "spiders" (Google's own is called Googlebot) to discover and fetch the pages of your site so they can be indexed.
  • Spiders are designed to browse web pages the way humans (the final consumers/users) do.
  • They move from page to page by following the links across the entire site.
  • They try to find each and every page on the web so it can be indexed.
Depending on the site, crawls can happen several times a day or as rarely as once every 6 months.
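
The link-following behaviour of a spider can be sketched as a breadth-first traversal. In this toy version the site is a hypothetical in-memory dictionary of links (LINKS) rather than real pages fetched over HTTP, and crawl() simply records the order in which pages are discovered; Googlebot's real scheduling is far more involved.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1", "/"],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": [],
    "/about": [],
    "/orphan": [],  # nothing links here, so a link-following crawler never finds it
}


def crawl(start: str, links: dict[str, list[str]]) -> list[str]:
    """Visit pages breadth-first by following links, like a very simple spider."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # a real crawler would fetch and store the page here
        for target in links.get(page, []):
            if target not in seen:  # skip pages already discovered
                seen.add(target)
                queue.append(target)
    return order


print(crawl("/", LINKS))
# ['/', '/blog', '/about', '/blog/post-1', '/blog/post-2']
```

Note that "/orphan" never shows up: a page that no other page links to is invisible to a crawler that only follows links, which is why every page should be linked from somewhere on the site.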

TIP: Update your content regularly so that Google's spiders crawl your web pages more often.

The 7 most common reasons your website is not crawled by Google's spiders:

1. An incorrectly configured robots.txt file. robots.txt is used to block Google's spiders from accessing parts of a website, so a mistake can block pages you want crawled.
The worst case is a robots.txt file that leaves a website with just a single page crawled and indexed while blocking all the other pages.
2. A badly configured .htaccess file. On Apache servers, .htaccess is often used to redirect URLs from one domain to another; if the redirects are not set up well, they can cause a huge loss of organic traffic during the move.
3. Incorrectly written meta tags (for example, a robots meta tag that says "noindex"), title tags, and author tags.
4. Incorrectly configured URL parameters in Google Webmaster Tools.
URL parameters can be configured in GWT to tell Google which dynamic URLs you don't want it to crawl.
5. Low PageRank. According to Matt Cutts, the number of pages Google crawls on a site is roughly proportional to its PageRank.
6. DNS or connectivity issues can prevent Google's spiders from reaching your servers, so make sure you use reliable, high-quality servers and hosting.
7. Domains used for a large link-spam farm, a private link network, or any other scheme that earns a penalty can be de-indexed by Google.
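
For reason 1, Python's standard urllib.robotparser module can show how a single wrong line blocks a whole site. The two robots.txt files below are made-up examples: BAD disallows everything, while GOOD only blocks a hypothetical /admin/ folder. allowed() is a helper defined here, not part of any library.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Misconfigured: "Disallow: /" blocks the whole site for every crawler.
BAD = """
User-agent: *
Disallow: /
"""

# Intended: only the admin area is blocked.
GOOD = """
User-agent: *
Disallow: /admin/
"""

for url in [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/admin/login",
]:
    print(url, "BAD:", allowed(BAD, url), "GOOD:", allowed(GOOD, url))
```

Keep in mind this only approximates Googlebot: Python's parser applies the first rule that matches a URL, whereas Google documents that the most specific matching rule wins, so files that mix Allow and Disallow lines can be evaluated differently.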
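
Reason 2 is easiest to see as redirect chains and loops. The sketch below stands in for .htaccess rules with a plain Python dictionary (REDIRECTS, with made-up domains) and follows it the way a crawler follows HTTP redirects; follow_redirects() and its max_hops=10 limit are my own assumptions, not Google's documented behaviour.

```python
# Hypothetical redirect rules (old URL -> new URL), standing in for .htaccess redirects.
REDIRECTS = {
    "http://old-domain.example/page": "https://old-domain.example/page",
    "https://old-domain.example/page": "https://new-domain.example/page",
    "https://new-domain.example/a": "https://new-domain.example/b",
    "https://new-domain.example/b": "https://new-domain.example/a",
}


def follow_redirects(start: str, redirects: dict[str, str], max_hops: int = 10) -> list[str]:
    """Return every URL visited from `start`; raise ValueError on a loop or too many hops."""
    chain = [start]
    current = start
    while current in redirects:
        current = redirects[current]
        if current in chain:  # we have been here before: a redirect loop
            raise ValueError("redirect loop: " + " -> ".join(chain + [current]))
        chain.append(current)
        if len(chain) - 1 > max_hops:
            raise ValueError(f"more than {max_hops} redirects from {start}")
    return chain


print(follow_redirects("http://old-domain.example/page", REDIRECTS))
try:
    follow_redirects("https://new-domain.example/a", REDIRECTS)
except ValueError as err:
    print(err)
```

The first URL needs two hops to reach its final address; pointing it straight at the new domain would be cleaner. The /a and /b rules redirect to each other, so a crawler can never reach any content there.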
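
For reason 3, the most damaging meta tag mistake is usually a robots meta tag containing noindex, which asks search engines to keep the page out of their index. The sketch below uses Python's built-in html.parser to look for such tags; RobotsMetaFinder and blocks_indexing() are hypothetical helpers, and they deliberately ignore other directives (such as none) and HTTP headers like X-Robots-Tag.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in {"robots", "googlebot"}:
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def blocks_indexing(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


page = '<html><head><title>My post</title><meta name="robots" content="noindex, follow"></head></html>'
print(blocks_indexing(page))
# True
```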
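
Reason 4 is about dynamic URLs: the same page often appears under many addresses that differ only in parameters such as session IDs or tracking tags. The URL parameter settings in Webmaster Tools are made in the web interface, not in code, but the idea is roughly like the normalization below, which treats a hypothetical set of parameters (IGNORED_PARAMS) as irrelevant and collapses the variants into one URL. canonical_url() is an illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed NOT to change the page content.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium"}


def canonical_url(url: str) -> str:
    """Drop ignored query parameters (and any #fragment), keeping the rest in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


variants = [
    "https://example.com/shoes?color=red&sessionid=abc123",
    "https://example.com/shoes?sessionid=xyz789&color=red&utm_source=newsletter",
    "https://example.com/shoes?color=red",
]
print({canonical_url(u) for u in variants})
```

All three variants map to the same address, which is the kind of duplication that wastes crawl effort if every variant is fetched separately.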
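
Reason 5 refers to PageRank. The original, publicly described PageRank idea can be sketched with power iteration: each page repeatedly shares its score among the pages it links to, plus a small baseline controlled by a damping factor (0.85 is the value commonly cited from the original paper). The four-page graph is made up, and this textbook version says nothing about how Google ranks pages or schedules crawls today.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration; every link target must also be a key."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the same baseline share from "random jumps".
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page site: page -> pages it links to.
SITE = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(SITE).items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page D, which nothing links to, ends up with only the baseline score (1 - 0.85) / 4, while C, which three pages link to, scores highest.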

Ways to improve Google crawling and increase the number of indexed pages:

  • Regularly check for and fix any crawl errors that appear in Google Webmaster Tools.
  • Make sure that any AJAX application that serves content is crawlable and indexable.
  • Carefully add a proper, well-optimized robots.txt file.
  • Add a sitemap to the website.
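
For the last tip, an XML sitemap in the sitemaps.org format simply lists the URLs you want crawled, optionally with a last-modified date. Below is a minimal sketch using Python's standard xml.etree.ElementTree; build_sitemap() is a hypothetical helper and the URLs and dates are placeholders. The resulting file can be submitted in Google Webmaster Tools or referenced from robots.txt with a Sitemap: line.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """Return sitemap XML for (URL, last-modified date as YYYY-MM-DD) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes characters such as &
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    ("https://example.com/", "2015-01-19"),
    ("https://example.com/blog/how-google-crawls", "2015-01-19"),
])
print(sitemap_xml)
```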
