A lot of people confuse indexing with web positioning. While it’s true that the two concepts are closely related, the mere fact that a site is stored in a search engine’s index (Google’s, to be exact), which works much like a cache, neither rewards nor penalizes its ranking in the SERPs for the various keywords it competes on.
Every day, Googlebot crawls anywhere from a couple of pages (at minimum) to hundreds of them on each domain.
This frequency depends on the number of published pages and the site’s authority.
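One way to see how often Googlebot actually visits your domain is to count its requests in the server’s access log. This is a minimal sketch, assuming Apache’s combined log format and simply matching the text "Googlebot" in each line; since the user-agent string can be spoofed, treat the result as a rough estimate. The sample lines are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Captures the day from the timestamp of an Apache combined log line,
# e.g. "[10/Oct/2023:13:55:36 +0000]" -> "10/Oct/2023".
DATE_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def googlebot_hits_per_day(lines):
    """Count log lines per day whose text mentions Googlebot."""
    hits = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        match = DATE_RE.search(line)
        if match:
            day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
            hits[day] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines; point this at your real access log instead.
    sample = [
        '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
        '66.249.66.1 - - [10/Oct/2023:14:02:11 +0000] "GET /products HTTP/1.1" 200 8192 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
        '203.0.113.7 - - [10/Oct/2023:14:05:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
        '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    ]
    for day, count in sorted(googlebot_hits_per_day(sample).items()):
        print(day, count)
```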
First things first: the bot has to be able to reach the pages we want indexed and find out that they exist. To do that, we use the well-known sitemap.xml to list all of our pages.
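A sitemap.xml is just a list of <url><loc> entries under the sitemaps.org namespace, so it is easy to generate from whatever list of URLs your site already has. A minimal sketch using Python’s standard xml.etree.ElementTree; the example.com URLs are placeholders, and optional fields such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap.xml document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in query strings for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    pages = [
        "https://www.example.com/",
        "https://www.example.com/products/blue-widget",
    ]
    print(build_sitemap(pages))
```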
Even with a sitemap, the bot will naturally follow every link it finds on each page, and that’s where trouble knocks at the door.
Every page that isn’t explicitly marked as not to be indexed will be requested by Google and stored in its index. That doesn’t mean each of them is a candidate to appear in search results.
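To see why links matter so much, here is a minimal sketch of the link discovery step using only Python’s standard library: every <a href> on a page becomes a new URL to request, whether or not it appears in sitemap.xml. Real crawlers do far more (JavaScript rendering, rel="nofollow" handling, deduplication); the example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, href))


def discover_links(base_url, html):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = '<a href="/products?page=2">Next</a> <a href="/search?q=shoes">Shoes</a>'
    for link in discover_links("https://www.example.com/products", page):
        print(link)
```

Note how a paginated listing or an internal search URL gets queued just because something links to it.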
How Google learns that we don’t want a page indexed (or want it removed from the index)
There are two ways to tell Google which pages the bot must not index. The first is a robots meta tag (inside html/head) with the value noindex:
<meta name="robots" content="noindex">
The second is to add a Disallow line with the page’s URL to the robots.txt file, which tells the bot not to crawl it. For example:
User-agent: *
Disallow: /search
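To check which URLs a given robots.txt actually blocks, Python’s standard urllib.robotparser can evaluate the rules locally. This is a sketch: the /search path and the example.com URLs are assumptions, and Python’s parser does plain prefix matching, so it ignores the * and $ wildcards that Google supports. It answers "may this URL be crawled?", which is exactly what Disallow controls.

```python
from urllib.robotparser import RobotFileParser

# Example rules; "/search" is an assumed path for internal search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://www.example.com/search?q=shoes",
                "https://www.example.com/products/blue-widget"):
        print(url, is_crawlable(ROBOTS_TXT, url))
```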
A 301 redirect to another URL also causes the original URL to be dropped from the index.
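For reference, a permanent redirect is just a response with status 301 and a Location header pointing at the new URL. A minimal sketch using Python’s WSGI interface and the standard library’s wsgiref server; the REDIRECTS mapping and its paths are hypothetical.

```python
from wsgiref.simple_server import make_server

# Hypothetical mapping of retired paths to their replacements.
REDIRECTS = {
    "/old-catalogue": "/products",
    "/blog/2012/seo-tips": "/blog/indexing-vs-positioning",
}


def app(environ, start_response):
    """Permanently redirect retired paths; answer 404 for anything else."""
    target = REDIRECTS.get(environ.get("PATH_INFO", ""))
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


if __name__ == "__main__":
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```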
Problems with the meta noindex tag
The biggest difference between using a meta tag and using robots.txt to tell Google not to index a page (or to drop it from the index) is that, in the first case, the Google robot must download and analyze the page before it reaches that conclusion.
Going the meta tag route, we give Google extra work and we also take up space in its index. Sooner or later, if the noindex tag is there, the page will be removed, but as long as it remains, it is taking the spot of another page.
Besides, if the page with meta noindex is still linked (I’ve seen cases where it was even included in the sitemap.xml), the robot will be forced to come back and check it from time to time. Another option is to add rel="nofollow" to the links pointing to that page, but that is still more laborious than just adding the necessary entry to robots.txt.
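Both problems, the page having to be downloaded and parsed before the noindex is seen, and noindex pages left in sitemap.xml, are easy to audit. A minimal sketch with Python’s standard library: it reads the <loc> entries of a sitemap and flags those whose HTML contains a robots meta tag with noindex. The fetch argument is a hypothetical caller-supplied function (URL to HTML), used so the example needs no network access; the check on the content attribute is a deliberately simple substring test.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


class NoindexFinder(HTMLParser):
    """Set .noindex if the page has <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            if "noindex" in attributes.get("content", "").lower():
                self.noindex = True


def sitemap_urls(sitemap_xml):
    """Return the <loc> values listed in a sitemap.xml string."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def noindex_in_sitemap(sitemap_xml, fetch):
    """Return sitemap URLs whose HTML (obtained via fetch) carries meta robots noindex."""
    flagged = []
    for url in sitemap_urls(sitemap_xml):
        finder = NoindexFinder()
        finder.feed(fetch(url))  # the whole page must be downloaded and parsed
        if finder.noindex:
            flagged.append(url)
    return flagged
```

Any URL this returns is a contradiction worth fixing: either drop it from the sitemap or keep the bot away from it in robots.txt.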
Pages that are candidates to be kept out of the index
It’s recommended not to index pages with thin or duplicate content. The main reason is that, even if they are indexed, Google is unlikely to rank them, and they take up space in the domain’s share of the index, which is limited and depends on the authority Google grants the domain.
Example: an online store has paginated listings (of products and/or articles) that generally have no content of their own, and if they do, I bet it’s copy-pasted from the home page. This is another type of page I wouldn’t index myself.
Internal search results shouldn’t be indexed either, although I’d keep a close eye on those results that can earn you good long-tail rankings, but that’s another story altogether.
Identify the pages that are poor in content or of no interest to users, and add them to robots.txt.
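As a starting point for that identification, this sketch flags pages whose visible text falls under a word-count threshold or exactly duplicates an earlier page, and emits candidate Disallow lines. The 150-word MIN_WORDS threshold and the pages mapping (URL to already-extracted text) are assumptions. Exact hashing misses near-duplicates, and Disallow rules match by prefix (Disallow: /products?page=2 also covers /products?page=20), so review the output before publishing it.

```python
import hashlib
import re
from urllib.parse import urlparse

MIN_WORDS = 150  # hypothetical "thin content" threshold; tune it for your site


def disallow_candidates(pages, min_words=MIN_WORDS):
    """Return candidate robots.txt Disallow lines for thin or duplicated pages.

    `pages` maps each URL to its visible text. The first URL seen with a given
    text is kept; later exact duplicates are flagged.
    """
    seen = set()
    lines = []
    for url, text in pages.items():
        tokens = re.findall(r"\w+", text.lower())
        fingerprint = hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()
        if len(tokens) < min_words or fingerprint in seen:
            parts = urlparse(url)
            # Keep the query string so "?page=2" doesn't collapse into the main listing.
            target = parts.path + (f"?{parts.query}" if parts.query else "")
            lines.append(f"Disallow: {target}")
        seen.add(fingerprint)
    return lines


if __name__ == "__main__":
    pages = {
        "https://www.example.com/guide": "indexing " * 200,
        "https://www.example.com/guide-copy": "indexing " * 200,
        "https://www.example.com/products?page=2": "Next page",
    }
    print("\n".join(disallow_candidates(pages)))
```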
Robots.txt is a text file, but it doesn’t have to be maintained by hand: you can combine Apache’s mod_rewrite with PHP or Ruby to generate it on the fly and use business criteria to decide which pages to disallow.
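As a stand-in for PHP or Ruby, here is a minimal sketch of the same idea in Python using the standard WSGI interface (wsgiref): robots.txt is rebuilt on every request from fixed rules plus one business rule. The product records, the "discontinued" rule, the paths and the sitemap URL are all hypothetical. Behind Apache, you would route /robots.txt to an app like this instead of serving a static file.

```python
from wsgiref.simple_server import make_server

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # assumed location
STATIC_DISALLOWS = ["/search", "/cart"]               # hypothetical fixed rules

# Hypothetical catalogue records; in practice they would come from your database.
PRODUCTS = [
    {"path": "/products/blue-widget", "discontinued": False},
    {"path": "/products/old-gadget", "discontinued": True},
]


def build_robots_txt(products):
    """Combine fixed rules with a business rule: keep discontinued products out of the crawl."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in STATIC_DISALLOWS]
    lines += [f"Disallow: {p['path']}" for p in products if p["discontinued"]]
    lines.append(f"Sitemap: {SITEMAP_URL}")
    return "\n".join(lines) + "\n"


def app(environ, start_response):
    """Serve /robots.txt, generated fresh on each request; 404 for anything else."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    body = build_robots_txt(PRODUCTS).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


if __name__ == "__main__":
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```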
If you use Disallow in robots.txt instead of meta noindex, you’ll see all your pages get indexed faster, and Googlebot won’t waste time checking pages you don’t want indexed. Not to mention that, at the server level, there is ALWAYS a cost associated with serving a page.
You’ll see your indexing grow in both quality and quantity.
I look forward to your comments, suggestions and opinions and, if you liked the post, a beer to keep talking about the topic… ;) If you need an expert team for online marketing, SEO, iPad app development and more, contact us!