Robots.txt vs Noindex: Deindex your site, the right way

If you are into SEO or anything related to it, you probably have these terms (robots.txt and noindex) somewhere in your vocabulary. And if that’s not the case, you have come to the right place to learn about them. These directives are generally used to prevent search engines from crawling and indexing your site. Now you might be wondering why anyone would do that. Let me give you a few instances where you can use them:

Where can you use them?

You can use them for pages that are not important for search engines and that you would not like to show in search results. For example:

  1. Admin section pages of a site.
  2. Non-converting pages, so search engines can focus only on the converting ones.
  3. Search results pages.
  4. Error pages.

Now the question that arises is: what is the difference between them? So let’s get to know what robots.txt and noindex are and how they differ.

Crawling and Indexing

Before getting into the details of these directives, I would like to explain the difference between crawling and indexing, as it will help you understand the concepts easily.
Crawling is when a search engine bot (such as Googlebot) visits a page on your site and reads the content inside it.
Indexing is when Google saves your page’s address in its index (its collection of site URLs and information).


Robots.txt is a basic text file that you upload to the root directory of your site. It can be found at www.sitename.com/robots.txt and contains instructions for search engines to follow. If you have used the directive ‘Disallow’ for a particular directory or page, search engines understand that and will not crawl it.

User-agent: *
Disallow: /wp-admin/
Disallow: /test/abc.html

In the above case search engines will not crawl the directory wp-admin or the page abc.html. By “not crawl” I mean they will not read the page, but they might still index it. For example, if some other page links to abc.html, search engines might show abc.html in search results in the rarest of cases (when there is no other relevant data to show), but they will show only the URL without any description, because the page was never crawled and they have no information about it. So robots.txt assures you that your page will not be read by search engines, but it does not guarantee that the page is deindexed. Here is a video by Matt Cutts on the topic:

Note: Robots.txt is case sensitive. That means an entry such as this:
Disallow: /thispage.html does not block /ThisPage.html.
It will only block the exact match. So if you have canonical issues (the same content under variant URLs, including case differences), chances are you will have trouble blocking them reliably with robots.txt.
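To see this matching behavior in practice, here is a small Python sketch using the standard library’s urllib.robotparser, fed the example rules from above (the file paths are the hypothetical ones from the example, not real URLs):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Feed the rules directly instead of fetching them over HTTP.
rp.parse("""
User-agent: *
Disallow: /wp-admin/
Disallow: /test/abc.html
""".splitlines())

# Anything inside a disallowed directory is blocked.
print(rp.can_fetch("*", "/wp-admin/settings.php"))  # False
# An exact path match is blocked.
print(rp.can_fetch("*", "/test/abc.html"))          # False
# A different-case variant of the same path is NOT blocked.
print(rp.can_fetch("*", "/Test/ABC.html"))          # True
```

The last line demonstrates the case-sensitivity caveat: the parser treats /Test/ABC.html as a different path from /test/abc.html.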

So if robots.txt is not able to deindex your site, how do you do that? There are two ways: one is a URL removal request and the other is the noindex tag.

Request removal of an entire page:
  1. Go to the Google public URL removal tool.
  2. Click New Removal Request.
  3. Type the URL of the webpage you want removed (not the Google search results URL or cached page URL). The URL is case-sensitive—use exactly the same characters and capitalization that the site uses.
  4. Click Continue.
  5. Click Remove this page.
Note: This will only remove the page from Google’s index, so it will not appear in Google’s search results, but other search engines might still show it. The best way to deindex a page from all major search engines is to use “Noindex”.

Noindex is a meta tag that you put in the head section of a page. Unlike ‘Robots.txt’, ‘Noindex’ allows search engines to read the page but instructs them not to include it in their index. That means when a search engine comes to a page with a noindex meta tag, it will still read the content inside it, including the links (so link juice is passed), but it will not index the page. For example:

<meta name="robots" content="noindex" />

This line of code in the header will prevent search engines from indexing the page. The drawback is that you need to put this code on every page you want to deindex, so it becomes difficult to manage when the number of pages grows large. The good thing is that it is supported by all the major search engines.
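For files that cannot carry a meta tag (PDFs, images, and so on), the same instruction can be sent as an X-Robots-Tag HTTP response header instead. A minimal sketch for an Apache server (assumes mod_headers is enabled; the .pdf pattern is just an illustration):

```
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```

This way the noindex instruction travels with the HTTP response rather than the page markup.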

Which to use and when?

I would suggest using the “Noindex” meta tag instead of “Robots.txt” if you want to deindex a page or directory from search engine records, for two reasons. First, the page will be deindexed by the search engines themselves the next time your site is crawled, so you do not need to do it manually (by sending a URL removal request). Second, it does not waste your PageRank: link juice is still passed from a noindex page, because search engines read it, but not from a page blocked by robots.txt.

The issue with “Noindex” is that it has to be applied on a page-by-page basis, so it becomes difficult to manage, while “robots.txt” offers an easier way of doing it from a single file. So weigh both of them and go with the one that suits your requirements.

Things to Keep in mind

1) A page blocked by ‘Robots.txt’ will not be read by search engines, so any links on that page will not be crawled. Link juice will not pass through it, and its PageRank is wasted.

2) On the other hand, a page with a ‘Noindex tag’ will be read by search engines, and link juice will be passed to the linked pages (if they are dofollow links). PageRank is utilized, because the page is read by search engines even though it is not indexed.

3) If you use robots.txt together with Google’s URL removal tool, it will work and the page will get deindexed, but Google will never crawl that page again and therefore will not follow any of the links on it. You are blocking the crawler, so your site will not be crawled as thoroughly, which means pages can be missed and a lower percentage of your pages will be indexed.

4) Disallowing a URL in robots.txt does NOT mean it will magically be removed from the Index. That’s what the URL Removal Request tool is for.

Some Related Stuff
  • NOINDEX tag tells Google not to index a specific page
  • NOFOLLOW tag tells Google not to follow the links on a specific page
  • NOARCHIVE tag tells Google not to store a cached copy of your page
  • NOSNIPPET tag tells Google not to show a snippet (description) under your Google listing; it will also not show a cached link in the search results
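These directives can also be combined in a single meta tag. For example, to keep a page out of the index and stop link juice from passing through its links at the same time:

```
<meta name="robots" content="noindex, nofollow" />
```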