The robot text file, better known as robots.txt, is a long-standing Web standard that tells Google and other search engines not to access parts of your site.
Why would you want to block Google from parts of your site? One important reason is to prevent Google from indexing pages on your site which are duplicates of pages on other sites—such as the default WordPress pages. Google penalizes sites with duplicate content.
Another important reason is to prevent Google from linking to unprotected premium content on your website. For example, maybe you give out a free ebook to people who subscribe to your mailing list. You don’t want Google to link directly to this ebook, so you use the robot text file to prevent Google from indexing it.
For example, suppose your ebooks are stored in a folder called PDF in your root directory. This is what you would add to block all search engines from it:
User-Agent: *
Disallow: /PDF/
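If you want to double-check what a rule like this blocks before relying on it, Python's standard-library urllib.robotparser evaluates rules the same way a well-behaved crawler would. A quick sketch, using the /PDF/ example above (the post and ebook filenames are just made-up placeholders):

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example above, fed straight to the parser.
rules = [
    "User-Agent: *",
    "Disallow: /PDF/",
]

rp = RobotFileParser()
rp.parse(rules)

# Normal pages are still crawlable...
print(rp.can_fetch("Googlebot", "http://example.com/some-post/"))         # True
# ...but anything under /PDF/ is off limits.
print(rp.can_fetch("Googlebot", "http://example.com/PDF/free-ebook.pdf"))  # False
```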
On the other hand, if you want your free book to go viral, don’t block the search engines from the book.
Some people also like to prevent Google from using their images in Google search or from downloading large files.
Also, if you have a large authority WordPress site, Google may be loading the same page under several different URLs, using up a large part of your bandwidth and webserver computer processing power. Robot text file patterns can tell Google to skip the duplicate URLs.
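One common source of these duplicates is the replytocom query parameter WordPress adds to comment-reply links, which makes Google load the same post once per comment. Assuming that is where your duplicates come from, a pattern like this blocks them:

User-Agent: *
Disallow: /*?replytocom=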
Finally, you can tell Google about your XML or text site map using robots.txt, so it indexes new pages on your site much faster than just waiting for it to re-crawl your site.
Robot Txt File Basics
The robot text file is an optional file in the root directory of a website. Since you’re reading this, I assume you have a website. Take a moment to see if you already have a robot text file by going to the following URL: http://example.com/robots.txt
(Replace example.com with your domain name.)
Here is mine. Please note that it is a work in progress: I recently changed my WordPress theme, which also required that I do some robot text file editing of my own.
Be careful when editing this file: it is easy to make a mistake and block the search engines from your entire website.
If you get a 404 File Not Found error, you don’t have a robot text file. Otherwise, you will see a simple text file with lines labeled User-Agent, Allow, Disallow, and Sitemap, plus blank lines, and comment (“#”) lines.
What Things Mean in the Robot Text File
• User-Agent means the user agent of the Web browser visiting your site. The robot text file only applies to robots—also called spiders—that crawl your website for search engines and other automated online tools. Google’s crawler robot is called Googlebot, although Google also has a few other robots for its other search tools.
• Allow tells robots that they’re allowed to visit URLs containing a particular path. Most robot text files tell robots that the root (“/”) path is ok to crawl.
• Disallow tells robots where they cannot go. Most of your time editing a robots.txt file will be spent crafting disallow lines.
• Sitemap points to your site map (or multiple sitemaps if you have a large site). You need a sitemap to use this, which requires something like the WordPress plugin XML Sitemap Generator.
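Putting those four directives together, a minimal robots.txt might look like this (the /PDF/ folder and the sitemap URL are just placeholders for your own):

# Rules for all robots
User-Agent: *
Allow: /
Disallow: /PDF/

# Where to find the site map
Sitemap: http://example.com/sitemap.xml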
Getting Your Robot Text File In WordPress
The following instructions will only work if you use WordPress to manage the root directory of your website. That means the main page of your blog has no extra path after the domain name.
For example, if your main WordPress page is http://example.com/, then WordPress probably manages your robots.txt file. But if your main WordPress page is http://example.com/blog, then WordPress probably doesn’t manage your robots.txt file and you’ll have to work on it directly using FTP upload.
WordPress only creates a restrictive robots.txt file if you use the WordPress settings to mark your blog as private. Most people have public sites, so the default WordPress robot text file is empty.
Some website hosting companies provide a default robot text file for WordPress—especially if you used a one-click install for WordPress. If so, you may need to edit your robots.txt file using FTP upload too.
But if none of the above is the case, you can probably have WordPress generate your robots.txt file for you.
Robots.txt WordPress Plugins
Several SEO plugins can generate a robots.txt file. I’d be careful using these if you do anything besides blogging with your site because they can stop Google from indexing legitimate pages. This can be one of those silly errors that cause your website rankings to drop fast.
Another plugin which automatically creates the robot text file is XML Sitemap Generator. It doesn’t block or allow anything—it simply includes a Sitemap line to tell Google and other search engines where to find your sitemap.
An example would be:
Sitemap: http://tips4pc.com/sitemap.xml
There’s also a very old WordPress plugin which lets you edit your robot text file from within WordPress. I haven’t used this plugin, so I don’t know if it still works.
The Old-Fashioned Robots.Txt File Editor
If you want a custom robot text file, you can create one the old-fashioned way. Open Windows Notepad, TextEdit on Mac OS X, or vi or emacs on Linux. Enter the following text:
User-Agent: *
Allow: /
The example file above will tell robots to act exactly like they would if you didn’t have a robot text file, so it won’t break anything on your site. Save the file as robots.txt and upload it to your webserver’s root directory using an FTP tool or your website hosting company’s online file manager.
(The root directory is the same directory where you add the Google website verification code file, in case you’ve done that before.)
After the file is uploaded, use your Web browser to visit http://example.com/robots.txt (but use your domain instead). You should see the file you just uploaded. If you don’t, you will need to contact your hosting company for help.
What To Put In Your WordPress Robot Txt File
Your robot text file can be as simple as the example above or much more complicated. In general, you want to block the following:
• WordPress login and help directories, which all start with wp-. Put this code under the “Allow: /” line:
Disallow: /wp-*
• The example above will also tell Google not to index the WordPress uploads directory where you store your images. If you want your images to appear in Google and Bing image search, add the following code:
Allow: /wp-content/uploads
• If Google tries to index a trackback, it will just get an error page, so add this code too:
Disallow: */trackback
• If you are using Google AdSense, it is recommended that you add these lines to allow Google to crawl all your content so it can serve targeted ads:
User-agent: Mediapartners-Google
Allow: /
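Assembled into one file, the rules above would look like this (the Sitemap line assumes you have a sitemap plugin set up as described earlier):

User-Agent: *
Allow: /
Disallow: /wp-*
Allow: /wp-content/uploads
Disallow: */trackback

User-agent: Mediapartners-Google
Allow: /

Sitemap: http://example.com/sitemap.xml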
Those simple robot text file commands should cover the most important parts of your site, but if you want more ideas, go to your favorite WordPress-based website and look at their robots.txt file.