Another must-have for every site is a robots.txt file. It should sit in the same place as your sitemap.xml file, at the top level of your site, so its address looks like this: http://www.yoursite.com/robots.txt The robots.txt file is a simple file that tells search engines which areas of your site you don’t want them to crawl and list in the search results.
There is no real ranking boost from having a robots.txt file on your site. It is essential, though, to check that you don’t have a robots.txt file blocking areas of your site you want search engines to find.
The robots.txt file is just a plain text document. Its contents should look something like the example below:

robots.txt good example

User-agent: *
Disallow: /admin
User-agent: *
Disallow: /logs

If you want to tell search engines not to crawl your site at all, the file should look like the next example. If you do not want your entire site blocked, you must make sure it does not look like the example below. It is always a good idea to double check it is not set up this way, just to be safe.

robots.txt bad example

User-agent: *
Disallow: /
The forward slash in this example tells search engines their software should not visit the root directory of the site, which covers every page on it.

To create your robots.txt file, simply create a plain text document with Notepad if you are on Windows, or TextEdit if you are on Mac OS.
Make sure the file is saved as a plain text document, and use the ‘robots.txt good example’ as a guide to how it should look. Take care to list any directories you do not want search engines to visit, such as internal folders for staff, admin areas, CMS back-end areas, and so on.
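If you would rather generate the file than type it by hand, the idea can be sketched in a few lines of Python. This is only a minimal illustration: the function name build_robots_txt is made up for this example, and the two directories are the ones from the good example. It writes all the blocked directories under a single User-agent line, which search engines read the same way as separate groups for the same user agent.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt text asking crawlers to skip the given paths.

    build_robots_txt is a hypothetical helper written for this example.
    """
    lines = [f"User-agent: {user_agent}"]
    for path in disallowed_paths:
        # Each blocked directory gets its own Disallow line.
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


# Directories taken from the good example; replace them with your own.
robots_text = build_robots_txt(["/admin", "/logs"])
print(robots_text)
```

Save the printed text as a plain text file named robots.txt and upload it to the top level of your site.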
If there aren’t any areas you would like to block, you can skip your robots.txt file altogether, but just double check you don’t have one blocking important areas of the site like the above example.
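One way to double check is to let a program read the file the way a crawler would. The sketch below uses the robots.txt parser that ships with Python (urllib.robotparser). The helper name blocked_urls and the product page address are assumptions made for illustration; swap in the pages you actually care about and the real contents of your own robots.txt file.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_text, urls, user_agent="*"):
    """Return the URLs that the given robots.txt text tells crawlers to skip.

    blocked_urls is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Blocks only the admin and logs folders.
GOOD = "User-agent: *\nDisallow: /admin\nDisallow: /logs\n"
# The lone forward slash blocks the entire site.
BAD = "User-agent: *\nDisallow: /\n"

# Pages you want search engines to find (the second address is made up).
PAGES = [
    "http://www.yoursite.com/",
    "http://www.yoursite.com/products/socket-wrenches",
]

print("Blocked by good example:", blocked_urls(GOOD, PAGES))
print("Blocked by bad example:", blocked_urls(BAD, PAGES))
```

If any page you want listed shows up as blocked, the robots.txt file needs fixing before anything else.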
Duplicate content—canonical tags and other fun.
In later chapters I will describe how Google Panda penalized sites with duplicate content. Unfortunately, many content management systems will sometimes automatically create multiple versions of one page. For example, let’s say your site has a product page on socket wrenches, but because of the system your site is built on, the exact same page can be accessed from multiple URLs in different areas of your site. To search engines, these multiple versions of the same page are considered duplicate content.
To account for this, you should always ensure a special tag is placed on every page in your site, called the ‘rel canonical’ tag. The rel canonical tag indicates the original version of a web page to search engines. By putting the page you consider to be the ‘true’ version into the tag, you tell Google which page you want listed in the search results.
Choose the URL that makes the most sense to users and offers the best SEO benefit. This should usually be the URL that reads like plain English. Using the earlier socket wrenches example, with a tag like the one below, Google would be more likely to display the best version of the page in the search engine results.

<link rel="canonical" href="(the URL you chose as the true version)" />
As a general rule, include this tag on every page on your site, shortly before the closing </head> tag in the code.
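To confirm a page really carries the tag, you can scan its HTML for a rel canonical link inside the head section. The following is a rough sketch using Python’s built-in HTML parser. The class and function names are invented for this example, and the socket wrenches address is a made-up stand-in for whichever URL you chose as the true version.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects rel canonical URLs found inside the <head> of a page.

    CanonicalFinder is a hypothetical helper written for this example.
    """

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonical_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values and attributes.get("href"):
                self.canonical_urls.append(attributes["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html_text):
    """Return every canonical URL declared in the page's head section."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonical_urls


# Sample page; the canonical address is an assumption for illustration.
PAGE = """<html>
<head>
<title>Socket wrenches</title>
<link rel="canonical" href="http://www.yoursite.com/socket-wrenches" />
</head>
<body><h1>Socket wrenches</h1></body>
</html>"""

print(find_canonical(PAGE))
```

An empty list means the tag is missing, and more than one entry means the page is sending search engines mixed signals about which version is the original.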