A web robot’s primary job is to scan websites and pages for information; these robots work tirelessly to collect data on behalf of search engines and other applications. For some site owners, there is good reason to keep pages away from search engines. Whether you want to fine-tune access to your site or work on a development site without showing up in Google results, the robots.txt file, once implemented, lets web crawlers know which parts of your site they may collect information from.
Create a Robots.txt File
As one of the first things crawlers analyze, the robots.txt file can be applied to one or more pages or to an entire site to discourage search engines from showing details about your site. In this article, we provide insight into how to use the robots.txt file, as well as the syntax needed to keep these bots at bay.
Let’s break down the two lines of a basic robots.txt file, “User-agent: *” and “Disallow: /”. “User-agent” pertains to the web crawlers, and the * sign means all web crawlers. Consequently, the first line grabs attention by saying “Listen up, all web crawlers!” The second line, which begins with “Disallow”, lets the web crawler know its direction. The forward slash (/) stops the bots from searching all the pages on your site. You can also discourage information from being collected for one specific page or file by placing its path after “Disallow:” instead. In our case, it is a map of our building layout. Since the design of our building does not need to be searchable, a Disallow line that names the buildinglayout.png photo tells all bots to leave it out of their index, while keeping it viewable to any guest who wants to see it.
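The two rule sets described above can be written out and checked with Python’s standard-library urllib.robotparser module. This is only a minimal sketch: the example.com domain and the root-level location of buildinglayout.png are assumptions made for illustration, and Python’s parser is just one interpretation of the rules, so a real crawler may behave differently in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Rule set 1: ask every crawler to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Rule set 2: ask every crawler to skip one file only.
# Assumption: the image lives at the site root.
BLOCK_ONE_FILE = """\
User-agent: *
Disallow: /buildinglayout.png
"""


def load_rules(robots_txt):
    """Parse robots.txt text and return a parser that answers can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


site_blocked = load_rules(BLOCK_ALL)
file_blocked = load_rules(BLOCK_ONE_FILE)

# example.com is a placeholder domain.
print(site_blocked.can_fetch("Googlebot", "https://example.com/about.html"))
print(file_blocked.can_fetch("Googlebot", "https://example.com/buildinglayout.png"))
print(file_blocked.can_fetch("Googlebot", "https://example.com/about.html"))
```

With the first rule set every path is refused. With the second, only the image is refused and the rest of the site stays open to crawlers. Visitors are not affected either way, because robots.txt is a request to bots and does not restrict access.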
Conversely, if you would like all search engines to collect information on all the pages in your site, you can leave the Disallow value blank.
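A blank Disallow value can be checked the same way. The sketch below again uses Python’s standard-library parser, with example.com and the sample paths as placeholders.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value places no restriction on any crawler.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

open_site = RobotFileParser()
open_site.parse(ALLOW_ALL.splitlines())

# Placeholder URLs: any path on the site should be permitted.
for path in ("/", "/about.html", "/images/buildinglayout.png"):
    print(path, open_site.can_fetch("Googlebot", "https://example.com" + path))
```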
There are many types of web crawlers (aka user-agents) that can be specified. Some of the most popular, followed by their associations, are Googlebot (Google), Bingbot (Bing), Slurp (Yahoo), DuckDuckBot (DuckDuckGo), Baiduspider (Baidu), and YandexBot (Yandex). You can address any of these crawlers within your robots.txt file by putting its name on the User-agent line in place of the * sign. Furthermore, you can instruct these bots to index a certain page by using Allow.
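As an illustration of per-crawler rules and the Allow directive, the sketch below gives Googlebot and Bingbot different instructions. The /private/ directory and press-kit.html page are hypothetical names, and example.com is a placeholder. Allow is listed before Disallow because Python’s parser applies the first rule that matches, while some crawlers pick the most specific match; this ordering gives the same answer under both approaches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Googlebot may read one page inside an otherwise
# blocked directory; Bingbot is asked to stay out entirely.
PER_CRAWLER = """\
User-agent: Googlebot
Allow: /private/press-kit.html
Disallow: /private/

User-agent: Bingbot
Disallow: /
"""

rules = RobotFileParser()
rules.parse(PER_CRAWLER.splitlines())

base = "https://example.com"  # placeholder domain
print(rules.can_fetch("Googlebot", base + "/private/press-kit.html"))
print(rules.can_fetch("Googlebot", base + "/private/notes.html"))
print(rules.can_fetch("Bingbot", base + "/index.html"))
# A crawler with no matching group and no "*" group is unrestricted.
print(rules.can_fetch("DuckDuckBot", base + "/private/notes.html"))
```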
Most sites don’t automatically come with a robots.txt file (and one isn’t required), so you can create one using a text editor and upload the file to your site’s root directory, which is where crawlers look for it. Luckily, if you use the popular CMS WordPress and its helpful SEO plugin Yoast, you’ll see a section within the admin window to create a robots.txt file.
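Crawlers request the file from a fixed location, /robots.txt at the top level of the host, so that is where the upload needs to end up. The small sketch below derives that location from any page address; the function name robots_txt_url and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the address a crawler would check for this page's host."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/2020/post.html?page=2"))
```

A quick way to confirm an upload worked is to open that address in a browser and check that your rules are displayed.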
Robots.txt File In WordPress
After logging into your WordPress backend (yourdomain.com/wp-login.php), locate the SEO section and select Tools. After clicking on the File editor link, you will see a page containing code similar to the code discussed in the first part of our article.
Our example keeps web bots away from the WordPress login page and the wp-includes directory while still allowing users and bots to see other pages of our site. Take note of the necessary ending slash after a directory name (it is not needed when disallowing individual pages). After editing, select the “save changes to robots.txt” button to activate the robots.txt file.
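A rule set along the lines described above might look like the one in this sketch, which checks it with Python’s standard-library parser. The exact rules are an assumption based on the description, and the example.com URLs and sample file names are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed WordPress rules: block the login page and the wp-includes
# directory. The directory entry ends with a slash; the page does not.
WORDPRESS_RULES = """\
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-includes/
"""

wp_rules = RobotFileParser()
wp_rules.parse(WORDPRESS_RULES.splitlines())

site = "https://example.com"  # placeholder domain
print(wp_rules.can_fetch("Googlebot", site + "/wp-login.php"))
print(wp_rules.can_fetch("Googlebot", site + "/wp-includes/js/script.js"))
print(wp_rules.can_fetch("Googlebot", site + "/blog/hello-world/"))
# The trailing slash keeps the rule from catching look-alike names.
print(wp_rules.can_fetch("Googlebot", site + "/wp-includes-notes.html"))
```

Rules are matched as path prefixes, which is why the ending slash matters: without it, “Disallow: /wp-includes” would also match unrelated paths that merely start with the same characters.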