The robots.txt file has just blown out the candles on its 25th birthday cake. Placed at the root of a website, it provides directions to spiders, the automated crawlers that search engines use to analyze pages. It is a proper instruction file, customizable by a site's editor, that lets webmasters allow – or, on the contrary, block – access by the various pieces of software that automatically scan the web.
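To make the idea concrete, here is a minimal robots.txt of the kind described above (the paths and the "ExampleBot" user agent are hypothetical, chosen just for illustration):

```text
# Rules for one specific crawler
User-agent: ExampleBot
Disallow: /

# Rules for every other crawler
User-agent: *
Disallow: /private/
Allow: /private/public-report.html
```

Each `User-agent` group targets a crawler by name (`*` matches any), and the `Disallow`/`Allow` lines then open or close URL paths to it.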
Created in 1994 by Martijn Koster, the Robots Exclusion Protocol is therefore of crucial importance for analysis and SEO alike. Yet, so far, it has never become a formally regularized standard of the web. Over the years this has led to the creation of several variants that complicate matters for webmasters and crawler operators alike, on top of the discrepancies introduced by dozens of implementation updates, each creating new rules to understand and manage case by case.
Recently, Google itself weighed in: it wants to make the REP a true cornerstone of the web to come. A quarter of a century after its invention, Big G decided to call on Koster, the Dutch developer who invented the protocol, to work on a common ground rule. The draft of the new system meant to replace the previous version, as announced in an official statement, "reflects over 20 years of real world experience of relying on robots.txt rules, used both by Googlebot and other major crawlers, as well as about half a billion websites that rely on REP. These fine-grained controls give the publisher the power to decide what they'd like to be crawled on their site and potentially shown to interested users".
Chances are we will soon be able to count on a proper standard, one that extends the .txt file beyond today's limited allow: and disallow: rules. Something will then change in the alphabet through which the REP is written, defining a language compatible with every situation. Google has surrounded itself with developers to collect comments and suggestions, through a campaign run on its dedicated Google Webmasters channel, before taking definitive action toward the decision that will mark the turning point.
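The allow:/disallow: rules mentioned above are already machine-readable today. As a sketch of how a crawler might honor them, Python's standard library ships a parser for the current de facto format; the example below feeds it a small hypothetical rule set (the example.com URLs and paths are invented for illustration):

```python
from urllib import robotparser

# A hypothetical robots.txt policy: block /private/ for all crawlers,
# but explicitly allow one report inside it. The Allow line comes first
# because this parser applies the first matching rule.
rules = """\
User-agent: *
Allow: /private/public-report.html
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A well-behaved crawler checks each URL before fetching it.
print(rp.can_fetch("*", "https://example.com/index.html"))                 # True
print(rp.can_fetch("*", "https://example.com/private/public-report.html")) # True
print(rp.can_fetch("*", "https://example.com/private/data.csv"))           # False
```

In real use one would call `rp.set_url("https://example.com/robots.txt")` and `rp.read()` to fetch the live file instead of parsing an inline string.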
The robots.txt standardization should become something real pretty soon. We just have to wait and see how.