What is Robots.txt?

Robots.txt is a plain text file placed in the root directory of a website (e.g. https://example.com/robots.txt) that tells search engine crawlers which parts of the site they may or may not access. It follows a standard called the Robots Exclusion Protocol, formalised as RFC 9309. Compliance is voluntary: well-behaved crawlers follow the rules, but the file cannot technically enforce them.

This file is typically used to manage crawler behaviour and keep bots away from pages that offer no value to search engines or users, such as admin areas, duplicate pages, or staging environments.

Why Robots.txt Matters

A properly configured robots.txt file controls how search engines crawl your site, which can improve crawl efficiency and keep bots out of sensitive or low-value sections.

Benefits include:

  • Controlling crawler access to specific folders or files
  • Reducing server load by blocking unnecessary crawling (see the example after this list)
  • Discouraging crawling of duplicate or non-public content
  • Signalling that sensitive areas are off-limits to compliant crawlers (note that robots.txt is publicly readable, so it is not a security control in itself)
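
For instance, a site might reduce wasted crawling by blocking internal search results and filtered URL variants. The paths below are illustrative; the * wildcard is supported by major crawlers and is part of RFC 9309:

User-agent: *
Disallow: /search/
Disallow: /*?filter=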

However, blocking a page with robots.txt doesn't guarantee it won't appear in search results: if the page is linked from elsewhere, search engines may still index its URL without crawling it. To fully prevent indexing, use a noindex robots meta tag or an X-Robots-Tag HTTP header, and leave the page crawlable so that crawlers can actually see the directive.
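
For example, a page can be kept out of the index with a robots meta tag in its HTML head, or with the equivalent HTTP response header:

<meta name="robots" content="noindex">

X-Robots-Tag: noindex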

Example in Use

A basic robots.txt file might look like:

User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /

This tells all search engine bots (User-agent: *) not to crawl the admin and cart directories. The Allow: / line explicitly permits everything else; strictly speaking it is optional, since anything not disallowed may be crawled by default.
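
A robots.txt file can also point crawlers to the site's XML sitemap using the Sitemap directive, a widely supported extension to the protocol; the URL below is a placeholder:

Sitemap: https://www.example.com/sitemap.xml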

Related Terms

  • Crawling
  • Indexing
  • XML Sitemap
  • Noindex Tag
  • Technical SEO