
Robots.txt Generator - Build Robots.txt Online

Example output:

User-agent: *
Allow: /
Disallow: /admin
Disallow: /api
Disallow: /private
Disallow: /tmp

About the Robots.txt Generator

Nearly every website benefits from a robots.txt file that tells crawlers which paths are open and which are off-limits. Writing the syntax by hand is simple for one or two rules, but it gets tedious once you need to manage multiple user-agents, block AI training bots, and include sitemaps.

This visual builder lets you create robots.txt files through a form interface. Start with a preset, then customise individual rules. The live preview updates as you type so you always see exactly what will be generated.

Presets

  • Allow All. Opens your entire site to all crawlers.
  • Block All. Disallows all crawlers from all paths. Useful for staging sites.
  • Block AI Crawlers. Allows regular search engines but blocks 10 known AI training crawlers including GPTBot, CCBot, and Claude-Web (see the sample after this list).
  • Standard SEO. Allows everything except common private directories like /admin, /api, /private, and /tmp.
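
The exact output depends on your settings, but the 'Block AI Crawlers' preset produces one block per bot along these lines (an excerpt; the full file covers all ten crawlers listed in the FAQ below):

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Claude-Web
Disallow: /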

How to Use

Click a preset to start, then add or edit rules as needed. Each rule block has a user-agent field, allow paths, disallow paths, and an optional crawl-delay. Add sitemap URLs at the bottom. Copy the output from the preview pane on the right and save it as robots.txt at the root of your site.
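
For example, a single rule with a crawl-delay and one sitemap URL produces a file shaped like this (the domain and paths are placeholders):

User-agent: *
Allow: /
Disallow: /admin
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml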

Frequently Asked Questions

What is a robots.txt file?
A robots.txt file tells search engine crawlers which pages on your site they can and cannot request. It sits at the root of your domain (e.g. example.com/robots.txt) and uses a simple text format with User-agent, Allow, and Disallow directives.
Can robots.txt block AI crawlers?
Yes, for well-behaved bots. Major AI training crawlers like GPTBot, CCBot, Claude-Web, and Google-Extended state that they respect robots.txt directives. This tool includes a 'Block AI Crawlers' preset that adds Disallow rules for the ten AI training bots it knows about while keeping your site open to regular search engines.
What AI crawlers does the tool know about?
The preset covers GPTBot (OpenAI), CCBot (Common Crawl), Claude-Web (Anthropic), Google-Extended, Bytespider (ByteDance), anthropic-ai, Applebot-Extended, cohere-ai, PerplexityBot, and Amazonbot.
Does robots.txt guarantee pages won't be indexed?
No. Robots.txt controls crawling, not indexing. A page blocked by robots.txt can still appear in search results if other pages link to it. To prevent indexing, use a 'noindex' meta tag or X-Robots-Tag HTTP header.
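For example, either of the following keeps a page out of search results (the meta tag goes in the page's <head>; the header is sent by the server):

<meta name="robots" content="noindex">

X-Robots-Tag: noindex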
Where should I put my robots.txt file?
Place it at the root of your domain so it is accessible at https://yourdomain.com/robots.txt. Crawlers only look for it at that exact path; a robots.txt file in a subdirectory will never be found.
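
If you want to confirm the deployed file is reachable and parses the way you expect, here is a minimal check using Python's standard-library urllib.robotparser (yourdomain.com is a placeholder):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://yourdomain.com/robots.txt")  # the file must live at the root path
rp.read()  # fetch and parse the live file

# Ask how specific crawlers see specific paths.
print(rp.can_fetch("*", "https://yourdomain.com/"))              # True if open to all crawlers
print(rp.can_fetch("GPTBot", "https://yourdomain.com/private"))  # False if GPTBot is disallowed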