Search engine optimisation (SEO) is more than just writing great content or earning links from other websites. It also involves managing how search engines like Google find and organise the information on your site. One key tool for this is the robots.txt file: a simple text file that lets website owners tell search engines which parts of their site crawlers (the programs that discover and index web pages) may visit, and which parts they should stay away from.
What is a Robots.txt File?
A robots.txt file is a plain text file placed in the root directory of your website. It gives search engine crawlers, the programs that discover your web pages, instructions on how to interact with your site. Specifically, it tells them which pages or sections of the website may be visited and which ones should not be touched.
For instance, when crawlers like Googlebot or Bingbot come to your site, they first check the robots.txt file. If there are certain areas you don’t want them to access, the bot will respect those instructions and avoid those pages.
A robots.txt file is always located at the root of the domain, for example: https://www.example.com/robots.txt
Why Robots.txt is Important for SEO
The robots.txt file helps website owners manage crawl activity on their site. It is especially useful for large websites with many pages.
Some common uses include:
- Blocking duplicate pages
- Preventing the indexing of admin or login areas
- Saving crawl budget
- Controlling access to sensitive directories
However, it is important to use robots.txt carefully because incorrect rules can block important pages from search engines.
Basic Structure of Robots.txt
A robots.txt file contains two main directives:
- User-agent
- Allow / Disallow
Example of a Basic Robots.txt File
```
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
```
Explanation:
- User-agent: * means the rules apply to all search engine bots
- Disallow: /admin/ blocks bots from accessing the admin folder
- Disallow: /private/ blocks the private pages
- Allow: /public/ permits crawling of the public folder
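Rules like these can be checked programmatically. The sketch below uses Python's standard-library urllib.robotparser to parse the example file from an in-memory string and test a few paths (the example.com URLs are placeholders):

```python
from urllib import robotparser

# The example robots.txt from above, as a list of lines.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /private/",
    "Allow: /public/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Paths matching a Disallow rule are blocked; everything else is allowed.
print(rp.can_fetch("*", "https://www.example.com/admin/settings"))    # False
print(rp.can_fetch("*", "https://www.example.com/public/page.html"))  # True
```

Running a quick check like this before deploying a robots.txt file is a cheap way to catch rules that block more than intended.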
Allow and Disallow Rules Explained
Disallow Rule: The Disallow rule tells search engines not to crawl certain pages or directories.
Example:

```
User-agent: *
Disallow: /wp-admin/
```
This prevents search engines from crawling the WordPress admin section.
Blocking a Specific Page:

```
User-agent: *
Disallow: /login.html
```
This blocks crawlers from accessing the login page.
Blocking an Entire Website:

```
User-agent: *
Disallow: /
```
This prevents all search engines from crawling the entire site.
This rule should only be used for development or staging websites, not for live sites.
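The effect of a site-wide Disallow can be confirmed with the same standard-library parser (example.com is again a placeholder):

```python
from urllib import robotparser

# "Disallow: /" blocks every path on the site for all user agents.
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

print(rp.can_fetch("*", "https://www.example.com/"))          # False
print(rp.can_fetch("*", "https://www.example.com/any/page"))  # False
```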
Allow Rule: The Allow rule is used to permit search engines to crawl specific pages within a blocked directory.
Example:

```
User-agent: *
Disallow: /images/
Allow: /images/public-image.jpg
```
Explanation:
- All files inside /images/ are blocked.
- But the file public-image.jpg is allowed for crawling.
- This rule is useful when you want to block a folder but allow specific files inside it.
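One caveat when testing this pattern locally: Python's urllib.robotparser applies rules in file order (first match wins), whereas Google picks the most specific (longest) matching rule, making the order irrelevant for Googlebot. The sketch below therefore lists the Allow line first so the standard-library parser reproduces the intended behaviour:

```python
from urllib import robotparser

# The more specific Allow line comes first because urllib.robotparser
# uses first-match-wins; Google instead uses longest-match-wins.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /images/public-image.jpg",
    "Disallow: /images/",
])

print(rp.can_fetch("*", "https://www.example.com/images/secret.png"))        # False
print(rp.can_fetch("*", "https://www.example.com/images/public-image.jpg"))  # True
```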
More Practical Robots.txt Examples
Example 1: Blocking Admin and Login Pages
```
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /dashboard/
```
This helps protect backend sections from being crawled.
Example 2: Allowing Only Specific Bots
```
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
```
Explanation: Only Googlebot can crawl the website. All other bots are blocked.
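Per-bot groups like this can also be verified with urllib.robotparser, which matches the User-agent line against the name you pass to can_fetch (the bot names below other than Googlebot are made up):

```python
from urllib import robotparser

# One group allows Googlebot everything; the catch-all group blocks the rest.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Allow: /",
    "",
    "User-agent: *",
    "Disallow: /",
])

print(rp.can_fetch("Googlebot", "https://www.example.com/page"))     # True
print(rp.can_fetch("SomeOtherBot", "https://www.example.com/page"))  # False
```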
Example 3: Blocking File Types
```
User-agent: *
Disallow: /*.pdf$
Disallow: /*.doc$
```
This blocks search engines from crawling PDF and DOC files.
Example 4: Blocking Query Parameters
```
User-agent: *
Disallow: /*?replytocom=
```
This is commonly used in WordPress websites to prevent crawling of duplicate comment URLs.
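Wildcard patterns like these (* matching any characters, $ anchoring the end of the URL) are supported by major crawlers and standardised in RFC 9309, but Python's urllib.robotparser does not implement them. The sketch below shows one way such a pattern can be translated into a regular expression; the helper names are made up for illustration:

```python
import re

def robots_pattern_to_regex(pattern: str) -> str:
    """Translate a robots.txt path pattern ('*' wildcard, optional
    trailing '$' anchor) into a regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return body + ("$" if anchored else "")

def pattern_matches(pattern: str, path: str) -> bool:
    # Robots.txt patterns match from the start of the URL path.
    return re.match(robots_pattern_to_regex(pattern), path) is not None

print(pattern_matches("/*.pdf$", "/files/report.pdf"))           # True
print(pattern_matches("/*.pdf$", "/files/report.pdf?x=1"))       # False
print(pattern_matches("/*?replytocom=", "/post/?replytocom=5"))  # True
```

Note how the $ anchor makes /*.pdf$ stop matching once query parameters are appended, which is why the replytocom rule above deliberately omits it.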
Steps to Create a Robots.txt File
Creating a robots.txt file is simple and can be done in a few steps.
Step 1: Open a Text Editor. Use any basic editor, such as:
- Notepad/Notepad++
- VS Code
Step 2: Write the Rules
Add your User-agent and Allow/Disallow directives.
Example:
```
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
```
Step 3: Save the File
Save the file as robots.txt. Make sure your editor does not silently append a second extension, producing robots.txt.txt.
Step 4: Upload to Root Directory. Upload the file to your website root folder using:
- cPanel File Manager
- FTP
- Hosting control panel
Example location: public_html/robots.txt
Step 5: Test the File. Use the robots.txt report in Google Search Console (the successor to the retired robots.txt Tester) to ensure the rules work correctly.
Benefits of Robots.txt
Better Crawl Budget Management: Search engines have a limited time to crawl a website. Robots.txt helps focus crawling on important pages instead of unnecessary ones.
Prevents Duplicate Content Issues: Blocking duplicate URLs or parameters helps improve SEO performance.
Protects Sensitive Areas: Admin panels, internal directories, and login pages can be blocked from search engine bots.
Faster Website Crawling: By preventing bots from crawling unnecessary files (such as scripts or temporary folders), the crawling process becomes faster and more efficient.
Improves Website Organisation: Using robots.txt helps maintain a clean and structured website for search engines.
Disadvantages of Robots.txt
It Does Not Guarantee Privacy: Robots.txt only instructs bots, but it does not enforce restrictions. Some bots may ignore the rules.
Pages Can Still Appear in Search Results: If another website links to a blocked page, search engines may still index the URL without crawling it.
Incorrect Rules Can Harm SEO: A simple mistake like:

```
Disallow: /
```

can accidentally block the entire website from search engines.
Not a Security Tool: The robots.txt file should not be used to hide sensitive information. Instead, use password protection or server-level security.
Best Practices for Robots.txt
To use robots.txt effectively, follow these best practices:
- Always test the file before publishing
- Avoid blocking important pages
- Use comments to explain rules
Example:

```
# Block admin pages
User-agent: *
Disallow: /admin/
# Allow public content
Allow: /blog/
```
- Keep the file simple and organised
- Regularly review it during SEO audits
Conclusion: The robots.txt file is an important tool for managing how search engines explore your website. It allows website owners to set guidelines on which pages they want search engine bots to visit and which ones to skip.
When used correctly, robots.txt can make it easier for search engines to find the key parts of your site, help avoid issues with duplicate content, and improve your site’s overall performance in search results. However, it’s crucial to use it carefully, as incorrect settings might accidentally prevent search engines from accessing important pages.
Learning how to create and manage a robots.txt file is a valuable skill for anyone involved in running a website, whether you’re focused on marketing, development, or just want your site to perform better in search results.