What is Robots.txt? How to create an SEO-standard robots.txt file for your website

seomarketing · September 14, 2025 · #Seo Marketing

Robots.txt is an important file in SEO that helps manage crawl budget and avoid duplicate content. See detailed instructions from Tan Phat Digital on how to create an SEO-standard robots.txt file.

What is Robots.txt?

Robots.txt is a simple text file located in the root directory of your website (for example: https://example.com/robots.txt). It tells search engine crawlers such as Googlebot which parts of the website they may crawl.

Put simply, robots.txt is like a signboard for bots: where to go and where not to go. With it, you can control crawling activity, avoid wasting resources, and improve SEO effectiveness.

For example:

  • You do not want Google to crawl shopping cart pages, internal search results, or heavy PDF files → use robots.txt to block them.

  • Conversely, you want the bot to focus on crawling service pages, product pages, and main articles → leave them open.
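To see how a bot would read such rules, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rule set, the `is_allowed` helper, and the example URLs are assumptions made for illustration, not a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only: block the cart and internal search.
RULES = """\
User-agent: *
Disallow: /cart/
Disallow: /search/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may crawl `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


cart_ok = is_allowed(RULES, "Googlebot", "https://example.com/cart/")
service_ok = is_allowed(RULES, "Googlebot", "https://example.com/services/seo")
```

Here `cart_ok` is `False` and `service_ok` is `True`. The standard-library parser only does simple prefix matching, so treat it as a quick check rather than an exact model of how Googlebot behaves.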

The role of Robots.txt in SEO

A website can have thousands of URLs, but not all of them matter for SEO. Robots.txt acts as a filter, helping Google focus on crawling the most valuable content.

1. Save crawl budget

Googlebot limits how often and how many pages it crawls on each website. If you let bots waste crawls on low-value URLs (e.g. /search/, /cart/, /tag/), more important pages may be slow to get indexed.

2. Avoid duplicate content

URLs with parameters, filters, or session IDs easily create duplicate content. Robots.txt can block bots from crawling these URLs, which keeps the crawl focused on the main version of each page.
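Before deciding which URLs to block, it helps to see which ones are really the same page. The sketch below groups URLs that differ only in parameters that do not change the main content. The parameter names in `IGNORED_PARAMS` and the helper names are hypothetical; adapt them to your own site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names assumed not to change the main page content.
IGNORED_PARAMS = {"sessionid", "sort", "color", "size", "price"}


def normalize(url: str) -> str:
    """Drop ignored query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(sorted(kept)), ""))


def duplicate_groups(urls):
    """Map each normalized URL to the original URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    # Keep only the groups where more than one URL points to the same page.
    return {key: members for key, members in groups.items() if len(members) > 1}


urls = [
    "https://example.com/shoes?color=red",
    "https://example.com/shoes?size=42&sessionid=abc",
    "https://example.com/shoes",
    "https://example.com/about",
]
groups = duplicate_groups(urls)
```

In this example the three `/shoes` URLs end up in one group, while `/about` is left out because it has no duplicates.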

3. Technical SEO support

In technical SEO (technical optimization for websites), robots.txt is one of the core files along with sitemap.xml, .htaccess, and canonical tags. If robots.txt is missing or misconfigured, search engines may index unwanted pages or miss important ones.

👉 If you want to learn more about technical optimization, please refer to the article: What is Technical SEO? Checklist Technical SEO Website.

4. Not a security tool

Note: robots.txt does not secure the website. Blocked pages can still be accessed by anyone who knows the direct URL, and they can still appear on Google if another website links to them. To keep a page out of the index, use a noindex meta tag or an X-Robots-Tag HTTP header, and leave the page crawlable so that the bot can actually read that directive.
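As a rough illustration of the two noindex mechanisms, the sketch below checks response headers and HTML that you have already fetched. The `has_noindex` helper is hypothetical, and it is deliberately simplified: it ignores bot-specific forms such as `googlebot: noindex`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the header or a meta robots tag contains 'noindex'."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value.lower()
    parser = RobotsMetaParser()
    parser.feed(html)
    combined = ",".join([header_value] + parser.directives)
    tokens = [token.strip() for token in combined.split(",")]
    return "noindex" in tokens


via_header = has_noindex({"X-Robots-Tag": "noindex, nofollow"}, "")
via_meta = has_noindex({}, '<meta name="robots" content="noindex">')
indexable = has_noindex({}, '<meta name="robots" content="index, follow">')
```

Here `via_header` and `via_meta` are `True`, and `indexable` is `False`.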

Basic structure of Robots.txt file

A robots.txt file usually consists of 4 main components:

User-agent: [bot name]
Disallow: [path blocked]
Allow: [allowed paths]
Sitemap: [XML sitemap URL]

Example of a standard file:

User-agent: Googlebot
Disallow: /private/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml

Explanation:

  • User-agent: The search bot that the rules below apply to (e.g. Googlebot, Bingbot, or * for all bots).

  • Disallow: Blocks bots from crawling specific paths.

  • Allow: Lets bots crawl a path, even inside a blocked folder.

  • Sitemap: Declares the sitemap URL to support indexing.
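To check how the example file above is interpreted, you can feed it to Python's standard `urllib.robotparser`. This is only a sketch: the test URLs are made up, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The example file from this article, reproduced as a string.
EXAMPLE = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(EXAMPLE.splitlines())

# Googlebot has its own group, so /private/ is blocked for it.
googlebot_private = parser.can_fetch("Googlebot", "https://www.example.com/private/report.html")
# Other bots fall back to the "*" group, which allows everything.
bingbot_private = parser.can_fetch("Bingbot", "https://www.example.com/private/report.html")
googlebot_blog = parser.can_fetch("Googlebot", "https://www.example.com/blog/")
sitemaps = parser.site_maps()
```

With this file, `googlebot_private` is `False`, `bingbot_private` and `googlebot_blog` are `True`, and `sitemaps` contains the single declared sitemap URL.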

Principles for creating SEO-standard Robots.txt

  1. Put it in the right location: the robots.txt file must be in the root directory (https://domain.com/robots.txt).

  2. Use the correct name: it must be robots.txt (with an s). A common mistake is naming the file robot.txt.

  3. Write the syntax correctly: a misspelled directive or a malformed line can cause bots to ignore that rule.

  4. Do not abuse Disallow: if you block an important folder by mistake (like /blog/, /services/) → those pages can drop out of the index.

  5. Declare the sitemap: it helps bots understand the site structure and prioritize crawling important content.

  6. Test regularly: use the robots.txt report in Google Search Console (which replaced the older Robots.txt Tester tool) to check the file.
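Some of these checks can be automated. The following is a minimal lint sketch, not an official validator: it only knows the four directives covered in this article, and the `lint_robots` name and its messages are made up for illustration.

```python
# Only the four directives discussed in this article; real files may use others.
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap"}


def lint_robots(text: str) -> list:
    """Return a list of human-readable problems found in a robots.txt body."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and outer spaces
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unknown directive '{field}'")
        elif field == "user-agent":
            seen_user_agent = True
        elif field in {"allow", "disallow"}:
            if not seen_user_agent:
                problems.append(f"line {number}: rule appears before any User-agent line")
            if value and not value.startswith(("/", "*")):
                problems.append(f"line {number}: path should start with '/'")
        elif field == "sitemap" and not value.startswith(("http://", "https://")):
            problems.append(f"line {number}: Sitemap should be an absolute URL")
    return problems


issues = lint_robots("User-agent: *\nDisalow: /cart/\nSitemap: /sitemap.xml")
```

For the deliberately broken input above, `issues` reports the misspelled `Disalow` on line 2 and the relative sitemap URL on line 3.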

Important notes when using Robots.txt

  • Not a replacement for noindex: Robots.txt only controls crawling and does not guarantee that a page stays out of the index. If the page is linked from another source, it may still appear on Google.

  • Be careful with SEO plugins: If you use Yoast SEO, RankMath or All in One SEO, the plugin may generate a virtual robots.txt. In that case, there is no need to upload a file to the server.

  • Check for indexing problems: If new posts on the website are not being indexed, check whether robots.txt is blocking them by mistake. You can refer to the article: Why doesn't Google index the article? The fastest way to fix.

Practical Robots.txt examples for websites

1. News website/blog

User-agent: *
Disallow: /wp-admin/
Disallow: /search/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.example.com/sitemap.xml
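In the blog example above, Allow: /wp-admin/admin-ajax.php works even though it comes after Disallow: /wp-admin/. Google documents that the most specific (longest) matching rule wins, and that Allow wins a tie. Here is a minimal sketch of that precedence, assuming plain prefix rules without wildcards; the `longest_match_allowed` helper is hypothetical.

```python
def longest_match_allowed(rules, path):
    """Decide whether `path` may be crawled under longest-match precedence.

    `rules` is a list of (directive, prefix) tuples such as
    ("disallow", "/wp-admin/"). Wildcards are not handled in this sketch.
    """
    best_len = -1
    allowed = True  # a path that matches no rule is crawlable
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


BLOG_RULES = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/search/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

ajax_ok = longest_match_allowed(BLOG_RULES, "/wp-admin/admin-ajax.php")
options_ok = longest_match_allowed(BLOG_RULES, "/wp-admin/options.php")
post_ok = longest_match_allowed(BLOG_RULES, "/blog/my-post")
```

Here `ajax_ok` and `post_ok` are `True`, while `options_ok` is `False`. Other crawlers may resolve conflicts differently, so confirm the result with the search engine's own testing tool.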

2. E-commerce website

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /search/
Allow: /
Sitemap: https://www.example.com/sitemap.xml

3. Service business website

User-agent: *
Disallow:
Allow: /
Sitemap: https://www.example.com/sitemap.xml

Robots.txt and SEO strategy in Vietnam

The SEO market in Vietnam has some distinctive characteristics:

  • E-commerce websites often have many dynamic URLs (price, color, size filters). If these are not blocked properly → duplicate content.

  • Service websites usually have few pages, but can easily lose their index if robots.txt blocks the wrong paths.

  • News/blog websites easily generate many search, tag, and category URLs → robots.txt needs to be optimized to save crawl budget.

Importantly, robots.txt is not only for "bot prevention". It needs to be combined with content, website structure, sitemap and internal links. If you are implementing SEO, please see the article: Basic website SEO - 6-month practical checklist to plan everything consistently.

Tan Phat Digital - companion to standardize technical SEO

This article was developed by Tan Phat Digital (https://tanphatdigital.com/), where we focus on comprehensive SEO solutions, including technical SEO, content strategy, and standard website design, so that small and medium businesses can deploy them effectively and sustainably. If you want advice on a standard robots.txt for your website, do not hesitate to contact us for detailed support.

Robots.txt is a basic but extremely important file in technical SEO. It helps you control crawl budget, limit duplicate content, and point bots to your sitemap. But it is not a security tool, nor does it replace noindex or canonical tags. For effective SEO, you need to combine robots.txt with other factors such as sitemap.xml, canonical tags, quality content and a clean website structure.
