Check the website's robots.txt file
Free online tool to check and validate your website's robots.txt file. It detects syntax errors, missing User-agent directives, invalid Sitemap URLs, and malformed Crawl-delay values. Fetch robots.txt automatically from a URL, or paste the content to test it offline. Results are reported line by line as success, warning, or error.
Robots.txt is a text file placed at the root of a website (example.com/robots.txt) that tells search engine bots (Googlebot, Bingbot, and others) which pages to crawl and which to skip. A syntax mistake can have serious consequences: Google may be unable to crawl the site at all (a mistaken Disallow: /), Google may crawl pages that were never meant to be crawled (admin or private pages), or the sitemap may not be found. The Robots.txt Validator checks the syntax and logic of your robots.txt file BEFORE you deploy it, so that errors do not affect SEO and crawling.
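The line-by-line checks described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function names `check_line` and `validate_robots`, the exact messages, and the choice of which problems count as warnings versus errors are all assumptions made for the example.

```python
import re
from urllib.parse import urlparse

# Directives that only make sense inside a User-agent group (assumed set).
GROUP_RULES = {"allow", "disallow", "crawl-delay"}

def check_line(key, value, in_group):
    """Return (level, message) for one parsed directive."""
    if key == "user-agent":
        return ("success", "group start") if value else ("error", "empty User-agent")
    if key in GROUP_RULES and not in_group:
        return ("error", f"{key} appears before any User-agent line")
    if key in ("allow", "disallow"):
        if value == "" or value[0] in "/*":
            return ("success", "path rule")
        return ("warning", "path should start with '/' or '*'")
    if key == "crawl-delay":
        if re.fullmatch(r"\d+(\.\d+)?", value):
            return ("success", "delay in seconds")
        return ("error", "Crawl-delay must be a non-negative number")
    if key == "sitemap":
        parts = urlparse(value)
        if parts.scheme in ("http", "https") and parts.netloc:
            return ("success", "absolute sitemap URL")
        return ("error", "Sitemap must be an absolute http(s) URL")
    return ("warning", f"unknown directive '{key}'")

def validate_robots(text):
    """Return a list of (line_number, level, message) tuples."""
    results = []
    in_group = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            results.append((number, "error", "missing ':' separator"))
            continue
        name, value = line.split(":", 1)
        key, value = name.strip().lower(), value.strip()
        results.append((number, *check_line(key, value, in_group)))
        if key == "user-agent" and value:
            in_group = True
    return results

sample = (
    "Disallow: /x\n"
    "User-agent: *\n"
    "Disallow: /admin/\n"
    "Crawl-delay: fast\n"
    "Sitemap: /sitemap.xml\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)
for number, level, message in validate_robots(sample):
    print(number, level, message)
```

In this sample, lines 1, 4, and 5 are reported as errors (a rule outside any group, a non-numeric delay, and a relative sitemap URL), while the remaining lines pass.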
Robots.txt is a text file placed at the root of the website (example.com/robots.txt), following the Robots Exclusion Protocol. It tells search engine bots (crawlers) which URLs they may crawl and which they may not. It is a 'gentlemen's agreement': bots can ignore it, but the major search engines obey it.
No. Disallow only stops bots from CRAWLING the page, so they never read its content. If other pages link to it, Google can still INDEX the URL and show it in search results with a note such as 'No information is available for this page'. To block indexing entirely, use a noindex meta tag or an X-Robots-Tag header, and leave the page crawlable so that bots can actually see the noindex instruction.
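To make the two noindex signals concrete, here is a small sketch that inspects a response's headers and HTML for them. The helper name `has_noindex` and its inputs (a plain dict of headers and an HTML string) are assumptions for illustration; a real checker would also have to handle bot-specific meta names such as `googlebot`.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]

def has_noindex(headers, html):
    """Return True if the header or the meta tag asks not to be indexed."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives

print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))
print(has_noindex({}, '<head><meta name="robots" content="noindex"></head>'))
print(has_noindex({}, "<p>An ordinary page</p>"))
```

The first two calls report a noindex signal; the third does not.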
Not required, but RECOMMENDED. Adding 'Sitemap: https://example.com/sitemap.xml' helps search engines discover the sitemap faster, especially for new websites. The URL must be absolute, including the scheme and host. You should still submit the sitemap through Google Search Console as well.
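As a rough illustration, Python's standard `urllib.robotparser` can list the Sitemap entries in a robots.txt file (`site_maps()` is available from Python 3.8). The helper names `sitemap_urls` and `is_absolute_url` are made up for this sketch.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def sitemap_urls(robots_text):
    """Return every Sitemap URL declared in the given robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none

def is_absolute_url(url):
    """A Sitemap value should carry both a scheme and a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

robots_text = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""
for url in sitemap_urls(robots_text):
    print(url, is_absolute_url(url))
```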
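No. Google does NOT follow Crawl-delay in robots.txt. The crawl rate setting that Google Search Console once offered (Settings > Crawl rate) was retired in early 2024; Google's current guidance for slowing Googlebot temporarily is to return 500, 503, or 429 status codes. Crawl-delay is honored only by some other bots, such as Bing's, and support varies by crawler and over time. The value is the number of seconds to wait between requests.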
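For crawlers that do honor it, Crawl-delay is read per User-agent group. The sketch below uses Python's standard `urllib.robotparser` to read the value; the robots.txt content and the bot name `SomeOtherBot` are invented for the example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: bingbot
Crawl-delay: 10
Disallow: /search

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

# The bingbot group declares a delay; the catch-all group does not.
bing_delay = parser.crawl_delay("bingbot")
default_delay = parser.crawl_delay("SomeOtherBot")

print(bing_delay)     # 10
print(default_delay)  # None
```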
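User-agent: * applies rules to ALL bots. You can also write rules for individual bots: User-agent: Googlebot (Google only), User-agent: Bingbot (Bing only). A bot follows the most specific group that matches its name and ignores the general group, so specific rules replace the general ones rather than adding to them. Rules that appear before any User-agent line belong to no group and are ignored, so a file with no User-agent line has no effective rules.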
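A short sketch with Python's standard `urllib.robotparser` shows how a bot-specific group replaces the general one. The paths and the bot name `OtherBot` are illustrative only.

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

def allowed(agent, path):
    """Ask whether the given bot may fetch the given path on example.com."""
    return parser.can_fetch(agent, "https://example.com" + path)

# Googlebot uses only its own group, so /private/ is open to it.
print(allowed("Googlebot", "/private/page"))  # True
print(allowed("Googlebot", "/drafts/page"))   # False

# Any other bot falls back to the * group.
print(allowed("OtherBot", "/private/page"))   # False
print(allowed("OtherBot", "/drafts/page"))    # True
```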
Disallow: /path/ prevents bots from crawling URLs that start with /path/. Allow: /path/exception/ carves an exception out of a disallowed path, which is useful when you want to block a folder but keep some of its files crawlable. For example, Disallow: /admin/ plus Allow: /admin/public/ blocks /admin/ but allows /admin/public/. When both rules match a URL, Google applies the longest (most specific) matching path.
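The Allow/Disallow interaction can be sketched with a small hand-written matcher. It assumes the longest-match precedence that Google documents (the longest matching path wins, and Allow wins a tie); other crawlers may resolve conflicts differently, and Python's built-in `urllib.robotparser` applies rules in file order instead, which is why it is not used here. The function name `is_allowed` and the rule-list format are assumptions for the example, and wildcards are deliberately left out.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled.

    rules is a list of ("allow" | "disallow", path_prefix) pairs taken
    from a single User-agent group.
    """
    best_kind, best_length = "allow", -1  # no matching rule means allowed
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule matches nothing
        longer = len(prefix) > best_length
        tie_prefers_allow = len(prefix) == best_length and kind == "allow"
        if longer or tie_prefers_allow:
            best_kind, best_length = kind, len(prefix)
    return best_kind == "allow"

rules = [("disallow", "/admin/"), ("allow", "/admin/public/")]

print(is_allowed(rules, "/admin/settings"))         # False
print(is_allowed(rules, "/admin/public/logo.png"))  # True
print(is_allowed(rules, "/blog/"))                  # True
```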
* matches any sequence of characters. $ matches the end of the URL. For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf, and Disallow: /*/private/ blocks /a/private/, /b/private/, and so on. Note: not all bots support wildcards, but Google and Bing do.
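One way to reason about these wildcards is to translate a robots.txt pattern into a regular expression. The sketch below is a simplified illustration under that assumption; the helper names `pattern_to_regex` and `matches` are made up, and real crawlers may differ in edge cases such as percent-encoding.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")  # a trailing $ pins the end of the URL
    body = pattern[:-1] if anchored else pattern
    # Escape literal text and turn each * into "any sequence of characters".
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

def matches(pattern, path):
    """Patterns match from the start of the path, like plain prefixes."""
    return pattern_to_regex(pattern).match(path) is not None

print(matches("/*.pdf$", "/files/report.pdf"))             # True
print(matches("/*.pdf$", "/files/report.pdf?download=1"))  # False
print(matches("/*/private/", "/a/private/page"))           # True
print(matches("/*/private/", "/public/page"))              # False
```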
Directive names (User-agent, Disallow, Allow) are case-insensitive, but the URL paths in their values are matched case-sensitively: /Admin/ and /admin/ are different rules. Best practice: match the exact case of the URLs used on the website.
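The difference is easy to see with Python's standard `urllib.robotparser`, used here only as one example of a parser; the bot name `AnyBot` and the paths are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Directive names in mixed case are still recognized.
ROBOTS = """\
user-agent: *
DISALLOW: /Admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

# The path is compared exactly, so only the capitalized form is blocked.
blocked_exact = not parser.can_fetch("AnyBot", "https://example.com/Admin/users")
blocked_lower = not parser.can_fetch("AnyBot", "https://example.com/admin/users")

print(blocked_exact)  # True
print(blocked_lower)  # False
```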
We do more than design websites: we help businesses build strong digital brands. Mavis Digital provides comprehensive website services, from design to SEO optimization. Contact Mavis Digital to create breakthrough, effective, and sustainable technology solutions for your business in Ho Chi Minh City.
Check the backlinks of the website.
Check the canonical URL tag.
Structural analysis of H1-H6.
Crawl images from website.
Analyze keyword density.
Check Title & Description length.
Check meta redirects.
Generate SEO-standard meta tags.
Generate OpenGraph image from URL.
Preview meta tags when sharing.
Check the URL redirect chain.
Check noindex/nofollow.