The robots.txt file is an important and sensitive file for your website, so it is worth understanding and customizing it properly. The ranking, indexing, speed, and privacy of your website can all depend on it.
History of robots.txt
Search engines like Google, Yahoo, and Bing all use a similar kind of program. It goes out on the internet, collects the information it needs, and keeps moving from one website to the next. Such programs are called web crawlers, spiders, bots, or robots. In the early days of the Internet, both computing power and memory were quite costly. Crawlers visited websites so frequently that they slowed them down, which upset some website owners: their servers used up their resources on bots and could not serve pages to real human visitors. To deal with this problem, some people came up with the idea of robots.txt.
The file tells crawlers which parts of the website the owner allows them to visit and which parts they should stay out of.
What is the robots.txt File?
The robots.txt file is a plain text file that sits in the root folder of a website. Let's understand this with an example domain.
Domain – https://www.yourdomain.com
Robots File – https://www.yourdomain.com/robots.txt
Whenever a search engine visits a domain, it first looks for the robots.txt file. If it finds the file, it reads it and follows its instructions. If it does not find the file, it treats the entire website as open to crawling and may index all of its pages. In the real world, however, data aggregators, email collectors, and hackers' bots do not follow these instructions.
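The behavior described above can be sketched with Python's standard-library urllib.robotparser module. This is only an illustration of how a polite crawler might decide, not how any particular search engine is implemented. The helper name load_rules, the bot name ExampleBot, and the paths are made up for this example.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_text):
    """Build a parser from a robots.txt body, or from None if no file was found.

    load_rules is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    if robots_text is None:
        # No robots.txt on the site: a polite crawler treats everything as allowed.
        parser.allow_all = True
    else:
        parser.parse(robots_text.splitlines())
    return parser


# Site without a robots.txt file: every page may be crawled.
no_file = load_rules(None)
print(no_file.can_fetch("ExampleBot", "https://www.yourdomain.com/any-page"))

# Site with a robots.txt file: the crawler follows its instructions.
with_file = load_rules("User-agent: *\nDisallow: /private/\n")
print(with_file.can_fetch("ExampleBot", "https://www.yourdomain.com/private/notes"))
print(with_file.can_fetch("ExampleBot", "https://www.yourdomain.com/blog"))
```

A bot that ignores robots.txt simply never runs a check like this, which is why the file only works for well-behaved crawlers.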
So a few things become clear here.
It is a plain text file.
It is always in the root folder.
It is always named “robots.txt”.
Bots are not bound to follow it.
You can check the robots file of any website: just add /robots.txt after the website's domain name. It is also clear that whether the instructions in robots.txt are obeyed depends on the bot. Big search engines such as Google, Yahoo, Bing, and Yandex follow them, but small search engines, data aggregators, email collectors, and hackers' bots often do not.
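If you want to work out the robots file address from any page link, a small sketch like the following can help. It uses only Python's standard library; the function name robots_url and the sample page link are assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that page_url belongs to.

    robots_url is a hypothetical helper written for this example.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and domain, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.yourdomain.com/blog/post?id=1"))
```

Whatever page you start from, the result always points at the root folder of the domain, because that is the only place crawlers look for the file.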
Now let's see what goes into this file, with a few examples.
If you want to let all search engine bots visit your whole website, two lines are enough. The first line, User-agent: *, means the instructions apply to all search engine bots; the star (*) is a wildcard.
The second line, Disallow: with nothing after it, means that no part of the website is disallowed or blocked for those bots.
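To see the effect of these two lines, here is a minimal sketch that feeds them to Python's standard-library urllib.robotparser and asks whether a few pages may be crawled. The bot name ExampleBot and the page paths are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# The two-line "allow everything" file described above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# Hypothetical paths; an empty Disallow blocks none of them.
results = {}
for path in ("/", "/blog/post-1", "/admin/"):
    results[path] = parser.can_fetch("ExampleBot", "https://www.yourdomain.com" + path)
    print(path, results[path])
```

Every path comes back as allowed, because an empty Disallow value blocks nothing.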
Disallow: / is the opposite: it tells search engine bots that everything under the root directory is disallowed or blocked. Every page link on a domain comes after a forward slash (/), and even the index file of a website sits after that slash. So if you put a forward slash after Disallow:, you are blocking all the pages of your website for search engine bots.
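The same kind of check shows what a single forward slash does. This is a small sketch using Python's standard-library urllib.robotparser; the bot name ExampleBot and the page paths are assumptions for this example.

```python
from urllib.robotparser import RobotFileParser

# One slash after Disallow blocks every path on the site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# Hypothetical paths; each one starts with "/", so each one is blocked.
results = {}
for path in ("/", "/index.html", "/blog/post-1"):
    results[path] = parser.can_fetch("ExampleBot", "https://www.yourdomain.com" + path)
    print(path, results[path])
```

Since every path on the site begins with a forward slash, the rule matches all of them.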
If you want to block or disallow one particular search engine, replace the star (*) with the name of that search engine's bot. Each search engine bot has its own name. You can check the names of bots on this website: https://www.robotstxt.org/db.html
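Here is a sketch of a file that blocks one named bot and leaves the site open to everyone else, checked with Python's standard-library urllib.robotparser. BadBot and FriendlyBot are made-up names for this example, not real crawlers.

```python
from urllib.robotparser import RobotFileParser

# "BadBot" is a hypothetical bot name used only for illustration.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "https://www.yourdomain.com/blog/post-1"
blocked_bot = parser.can_fetch("BadBot", page)
other_bot = parser.can_fetch("FriendlyBot", page)
print("BadBot allowed:", blocked_bot)
print("FriendlyBot allowed:", other_bot)
```

The group that names a bot applies to that bot; any bot without its own group falls back to the User-agent: * group.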
Google assigns a crawl budget to every website, which decides how often its bots visit your website. This crawl budget depends on two things.
No.1 Whether your server slows down while it is being crawled.
No.2 How popular your website is.
Google crawls websites with more content and more popularity more often. To make good use of the crawl budget, you can block the unimportant pages of your website, like the login page, internal-use document pages, the admin section, etc.
If you want to disallow a folder or a specific page, write the path of that page or folder after Disallow: and a forward slash. You can also block your under-maintenance page. If part of your website is only for your employees, or you do not want crawlers to show it publicly, you can block that part too. This can help both the page experience and the performance of your website.
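A folder rule and a page rule can sit side by side in the same group. The sketch below checks such a file with Python's standard-library urllib.robotparser. The paths /admin/ and /login.html and the bot name ExampleBot are assumptions for this example; use the paths that exist on your own website.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical folder and page; replace them with your own paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /login.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "https://www.yourdomain.com"
results = {}
for path in ("/admin/settings", "/login.html", "/blog/post-1"):
    results[path] = parser.can_fetch("ExampleBot", base + path)
    print(path, results[path])
```

A folder rule ending in a slash blocks everything inside that folder, while a page rule blocks only links that start with that exact path.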
If there is too much traffic on your website, you can also set a crawl delay. Before the crawler moves from one page of your website to the next, it waits for a set amount of time, so crawling does not slow the website down for your users. The wait time is given in seconds, not milliseconds. With Crawl-delay: 10, a crawler that honors the directive waits 10 seconds after crawling a page before going to the next one. This gives your server time to breathe. Note that Crawl-delay is not supported by every crawler; Google, for example, ignores it.
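A crawler that respects the delay has to read the number from the file first. Here is a minimal sketch of that step with Python's standard-library urllib.robotparser. The parser only returns the number; crawlers that honor the directive generally treat it as seconds. The bot name ExampleBot and the page list are made up for this example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns None when no delay is set for this bot.
delay = parser.crawl_delay("ExampleBot") or 0
print("Wait between pages:", delay)

# A polite crawler would pause this long between requests.
# The pause itself (for example time.sleep(delay)) is left out so the sketch runs instantly.
pages = ["/page-1", "/page-2", "/page-3"]
total_wait = delay * (len(pages) - 1)
print("Total waiting time for", len(pages), "pages:", total_wait)
```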
You can also add a link to your sitemap in the robots file with a Sitemap: line. This lets the crawler get the links to all your pages in one place, which can make the indexing of your website faster.
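The sketch below shows a robots file with a sitemap link and how a crawler could read it using Python's standard-library urllib.robotparser (the site_maps() method needs Python 3.8 or newer). The sitemap address /sitemap.xml is an assumption for this example; use wherever your own sitemap lives.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sitemap location on the example domain.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when the file lists no sitemap.
sitemaps = parser.site_maps() or []
for url in sitemaps:
    print("Sitemap:", url)
```

The Sitemap line does not belong to any User-agent group, so it can go anywhere in the file.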
You can check the robots files of as many popular websites as you like to see how they are set up. For more details, check out the video What Is robots.txt & How Does This Work For Website Rank In Search Engine. The video is in Hindi.