How to Use Robots.txt File to Gather Intelligence for Penetration Testing

April 27, 2021 · Cyber defense, cybersecurity, Online security, Penetration testing, Pentest, System security

The head section of a web document contains meta-information that describes the page and helps search engines categorize it. The meta-information most relevant to this discussion is the robots directive, which is closely related to the robots.txt file.

What is the robots.txt file?

The robots.txt is a file that website owners use to tell web crawlers how to treat their website, including which pages to crawl and which to ignore. According to Google, a good use of a robots.txt file is to limit the number of requests crawlers make to a site and thereby reduce server load. For a penetration tester, the file matters because it can expose information useful for identifying vulnerabilities in the web server. Once such vulnerabilities are identified, the website owner can use that information to patch them.
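As a minimal sketch of this reconnaissance step, the snippet below parses a robots.txt body and pulls out every Disallow path — the entries a tester would then probe manually. The sample rules are hypothetical, and in practice you would fetch the file from `https://target/robots.txt` yourself:

```python
import re

def extract_disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named in a Disallow directive."""
    paths = []
    for line in robots_txt.splitlines():
        # Strip inline comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        match = re.match(r"(?i)disallow:\s*(\S+)", line)
        if match:
            paths.append(match.group(1))
    return paths

# Hypothetical robots.txt body, as it might be fetched from a target.
sample = """User-agent: *
Disallow: /admin/
Disallow: /backup/
Allow: /public/
"""
print(extract_disallowed_paths(sample))  # ['/admin/', '/backup/']
```

Each returned path is a candidate for closer inspection, since site owners tend to hide exactly the areas they consider sensitive.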

The robots.txt file may also leak information that makes a malicious hacker's job easier. Do not use the robots.txt file to hide information that should not be publicly available. And if you specify that a page should not be crawled, do not link to that page from any other page, because a web crawler can still discover the "hidden" page through that link.
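To illustrate the leak, consider this hypothetical robots.txt file. Every Disallow line is also a signpost telling an attacker exactly which paths the owner considers sensitive:

```
User-agent: *
Disallow: /admin/
Disallow: /old-site-backup/
Disallow: /internal-docs/
```

A compliant search engine will skip these paths, but nothing stops a human or a hostile crawler from requesting them directly.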
Search engines use web crawlers to index websites, and a well-behaved crawler visits a site according to the directives in its robots.txt file. Malware and unscrupulous crawlers, however, may ignore those directives entirely and crawl every file they can find on the web server.
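The difference between the two behaviors can be shown with Python's standard-library `urllib.robotparser`, which implements the compliant side. The rules below are a hypothetical example; a malicious crawler would simply never make the `can_fetch` check:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an example host.
rules = [
    "User-agent: *",
    "Disallow: /admin/",
]

parser = RobotFileParser()
parser.parse(rules)

# A well-behaved crawler asks before fetching each URL...
print(parser.can_fetch("*", "https://example.com/admin/login"))  # False
print(parser.can_fetch("*", "https://example.com/index.html"))   # True
# ...whereas a malicious crawler skips this check and fetches everything.
```

This is why robots.txt is an honor system, not an access control: it only constrains crawlers that choose to obey it.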

