How do I use robots.txt to allow crawlers to access only index.php?

If I only want to allow crawlers to access index.php, will this work?

User-agent: * 
Disallow:/
Allow: /index.php

2009-10-28 todd

I'm curious why you would want to do this... wouldn't you want crawlers to index more of your site? – 2009-10-28 14:33:22

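You can check it with the Google Robots tool. I would never list a secret directory in a robots file, though, because I suspect a line like the following is like honey to certain spiders: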

Disallow: /secret

2009-10-28 14:36:33 Janco

Try swapping the order of Disallow/Allow:

User-agent: * 
Allow: /index.php 
Disallow:/

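From Wikipedia:

"However, in order to be compatible with all robots, if you want to allow a single file inside an otherwise disallowed directory, you need to place the Allow directive(s) first, followed by the Disallow, for example:"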

Not that I would expect every robot to be consistent about this, though.
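Python's standard-library `urllib.robotparser` is one example of a parser for which the order matters: it applies the first rule whose path is a prefix of the URL path. The sketch below assumes Python 3; `example.org` and `/other.html` are placeholders, and the helper name `can_fetch_path` is hypothetical. It compares the question's Disallow-first file with the reordered Allow-first file.

```python
from urllib.robotparser import RobotFileParser


def can_fetch_path(robots_txt: str, url: str) -> bool:
    """Parse a robots.txt body and ask whether a generic crawler ('*') may fetch url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse() takes an iterable of lines
    return rp.can_fetch("*", url)


# The question's file: Disallow first, then Allow.
DISALLOW_FIRST = """User-agent: *
Disallow: /
Allow: /index.php
"""

# The reordered file from this answer: Allow first, then Disallow.
ALLOW_FIRST = """User-agent: *
Allow: /index.php
Disallow: /
"""

for name, body in [("Disallow first", DISALLOW_FIRST), ("Allow first", ALLOW_FIRST)]:
    for path in ["/index.php", "/other.html"]:
        allowed = can_fetch_path(body, "http://www.example.org" + path)
        print(f"{name:15} {path:12} {'allowed' if allowed else 'blocked'}")
```

With this first-match parser, /index.php is blocked under the Disallow-first file and allowed under the Allow-first file, while /other.html is blocked under both. Crawlers that use longest-match rules (such as Googlebot) give the same answer for both orders.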

2009-10-28 14:38:00 UpTheCreek

Yes, it will work. Here are the test results from Google Webmaster Tools:

Url 
http://www.example.org/index.php 

Googlebot 
Allowed by line 3: Allow: /index.php 

Googlebot-Mobile 
Allowed by line 3: Allow: /index.php

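However, keep in mind that with this configuration your site's home page will not be crawled unless it is requested by its full path. In other words, http://www.example.org/ is disallowed, while http://www.example.org/index.php is allowed.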

If you want your home page to be accessible as well, here is a better version of the file:

User-agent: * 
Disallow:/
Allow: /index.php 
Allow: /$
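Google (and RFC 9309, which later standardized the protocol) does not go by line order. Among the rules that match a URL path, the one with the longest pattern wins, and Allow wins a tie. `*` matches any run of characters, and a trailing `$` anchors the pattern to the end of the path, so `Allow: /$` matches only the bare home page `/`. The sketch below is a simplified, hypothetical matcher (`pattern_to_regex`, `is_allowed` and `/about.html` are illustrative names). It omits percent-encoding normalization and user-agent group selection.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start of the path.

    '*' matches any run of characters; a trailing '$' means the path must end there.
    Without '$', the pattern is a plain prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest-match evaluation: the matching rule with the longest pattern wins,
    Allow beats Disallow on a tie, and a path matched by no rule is allowed."""
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:  # an empty value (e.g. "Disallow:") matches nothing
            continue
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Rules from the file above, in the order written (Disallow first).
RULES = [("Disallow", "/"), ("Allow", "/index.php"), ("Allow", "/$")]

for path in ["/", "/index.php", "/about.html"]:
    print(path, "allowed" if is_allowed(RULES, path) else "blocked")
```

Here `/` and `/index.php` come out allowed and `/about.html` blocked. Dropping the `("Allow", "/$")` rule makes `/` blocked again, which is the home-page caveat described above.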

2009-10-30 11:44:33

Can you explain why '/$' works, or what it does? – 2015-03-03 01:48:07

An explanation of '/$' can be found here: http://stackoverflow.com/a/29475539/1973409 – 2016-12-17 20:27:48

User-agent: *
Allow: /index.php
Disallow:/

2011-03-02 11:42:17 bulava

User-agent: * 
Allow: /$ 
Allow: /index.php 
Allow: /sitemap.xml 
Allow: /robots.txt 
Disallow:/

Sitemap: http://www.your-site-name.com/sitemap.xml
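As a partial explanation, the sketch below feeds this file to Python's standard-library `urllib.robotparser` (Python 3.8+ for `site_maps()`; `/about.html` is a placeholder path). The explicit Allow lines open the listed files, `Disallow: /` blocks everything else, and the `Sitemap` line is read independently of the user-agent group. Note that this parser does not implement the `$` wildcard: it treats `Allow: /$` as a literal path, so it reports the bare home page as blocked even though crawlers that support `$` (such as Googlebot) allow it.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """User-agent: *
Allow: /$
Allow: /index.php
Allow: /sitemap.xml
Allow: /robots.txt
Disallow: /

Sitemap: http://www.your-site-name.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# urllib.robotparser applies the first rule whose path is a prefix of the URL path.
for path in ["/index.php", "/sitemap.xml", "/robots.txt", "/about.html", "/"]:
    url = "http://www.your-site-name.com" + path
    print(path, "allowed" if rp.can_fetch("*", url) else "blocked")

# Sitemap URLs are collected separately from the Allow/Disallow rules.
print(rp.site_maps())
```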

2014-08-04 00:48:52 mRGogo

Could you explain your answer? – Qix 2014-08-04 01:12:36
