asp.net - robots.txt in a subdirectory

Reposted by: 塔克拉玛干 | Updated: 2023-11-03 02:27:07

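I have a project that lives in a folder under the main domain, and I don't have access to the root of the domain itself.

http://mydomain.com/myproject/

I want to prevent the subfolder "forbidden" from being indexed:

http://mydomain.com/myproject/forbidden/

Can I simply put a robots.txt in the myproject folder? Will it be read even if there is no robots.txt in the root?

What is the correct syntax for disallowing the forbidden folder?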

User-agent: *
Disallow: /forbidden/

or

User-agent: *
Disallow: forbidden/

Best answer

From robotstxt.org:

Where to put it

The short answer: in the top-level directory of your web server.

The longer answer:

When a robot looks for the "/robots.txt" file for a URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.

For example, for "http://www.example.com/shop/index.html", it will remove the "/shop/index.html", and replace it with "/robots.txt", and will end up with "http://www.example.com/robots.txt".

So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT".

So I'm afraid the answer is that you have to put it in the root folder :-(
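To make the quoted rule concrete, here is a minimal Python sketch of how a crawler would derive the robots.txt location from a page URL. The helper name `robots_url` is made up for illustration, and real crawlers may differ in details; the point is only that the path is discarded, so a file at /myproject/robots.txt is never requested.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for page_url.

    Hypothetical helper: keeps the scheme and host, replaces the whole
    path (and any query or fragment) with "/robots.txt".
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The example from the quote above.
print(robots_url("http://www.example.com/shop/index.html"))
# The asker's project folder resolves to the domain root as well.
print(robots_url("http://mydomain.com/myproject/forbidden/page.html"))
```

Both calls resolve to a robots.txt at the root of the host, which is why the file in the project folder would be ignored.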

As for your second question, I believe the correct syntax is the one that starts with a forward slash (e.g. /forbidden/).
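One detail worth checking: because the file has to live at the domain root, Disallow paths are matched against the full URL path from the root. For the folder in the question that would presumably mean writing /myproject/forbidden/ rather than /forbidden/. The sketch below uses Python's standard `urllib.robotparser` to test both variants; the helper `allowed` and the page name `page.html` are assumptions for illustration, and individual search engines may apply their own matching extensions.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Rule written relative to the domain root, including the project folder.
full_path_rules = "User-agent: *\nDisallow: /myproject/forbidden/\n"
# Rule written as if the project folder were the root.
short_path_rules = "User-agent: *\nDisallow: /forbidden/\n"

url = "http://mydomain.com/myproject/forbidden/page.html"

print(allowed(full_path_rules, url))   # False: the page is blocked
print(allowed(short_path_rules, url))  # True: the rule does not match
```

With this parser, only the rule that includes the project folder in the path blocks the page.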

Regarding "asp.net - robots.txt in a subdirectory", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/4837334/
