php - Can I scrape a URL's source code with Ruby on Rails, or should I use PHP?

Reposted by: 行者123 · Updated: 2023-11-28 12:41:16

Possible Duplicate:
How do I write a web scraper in Ruby?

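I need to scrape the source code of many websites listed in my application's database. I am checking whether they link back to my website.

Is this possible with Ruby on Rails, or should I use PHP?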

Best answer

You can fetch the list of websites and then run curl against each one.
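A minimal sketch of that curl approach in PHP, in case it helps. The helper names `fetchSource` and `linksBackTo` are hypothetical, not part of any library, and the regex-based link extraction is a rough assumption that will miss unusual markup (unquoted attributes, links built by JavaScript). It only shows the general shape: download each page, pull out the `href` values, and compare their host against your own domain.

```php
<?php
// Hypothetical helper: download the HTML of one URL with the cURL extension.
// Returns null when the request fails.
function fetchSource(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: true if any <a href="..."> in $html points at
// $myHost or one of its subdomains. Relative links are ignored.
function linksBackTo(string $html, string $myHost): bool
{
    $myHost = strtolower($myHost);
    $suffix = '.' . $myHost;

    // Rough extraction of quoted href values from anchor tags.
    preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $html, $matches);

    foreach ($matches[2] as $href) {
        $host = parse_url($href, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // relative or malformed URL
        }
        $host = strtolower($host);
        if ($host === $myHost || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// Usage sketch (performs network requests; the URL list is a placeholder
// for the rows you would read from your database):
//
// foreach (['http://example.org/'] as $site) {
//     $html = fetchSource($site);
//     $found = $html !== null && linksBackTo($html, 'example.com');
//     echo $site . ($found ? " links back\n" : " does not link back\n");
// }
```

Comparing the parsed host rather than searching the whole URL string avoids false positives such as a link to `notexample.com` or a URL that merely mentions your domain in its query string.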

Edit: Alternatively, you can try this great library, Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net):

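<?php

require 'simple_html_dom.php';

// The domain you expect the other sites to link back to.
define('MYWEBSITE', 'google.com');
$html = file_get_html('http://www.google.com/');

// file_get_html() returns false when the page cannot be loaded.
if ($html !== false) {
  foreach ($html->find('a') as $link) {
    $url = $link->href;
    // strpos() returns false when there is no match and 0 for a match at
    // the start of the string, so compare strictly against false.
    if (strpos($url, MYWEBSITE) !== false) {
      // Do whatever you need to do here, we'll just simply echo out
      // the website URL that has your site URL in it.
      echo $url . " contains " . MYWEBSITE . "\n";
    }
  }
}

?>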

It's just a quick hack, but it gets the job done.

Regarding "php - Can I scrape a URL's source code with Ruby on Rails, or should I use PHP?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12047944/