
css - Alternative to Nokogiri's text method for scraping?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 17:53:10

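I'm trying to get Nokogiri to scrape Jeremy Lin's stats from his last game on ESPN's website, but the text method on the CSS result gives me a single string with no spaces between the stats.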

The string returned by scraper.get_last_game_stats.text is:

"Sat 11/16vsDENW 122-111326-11.5450-2.0004-6.66747113116Wed 11/13@ PHIL 117-1234910-19.5269-15.6005-6.833512005834Sat 11/9vsLACL 94-107263-7.4290-0.0000-0.0001701156"

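I want a space between each stat, but even when I loop over the main object and insert a space or a dash between iterations, I can't split apart the steals, blocks, points, turnovers, and everything else: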

class PlayerScraper
  attr_accessor :player_data, :name

  def initialize(url)
    @player_data = Nokogiri::HTML(open(url))
  end

  def get_last_game_stats
    @last_game_stats = @player_data.css('tr[class^="oddrow team-46"]')
  end
end

jlin_url = "http://espn.go.com/nba/player/_/id/4299/jeremy-lin"

scraper = PlayerScraper.new(jlin_url)
scraper.get_last_game_stats.text

Can someone show me a better way to do this?

Best answer

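You're iterating over the rows, not the cells they contain. Calling text on the matched rows concatenates the text of every cell with no separator, so you need to walk both the rows and the cells to get the values in a usable form: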

require 'open-uri'
require 'nokogiri'

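URL = 'http://espn.go.com/nba/player/_/id/4299/jeremy-lin'
# URI.open is needed on Ruby 3.0+, where a bare open() no longer fetches URLs.
doc = Nokogiri::HTML(URI.open(URL))

# Outer map walks the rows; inner map collects the text of each cell.
data = doc.css('tr[class^="oddrow team-46"]').map { |tr|
  tr.css('td').map(&:text)
}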

data
# => [["Sat 11/16",
# "vsDEN",
# "W 122-111",
# "32",
# "6-11",
# ".545",
# "0-2",
# ".000",
# "4-6",
# ".667",
# "4",
# "7",
# "1",
# "1",
# "3",
# "1",
# "16"],
# ["Wed 11/13",
# "@ PHI",
# "L 117-123",
# "49",
# "10-19",
# ".526",
# "9-15",
# ".600",
# "5-6",
# ".833",
# "5",
# "12",
# "0",
# "0",
# "5",
# "8",
# "34"],
# ["Sat 11/9",
# "vsLAC",
# "L 94-107",
# "26",
# "3-7",
# ".429",
# "0-0",
# ".000",
# "0-0",
# ".000",
# "1",
# "7",
# "0",
# "1",
# "1",
# "5",
# "6"]]

To look at the data a different way, this prints each game as one row:

data.each do |row|
  puts row.join(', ')
end
# >> Sat 11/16, vsDEN, W 122-111, 32, 6-11, .545, 0-2, .000, 4-6, .667, 4, 7, 1, 1, 3, 1, 16
# >> Wed 11/13, @ PHI, L 117-123, 49, 10-19, .526, 9-15, .600, 5-6, .833, 5, 12, 0, 0, 5, 8, 34
# >> Sat 11/9, vsLAC, L 94-107, 26, 3-7, .429, 0-0, .000, 0-0, .000, 1, 7, 0, 1, 1, 5, 6

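Tables are very simple structures; you can build one with two nested loops. To access each cell later you do the same thing: loop over the rows, and inside that loop iterate over the cells. That is all the code I wrote does.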
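
Once the cells are separate, looking up a single stat is easier if each row carries labels. The sketch below pairs each row with column names. The names in COLUMNS are an assumption about the order of ESPN's game log columns (they do not appear in the question), so check them against the table header before relying on them.

```ruby
# Assumed column order for the game log; verify against the real table header.
COLUMNS = %w[date opp score min fg fg_pct three_pt three_pt_pct
             ft ft_pct reb ast blk stl pf to pts].freeze

# The rows produced by the nested map above, copied from its output.
data = [
  ['Sat 11/16', 'vsDEN', 'W 122-111', '32', '6-11', '.545', '0-2', '.000',
   '4-6', '.667', '4', '7', '1', '1', '3', '1', '16'],
  ['Wed 11/13', '@ PHI', 'L 117-123', '49', '10-19', '.526', '9-15', '.600',
   '5-6', '.833', '5', '12', '0', '0', '5', '8', '34'],
  ['Sat 11/9', 'vsLAC', 'L 94-107', '26', '3-7', '.429', '0-0', '.000',
   '0-0', '.000', '1', '7', '0', '1', '1', '5', '6']
]

# zip pairs each label with the cell in the same position; to_h builds the hash.
games = data.map { |row| COLUMNS.zip(row).to_h }

games.each do |game|
  puts "#{game['date']}: #{game['pts']} points in #{game['min']} minutes"
end
# >> Sat 11/16: 16 points in 32 minutes
# >> Wed 11/13: 34 points in 49 minutes
# >> Sat 11/9: 6 points in 26 minutes
```

The values stay as strings here; convert them with to_i or to_f where you need numbers.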

See also "How to avoid joining all text from Nodes when scraping".

The original question, "css - Alternative to Nokogiri's text method for scraping?", is on Stack Overflow: https://stackoverflow.com/questions/20061224/
