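I'm practicing web scraping on a betting site for UFC fights. I'm using JavaScript with the request-promise and cheerio packages.
Site: https://www.oddsshark.com/ufc/odds
I want to scrape the fighters' names and their respective betting lines for each betting company.
My goal is to end up with something like an array of objects that I can later use to seed a PostgreSQL database.
Example of the output I want (it doesn't have to be exactly like this, just similar):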
[
{ fighter 1: 'Khabib Nurmagomedov', openingBetLine: -333, bovadaBetLine: -365, etc. },
{ fighter 2: 'Dustin Poirier', openingBetLine: 225, bovadaBetLine: 275, etc. },
{ fighter 3: etc.},
{ fighter 4: etc.}
]
Below is the code I have so far. I'm a complete beginner at this:
const rp = require("request-promise");
const url = "https://www.oddsshark.com/ufc/odds";
// cheerio to parse HTML
const $ = require("cheerio");
rp(url)
  .then(function(html) {
    // it worked :)
    // console.log("MMA page:", html);
    // console.log($("big > a", html).length);
    // console.log($("big > a", html));
    console.log($(".op-matchup-team-text", html).length);
    console.log($(".op-matchup-team-text", html));
  })
  // why isn't catch working?
  .catch(function(error) {
    // handle error
  });
My code above returns an object with indexes as keys and nested objects as values. Just one of them is shown below as an example.
{ '0':
{ type: 'tag',
name: 'span',
namespace: 'http://www.w3.org/1999/xhtml',
attribs: [Object: null prototype] { class: 'op-matchup-team-text' },
'x-attribsNamespace': [Object: null prototype] { class: undefined },
'x-attribsPrefix': [Object: null prototype] { class: undefined },
children: [ [Object] ],
parent:
{ type: 'tag',
name: 'div',
namespace: 'http://www.w3.org/1999/xhtml',
attribs: [Object],
'x-attribsNamespace': [Object],
'x-attribsPrefix': [Object],
children: [Array],
parent: [Object],
prev: [Object],
next: [Object] },
prev: null,
next: null },
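I don't know what to do from here. Am I targeting the right class (op-matchup-team-text)? If so, how do I extract the fighter names and the betting line elements from the site?
//////////////////////////////////////////////////////////////////////UPDATE 1 ON ORIGINAL POST/////////////////////////
Update: Using Henk's suggestion, I was able to scrape the fighters' names. Using the fighter-name code as a template, I was also able to scrape the fighters' betting lines.
But I don't know how to get both onto the same object. For example, how do I associate a betting line with the fighter it belongs to?
Below is the code I used to scrape the betting lines of the OPENING company: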
// rp and url are defined as in the first snippet
const cheerio = require("cheerio");
rp(url)
  .then(function(html) {
    const $ = cheerio.load(html);
    const openingBettingLine = [];
    // parent class of fighter name
    $("div.op-item.op-spread.op-opening").each((index, currentDiv) => {
      const openingBet = {
        opening: JSON.parse(currentDiv.attribs["data-op-moneyline"]).fullgame
      };
      openingBettingLine.push(openingBet);
    });
    console.log("openingBettingLine array test 2:", openingBettingLine);
  })
  // why isn't catch working?
  // eslint-disable-next-line handle-callback-err
  .catch(function(error) {
    // handle error
  });
The console outputs the following:
openingBettingLine array test 2: [ { opening: '-200' },
{ opening: '+170' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '+105' },
{ opening: '-135' },
{ opening: '-165' },
{ opening: '+135' },
{ opening: '-120' },
{ opening: '-110' },
{ opening: '-135' },
{ opening: '+105' },
{ opening: '-165' },
{ opening: '+135' },
{ opening: '-115' },
{ opening: '-115' },
{ opening: '-145' },
{ opening: '+115' },
{ opening: '+208' },
{ opening: '-263' },
etc.
The object output I want is still the one in the example below. So how do I get openingBettingLine into the object associated with its fighter?
[
{ fighter 1: 'Khabib Nurmagomedov', openingBetLine: -333, bovadaBetLine: -365, etc. },
{ fighter 2: 'Dustin Poirier', openingBettingLine: 225, bovadaBetLine: 275, etc. },
{ fighter 3: etc.},
{ fighter 4: etc.}
]
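//////////////////////////////////////////////////////////////////////UPDATE 2 ON ORIGINAL POST/////////////////////////
I can't get the BOVADA company's betting lines to scrape. I isolated the code for just this company below.
// BOVADA betting line array --> not working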
rp(url)
  .then(function(html) {
    const $ = cheerio.load(html);
    const bovadaBettingLine = [];
    // parent class of fighter name
    $("div.op-item.op-spread.border-bottom.op-bovada.lv").each(
      (index, currentDiv) => {
        const bovadaBet = {
          BOVADA: JSON.parse(currentDiv.attribs["data-op-moneyline"]).fullgame
        };
        bovadaBettingLine.push(bovadaBet);
      }
    );
    console.log("bovadaBettingLine:", bovadaBettingLine);
  })
  // why isn't catch working?
  // eslint-disable-next-line handle-callback-err
  .catch(function(error) {
    // handle error
  });
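It returns: bovadaBettingLine: []
There is nothing in it.
Below is the HTML code for that part of the site.
Best answer
Short:
Details:
First, analyze the source code of the data you want: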
<div class="op-matchup-team op-matchup-text op-team-top" data-op-name="{full_name:Jessica Andrade,short_name:}"><span class="op-matchup-team-text">Jessica Andrade</span></div>
You are trying to get the fighters' names. So you can target either the content of <span class="op-matchup-team-text">Jessica Andrade</span> or the attribute of the parent div, which is data-op-name="{full_name:Jessica Andrade,short_name:}".
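The attribute-based option depends on the attribute value being a JSON string that JSON.parse can read. Here is a minimal sketch of that parsing step on its own, assuming the real attribute carries properly quoted JSON (the inner quotes are not visible in the snippet shown here). The readFullName helper and the rawAttribute sample are hypothetical and are not part of the site or of cheerio.

```javascript
// Assumed shape of the real attribute value (properly quoted JSON).
const rawAttribute = '{"full_name":"Jessica Andrade","short_name":""}';

// Hypothetical helper: parse the attribute value and fall back to null
// when the value is missing or is not valid JSON.
function readFullName(attributeValue) {
  try {
    const parsed = JSON.parse(attributeValue);
    return parsed.full_name || null;
  } catch (error) {
    // JSON.parse throws on malformed input; treat that as "no name found".
    return null;
  }
}

console.log(readFullName(rawAttribute));
// An unquoted value like the one displayed in the snippet is not valid JSON.
console.log(readFullName("{full_name:Jessica Andrade,short_name:}"));
```

Returning null instead of throwing keeps one malformed row from aborting the whole scrape, though whether that is desirable depends on how strict the seeding step needs to be.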
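Let's try the second one:
- Select all divs that have the desired content: $("div.op-matchup-team.op-matchup-text.op-team-top")
- Iterate over them with each()
- Push each result into the fighters array.
See also the comments in the code below: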
const rp = require("request-promise");
const url = "https://www.oddsshark.com/ufc/odds";
const cheerio = require("cheerio");
rp(url)
  .then(function (html) {
    const $ = cheerio.load(html);
    const fighters = [];
    $("div.op-matchup-team.op-matchup-text.op-team-top")
      .each((index, currentDiv) => {
        const fighter = {
          name: JSON.parse(currentDiv.attribs["data-op-name"]).full_name,
          // There is no direct selector for the rows of the second column based on the first one.
          // So you need to select all rows of the second column as you did, and then use the current index
          // to get the right row. Put the selected data into your "basket", the fighter object. Done.
          openingBetLine: JSON.parse($("div.op-item.op-spread.op-opening")[index].attribs["data-op-moneyline"]).fullgame
          // Go on the same way with the other rows that you need.
        };
        fighters.push(fighter);
      });
    console.log(fighters);
  }).catch(function (error) {
    // The error catch does work, you just need to print the error to see it
    console.log(error);
  });
This will give you:
[{ name: 'Jessica Andrade',
openingBetLine: '-200'},...]
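The scraped lines are strings such as '-200', '+170' or '', while the desired output in the question uses numbers and several bookmakers per fighter. Here is a minimal sketch of that last step in plain JavaScript, assuming each scrape has already produced a plain array of strings in the same row order as the names. The parseMoneyline and mergeByIndex helpers are hypothetical names, and the sample values come from the desired-output example in the question.

```javascript
// Hypothetical helper: turn a moneyline string such as "-333" or "+225"
// into a number; an empty or missing value (no line posted) becomes null.
function parseMoneyline(text) {
  if (text === undefined || text === null) return null;
  const trimmed = String(text).trim();
  if (trimmed === "") return null;
  const value = Number(trimmed);
  return Number.isFinite(value) ? value : null;
}

// Hypothetical helper: combine parallel arrays by index into one object
// per fighter. linesByBook maps an output key to an array of line strings.
function mergeByIndex(names, linesByBook) {
  return names.map((name, index) => {
    const row = { name };
    for (const [book, lines] of Object.entries(linesByBook)) {
      row[book] = parseMoneyline(lines[index]);
    }
    return row;
  });
}

// Sample data taken from the question's desired output.
const fighters = mergeByIndex(
  ["Khabib Nurmagomedov", "Dustin Poirier"],
  {
    openingBetLine: ["-333", "+225"],
    bovadaBetLine: ["-365", "+275"]
  }
);
console.log(fighters);
```

Matching by index only works when every array has the same length and row order. If a selector also matches empty or header rows, lines will be paired with the wrong fighter, so it is worth comparing the array lengths before merging.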
Source: "JavaScript Node.js WebScraping : How do I find specific elements on webpage table to scrape and push into an array of objects?", a similar question on Stack Overflow: https://stackoverflow.com/questions/57641803/