
JavaScript Node.js WebScraping : How do I find specific elements on webpage table to scrape and push into an array of objects?

Reposted. Author: 行者123. Updated: 2023-11-30 19:19:29

I'm trying to practice web scraping on a UFC betting site. I'm using JavaScript with the request-promise and cheerio packages.

Website: https://www.oddsshark.com/ufc/odds

For each sportsbook, I want to scrape the fighters' names and their respective betting lines.

[Sample screenshot of the website]

My goal is to end up with something like an array of objects that I can later use to seed a PostgreSQL database.

Example of the output I want (it doesn't have to be exactly the same, just similar):

[
{ fighter 1: 'Khabib Nurmagomedov', openingBetLine: -333, bovadaBetLine: -365, etc. },
{ fighter 2: 'Dustin Poirier', openingBetLine: 225, bovadaBetLine: 275, etc. },
{ fighter 3: etc.},
{ fighter 4: etc.}
]
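The target shape above is pseudocode: a key such as `fighter 1` is not valid in a JavaScript object literal unless it is quoted. A minimal sketch of a valid equivalent, assuming a `name` property instead of numbered keys (the values are copied from the example above, not scraped):

```javascript
// Hypothetical target shape: one object per fighter, with the fighter's
// name under `name` and one property per sportsbook line.
const fighters = [
  { name: "Khabib Nurmagomedov", openingBetLine: -333, bovadaBetLine: -365 },
  { name: "Dustin Poirier", openingBetLine: 225, bovadaBetLine: 275 },
];

// Each entry could later become one row when seeding a database table.
for (const f of fighters) {
  console.log(`${f.name}: opening ${f.openingBetLine}, Bovada ${f.bovadaBetLine}`);
}
```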

Below is my code so far. I'm completely new to this:

const rp = require("request-promise");
const url = "https://www.oddsshark.com/ufc/odds";

// cheerio to parse HTML
const $ = require("cheerio");

rp(url)
  .then(function(html) {
    // it worked :)

    // console.log("MMA page:", html);
    // console.log($("big > a", html).length);
    // console.log($("big > a", html));

    console.log($(".op-matchup-team-text", html).length);
    console.log($(".op-matchup-team-text", html));
  })
  // why isn't catch working?
  .catch(function(error) {
    // handle error
  });

The code above returns indices as keys and nested objects as values. Here is just one of them as an example:

{ '0':
{ type: 'tag',
name: 'span',
namespace: 'http://www.w3.org/1999/xhtml',
attribs: [Object: null prototype] { class: 'op-matchup-team-text' },
'x-attribsNamespace': [Object: null prototype] { class: undefined },
'x-attribsPrefix': [Object: null prototype] { class: undefined },
children: [ [Object] ],
parent:
{ type: 'tag',
name: 'div',
namespace: 'http://www.w3.org/1999/xhtml',
attribs: [Object],
'x-attribsNamespace': [Object],
'x-attribsPrefix': [Object],
children: [Array],
parent: [Object],
prev: [Object],
next: [Object] },
prev: null,
next: null },
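Each numbered entry in that output is a raw parsed DOM node, not the text itself; the fighter's name lives in a child text node. In cheerio you would normally load the page with `cheerio.load(html)` and call `.text()` on a selection, but a minimal sketch of what that amounts to, using a hand-built object that only mimics the printed node shape (the name is illustrative), is:

```javascript
// Recursively collect the text of a parsed node shaped like the one printed
// above: text nodes have type "text" and a `data` string, while element
// nodes ("tag") keep their content in `children`.
function textOf(node) {
  if (node.type === "text") return node.data;
  return (node.children || []).map(textOf).join("");
}

// Hand-built stand-in for one <span class="op-matchup-team-text"> node.
const span = {
  type: "tag",
  name: "span",
  attribs: { class: "op-matchup-team-text" },
  children: [{ type: "text", data: "Jessica Andrade" }],
};

console.log(textOf(span)); // Jessica Andrade
```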

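I don't know where to go from here. Am I targeting the right class (op-matchup-team-text)? If so, how do I extract the fighter-name and betting-line elements from the website?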

////////////////////////////////////////////////////////////////////// Update 1 to the original post /////////////////////////

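Update: Using Henk's suggestion, I was able to scrape the fighters' names. Using the fighter-name code as a template, I was also able to scrape the fighters' betting lines.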

But I don't know how to get both into the same object. For example, how do I associate a betting line with the fighter it belongs to?

Below is the code I use to scrape the betting lines from the OPENING sportsbook:

const rp = require("request-promise");
const cheerio = require("cheerio");
const url = "https://www.oddsshark.com/ufc/odds";

rp(url)
  .then(function(html) {
    const $ = cheerio.load(html);

    const openingBettingLine = [];

    // each betting-line cell in the OPENING sportsbook column
    $("div.op-item.op-spread.op-opening").each((index, currentDiv) => {
      const openingBet = {
        opening: JSON.parse(currentDiv.attribs["data-op-moneyline"]).fullgame
      };
      openingBettingLine.push(openingBet);
    });
    console.log("openingBettingLine array test 2:", openingBettingLine);
  })
  // why isn't catch working?
  // eslint-disable-next-line handle-callback-err
  .catch(function(error) {
    // handle error
  });
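The code reads each opening line from the `data-op-moneyline` attribute, which it assumes holds a JSON string with a `fullgame` field. `JSON.parse` throws on a missing or malformed attribute, which would reject the whole promise. A defensive sketch of that one step, with hypothetical attribute values (the helper name `fullgameFrom` is illustrative):

```javascript
// Extract the full-game moneyline from a data-op-moneyline attribute value.
// Returns null when the attribute is missing, is not valid JSON, or has no
// "fullgame" field, instead of throwing inside the .each() callback.
function fullgameFrom(attrValue) {
  if (typeof attrValue !== "string") return null;
  let parsed;
  try {
    parsed = JSON.parse(attrValue);
  } catch (err) {
    return null; // malformed JSON
  }
  if (parsed === null || typeof parsed !== "object" || !("fullgame" in parsed)) {
    return null;
  }
  return String(parsed.fullgame);
}

console.log(fullgameFrom('{"fullgame":"-200"}')); // -200
console.log(fullgameFrom(undefined)); // null
console.log(fullgameFrom("not json")); // null
```

Inside the `.each()` callback it would be called as `fullgameFrom(currentDiv.attribs["data-op-moneyline"])`.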

The console prints the following:

openingBettingLine array test 2: [ { opening: '-200' },
{ opening: '+170' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '' },
{ opening: '+105' },
{ opening: '-135' },
{ opening: '-165' },
{ opening: '+135' },
{ opening: '-120' },
{ opening: '-110' },
{ opening: '-135' },
{ opening: '+105' },
{ opening: '-165' },
{ opening: '+135' },
{ opening: '-115' },
{ opening: '-115' },
{ opening: '-145' },
{ opening: '+115' },
{ opening: '+208' },
{ opening: '-263' },
etc.
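The scraped lines are strings, and many are empty where a matchup has no line, while the desired output uses numbers. A small conversion helper, sketched under the assumption that every value is an American-odds string like '+170' or '-200' or an empty string (the name `parseMoneyline` is hypothetical):

```javascript
// Convert an American moneyline string such as "+170" or "-200" to a number.
// Empty or non-numeric strings become null so they can be stored as NULL later.
function parseMoneyline(text) {
  const trimmed = (text || "").trim();
  if (trimmed === "") return null;
  const value = Number(trimmed); // Number("+170") === 170
  return Number.isFinite(value) ? value : null;
}

const scraped = [{ opening: "-200" }, { opening: "+170" }, { opening: "" }];
console.log(scraped.map((row) => parseMoneyline(row.opening))); // [ -200, 170, null ]
```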

The object output I want is still the one below. So how do I put openingBettingLine into the object associated with each fighter?

[
{ fighter 1: 'Khabib Nurmagomedov', openingBetLine: -333, bovadaBetLine: -365, etc. },
{ fighter 2: 'Dustin Poirier', openingBetLine: 225, bovadaBetLine: 275, etc. },
{ fighter 3: etc.},
{ fighter 4: etc.}
]
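One way to get there, sketched under the assumption that every array you scrape (names, opening lines, Bovada lines, and so on) lists fighters in the same page order, is to build each object from the same index in every array. The helper name `mergeByIndex` and the sample values are hypothetical:

```javascript
// Combine parallel arrays into one object per fighter. `lines` maps a property
// name (e.g. "openingBetLine") to an array aligned index-by-index with `names`.
function mergeByIndex(names, lines) {
  return names.map((name, i) => {
    const fighter = { name };
    for (const [key, values] of Object.entries(lines)) {
      fighter[key] = i < values.length ? values[i] : null; // missing -> null
    }
    return fighter;
  });
}

const names = ["Fighter A", "Fighter B"];
const fighters = mergeByIndex(names, {
  openingBetLine: ["-200", "+170"],
  bovadaBetLine: ["-210"],
});
console.log(fighters);
```

Index alignment only holds if each selector returns exactly one element per fighter, so comparing the array lengths before merging is a cheap sanity check.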

////////////////////////////////////////////////////////////////////// Update 2 to the original post /////////////////////////

I can't get the betting lines from the BOVADA sportsbook. I've isolated the code for just that sportsbook below.

// BOVADA betting line array --> not working

const rp = require("request-promise");
const cheerio = require("cheerio");
const url = "https://www.oddsshark.com/ufc/odds";

rp(url)
  .then(function(html) {
    const $ = cheerio.load(html);

    const bovadaBettingLine = [];

    // each betting-line cell in the BOVADA sportsbook column
    $("div.op-item.op-spread.border-bottom.op-bovada.lv").each(
      (index, currentDiv) => {
        const bovadaBet = {
          BOVADA: JSON.parse(currentDiv.attribs["data-op-moneyline"]).fullgame
        };
        bovadaBettingLine.push(bovadaBet);
      }
    );
    console.log("bovadaBettingLine:", bovadaBettingLine);
  })
  // why isn't catch working?
  // eslint-disable-next-line handle-callback-err
  .catch(function(error) {
    // handle error
  });

It returns bovadaBettingLine: [] with nothing in it.

Here is the HTML for that section of the website:

[Screenshot of the HTML for that section]
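One possible cause, assuming the class attribute in that HTML literally contains the token `op-bovada.lv`: in a CSS selector a dot always starts a new class, so `.op-bovada.lv` asks for two separate classes, `op-bovada` and `lv`, and matches nothing. Escaping the dot (`"div.op-item.op-spread.border-bottom.op-bovada\\.lv"` as a JavaScript string) or using an attribute selector such as `div[class~="op-bovada.lv"]` should target the literal token instead. The pure-JavaScript sketch below does not use cheerio; it only mimics how a class selector tests whitespace-separated tokens, with a hypothetical class attribute:

```javascript
// A class selector like ".a.b" matches when the element's class attribute,
// split on whitespace, contains every listed class as a whole token.
function matchesClasses(classAttr, requiredClasses) {
  const tokens = classAttr.trim().split(/\s+/);
  return requiredClasses.every((cls) => tokens.includes(cls));
}

// Hypothetical class attribute of a Bovada odds cell.
const classAttr = "op-item op-spread border-bottom op-bovada.lv";

// What ".op-bovada.lv" asks for: two classes, "op-bovada" and "lv".
console.log(matchesClasses(classAttr, ["op-bovada", "lv"])); // false
// What an escaped ".op-bovada\.lv" asks for: one class containing a dot.
console.log(matchesClasses(classAttr, ["op-bovada.lv"])); // true
```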

Best answer

Short version:

  1. Select the right data with the appropriate cheerio methods.
  2. Create your own object and put your data into it.

Details:

First, analyze the source code around the data you want:

<div class="op-matchup-team op-matchup-text op-team-top" data-op-name="{full_name:Jessica Andrade,short_name:}"><span class="op-matchup-team-text">Jessica Andrade</span></div>

You are trying to get the fighter's name. So you can target either the content of <span class="op-matchup-team-text">Jessica Andrade</span> or the attribute of the parent div, which is data-op-name="{full_name:Jessica Andrade,short_name:}".

Let's try the second one:

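  1. Get all divs with the desired content: $("div.op-matchup-team.op-matchup-text.op-team-top")
  2. Iterate over the divs with cheerio's built-in each() iterator.
  3. In each iteration, create an object with all the relevant fighter parameters and push it into the fighters array.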

See also the code comments below:

const rp = require("request-promise");
const cheerio = require("cheerio");

const url = "https://www.oddsshark.com/ufc/odds";

rp(url)
  .then(function (html) {
    const $ = cheerio.load(html);

    // Select the opening-line cells once, outside the loop.
    const openingDivs = $("div.op-item.op-spread.op-opening");

    const fighters = [];
    $("div.op-matchup-team.op-matchup-text.op-team-top").each((index, currentDiv) => {
      const fighter = {
        name: JSON.parse(currentDiv.attribs["data-op-name"]).full_name,
        // There is no direct selector for the rows of the second column based on the first one.
        // So you need to select all rows of the second column, as you did, and then use the
        // current index to get the right row. Put the selected data into your "basket", the
        // fighter object. Done.
        // The opening cells come in top/bottom pairs per matchup (-200, +170, ...), while this
        // loop visits only the top fighter of each matchup, so the matching cell is index * 2.
        openingBetLine: JSON.parse(openingDivs[index * 2].attribs["data-op-moneyline"]).fullgame
        // Continue the same way with the other columns you need.
      };

      fighters.push(fighter);
    });

    console.log(fighters);
  })
  .catch(function (error) {
    // The catch handler does work; you just need to print the error to see it.
    console.log(error);
  });

This gives you:

[{ name: 'Jessica Andrade',
openingBetLine: '-200'},...]

A similar question, "JavaScript Node.js WebScraping : How do I find specific elements on webpage table to scrape and push into an array of objects?", can be found on Stack Overflow: https://stackoverflow.com/questions/57641803/
