
java - Jsoup: getting the content of divs that share the same class name

Reposted. Author: 行者123. Updated: 2023-12-01 10:44:54

I want to parse a web page in which two divs share the same class.

Here is the part of the page I am trying to parse:

<div class="bid-row rgray bmatch" id="m590574">
<div class="mtime">12:00</div>
<div class="mteams w240" data-original-title="" title="">
<div class="team">Rayo Vallecano</div>
<div class="team">Malaga CF</div>
</div>
<div class="modds w160">
<div class="clear">
<div class="blank"></div>
<input class="bet" id="q43909084" type="button" value="2.35">
<input class="bet" id="q43909085" type="button" value="3.30">
<input class="bet" id="q43909086" type="button" value="3.15">
</div>
</div>
<div class="minfo">
<div class="stats" data-brid="7610448_1"></div>
<div data-tvinfo="Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD" class="fleft tv"></div>
<div class="mlive"></div>
<div class="slider" data-mode="1" data-tid="36" data-cid="32">+50<span class="glyphicon glyphicon-chevron-right"></span></div>
</div>

I am using Jsoup to parse it, and this is what my code looks like right now:

Elements hrefElements = doc.select("div.bmatch");
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

// root elements
org.w3c.dom.Document doc1 = docBuilder.newDocument();
org.w3c.dom.Element rootElement = doc1.createElement("company");

doc1.appendChild(rootElement);

String[] mtime = new String[hrefElements.size()];
String[] team = new String[hrefElements.size()];
String[] tvinfo = new String[hrefElements.size()];

for (int i = 0; i < hrefElements.size(); i++) {
    mtime[i] = hrefElements.get(i).getElementsByClass("mtime").text();
    team[i] = hrefElements.get(i).getElementsByClass("team").text();
    tvinfo[i] = hrefElements.get(i).getElementsByTag("div").attr("data-tvinfo");
}
for (int j = 0; j < hrefElements.size(); j++) {
    // staff elements
    org.w3c.dom.Element staff = doc1.createElement("Event");
    rootElement.appendChild(staff);

    // set attribute to staff element
    Attr attr = doc1.createAttribute("id");
    attr.setValue("1");
    staff.setAttributeNode(attr);
    org.w3c.dom.Element firstname = doc1.createElement("Time");
    firstname.appendChild(doc1.createTextNode(mtime[j]));
    staff.appendChild(firstname);

    // lastname elements
    org.w3c.dom.Element lastname = doc1.createElement("Teams");
    lastname.appendChild(doc1.createTextNode(team[j]));
    staff.appendChild(lastname);

    // nickname elements
    org.w3c.dom.Element nickname = doc1.createElement("TV");
    nickname.appendChild(doc1.createTextNode(tvinfo[j]));
    staff.appendChild(nickname);

    System.out.println("Time: " + mtime[j]);
    System.out.println("Event: " + team[j]);
    System.out.println("TvInfo: " + tvinfo[j]);
}
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc1);
String nameGame = jTextField3.getText();
StreamResult result = new StreamResult(new File("test.xml"));
//StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
// Output to console for testing
// StreamResult result = new StreamResult(System.out);

transformer.transform(source, result);

System.out.println("File saved!");

}

However, the output I get for that piece of HTML is:

<Event id="1">
    <Time>Today12:00</Time>
    <Teams>Rayo Vallecano Malaga CF</Teams>
    <TV>Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD</TV>
</Event>

The final XML I want to end up with should look like this:

<Event id="1">
    <Time>Today12:00</Time>
    <Team1>Rayo Vallecano</Team1>
    <Team2>Malaga CF</Team2>
    <TV>Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD</TV>
</Event>

Best answer

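You are fetching the team names with hrefElements.get(i).getElementsByClass("team").text(), which returns the combined text of all matching elements. In this case that gives "Rayo Vallecano Malaga CF" for the two teams Rayo Vallecano and Malaga CF.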
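The difference can be seen in isolation with a small sketch, assuming jsoup is on the classpath. The class name TeamTextDemo, its method names and the trimmed HTML string are made up for illustration; only the selectors and the text() calls mirror the code in this thread.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original question or answer.
public class TeamTextDemo {

    // Trimmed copy of the markup from the question: one match row with two teams.
    static final String HTML =
          "<div class=\"bid-row rgray bmatch\" id=\"m590574\">"
        + "<div class=\"mteams w240\">"
        + "<div class=\"team\">Rayo Vallecano</div>"
        + "<div class=\"team\">Malaga CF</div>"
        + "</div></div>";

    // Elements.text() merges the text of every matched element into one string.
    static String combinedTeams(String html) {
        Elements teams = Jsoup.parse(html).select("div.bmatch div.team");
        return teams.text();
    }

    // Calling text() on each Element keeps the team names apart.
    static List<String> separateTeams(String html) {
        List<String> names = new ArrayList<>();
        for (Element team : Jsoup.parse(html).select("div.bmatch div.team")) {
            names.add(team.text());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(combinedTeams(HTML)); // Rayo Vallecano Malaga CF
        System.out.println(separateTeams(HTML)); // [Rayo Vallecano, Malaga CF]
    }
}
```

Once the names are separate strings, each one can be written to its own XML element.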

Try this.

// Assumed imports: org.jsoup.nodes.Element, org.jsoup.select.Elements, org.w3c.dom.Attr,
// java.io.File, javax.xml.parsers.*, javax.xml.transform.*,
// javax.xml.transform.dom.DOMSource, javax.xml.transform.stream.StreamResult.
// Unqualified Element below is jsoup's; the DOM types are spelled out as org.w3c.dom.*.
Elements hrefElements = doc.select("div.bmatch");
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

// root element
org.w3c.dom.Document doc1 = docBuilder.newDocument();
org.w3c.dom.Element rootElement = doc1.createElement("company");
doc1.appendChild(rootElement);

for (int i = 0; i < hrefElements.size(); i++) {
    // one Event element per matched row
    org.w3c.dom.Element eventElement = doc1.createElement("Event");
    rootElement.appendChild(eventElement);

    // number the events 1, 2, 3, ... instead of hard-coding "1"
    Attr attr = doc1.createAttribute("id");
    attr.setValue("" + (i + 1));
    eventElement.setAttributeNode(attr);

    Element timeSection = hrefElements.get(i).select("div.mtime").first();   // one time section
    Element teamsSection = hrefElements.get(i).select("div.mteams").first(); // one team section
    Element infoSection = hrefElements.get(i).select("div.minfo").first();   // one info section

    String time = timeSection.text();
    Elements teams = teamsSection.select("div.team"); // many teams within the team section
    String tvInfo = infoSection.select("div.tv").first().attr("data-tvinfo");

    // time element
    org.w3c.dom.Element timeElement = doc1.createElement("Time");
    timeElement.appendChild(doc1.createTextNode(time));
    eventElement.appendChild(timeElement);
    System.out.println(timeElement.getTextContent());

    // one TeamN element per team: Team1, Team2, ...
    for (int j = 0; j < teams.size(); j++) {
        org.w3c.dom.Element teamElement = doc1.createElement("Team" + (j + 1));
        teamElement.appendChild(doc1.createTextNode(teams.get(j).text()));
        eventElement.appendChild(teamElement);
        System.out.println(teamElement.getTextContent());
    }

    // tv info
    org.w3c.dom.Element tvElement = doc1.createElement("TV");
    tvElement.appendChild(doc1.createTextNode(tvInfo));
    eventElement.appendChild(tvElement);
    System.out.println(tvElement.getTextContent());
}

// write the DOM tree to test.xml
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc1);

StreamResult result = new StreamResult(new File("test.xml"));
transformer.transform(source, result);

System.out.println("File saved!");
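To check the XML side without fetching the page, the element-building step can be pulled out on its own. The following is only a sketch under assumptions: the class EventXmlSketch and its methods are hypothetical, it uses just the JDK's DOM and transformer classes, and it returns the XML as a string instead of writing test.xml.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of the original answer.
public class EventXmlSketch {

    // Builds <company><Event id="..."> with one numbered TeamN child per team name.
    static String buildEventXml(int id, String time, List<String> teams, String tvInfo) {
        try {
            Document xml = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = xml.createElement("company");
            xml.appendChild(root);

            Element event = xml.createElement("Event");
            event.setAttribute("id", String.valueOf(id));
            root.appendChild(event);

            appendText(xml, event, "Time", time);
            for (int j = 0; j < teams.size(); j++) {
                appendText(xml, event, "Team" + (j + 1), teams.get(j)); // Team1, Team2, ...
            }
            appendText(xml, event, "TV", tvInfo);

            // Serialize to a string; the XML declaration is omitted to keep the output short.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(xml), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Parser and transformer setup failures are rethrown unchecked in this sketch.
            throw new IllegalStateException("Could not build the event XML", e);
        }
    }

    // Appends <tag>text</tag> to the given parent element.
    private static void appendText(Document xml, Element parent, String tag, String text) {
        Element child = xml.createElement(tag);
        child.appendChild(xml.createTextNode(text));
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        System.out.println(buildEventXml(1, "12:00",
                Arrays.asList("Rayo Vallecano", "Malaga CF"), "Sky Sports 4"));
    }
}
```

Passing the team names in as a list keeps the numbering logic independent of how the names were scraped, so the same method would handle a row with more than two teams.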

Regarding "java - Jsoup: getting the content of divs that share the same class name", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34246731/
