
c# - Unable to get subcategories inside <ul> using HtmlAgilityPack in C# ASP.NET

Reposted. Author: 太空宇宙. Updated: 2023-11-03 23:19:33

I am new to web scraping and am trying to get data from a website with HtmlAgilityPack in ASP.NET C#. The HTML structure I am trying to parse is:

<li class='subsubnav' id='new-women-clothing'>
<span class='cat-name'>CLOTHING</span>

<ul>
<li><a href="/womenswear/womens-just-in" id="just-in">Just In</a></li>

<li><a href="/womenswear/new-season-exclusives" id="exclusives">Exclusives</a></li>

<li><a href="/womenswear/new-season-dresses" id="dresses-&-gowns">Dresses & Gowns</a></li>

<li><a href="/womenswear/new-season-coats" id="coats">Coats</a></li>

<li><a href="/womenswear/new-season-jackets" id="jackets">Jackets</a></li>

<li><a href="/womenswear/new-season-shirts-and-blouses" id="shirts-&-blouses">Shirts & Blouses</a></li>

<li><a href="/womenswear/new-season-tops" id="tops">Tops</a></li>

<li><a href="/womenswear/new-season-knitwear" id="knitwear">Knitwear</a></li>

<li><a href="/womenswear/new-season-sweatshirts" id="sweatshirts">Sweatshirts</a></li>

<li><a href="/womenswear/new-season-skirts-and-shorts" id="skirts-&-shorts">Skirts & Shorts</a></li>

<li><a href="/womenswear/new-season-trousers" id="trousers">Trousers</a></li>

<li><a href="/womenswear/new-season-jumpsuits" id="jumpsuits">Jumpsuits</a></li>

<li><a href="/womenswear/new-season-jeans" id="jeans">Jeans</a></li>

<li><a href="/womenswear/new-season-swimwear" id="swimwear">Swimwear</a></li>

<li><a href="/womenswear/new-season-lingerie" id="lingerie">Lingerie</a></li>

<li><a href="/womenswear/new-season-nightwear" id="nightwear">Nightwear</a></li>

<li><a href="/womenswear/sportswear" id="sportswear">Sportswear</a></li>

<li><a href="/womenswear/ski-wear" id="ski-wear">Ski Wear</a></li>

</ul>

</li>

I can get the parent category, CLOTHING in this example, but I cannot get the elements inside the ul.

Here is my C# code:

var html = new HtmlDocument();
html.LoadHtml(new WebClient().DownloadString("http://www.harrods.com/men/t-shirts?icid=megamenu_MW_clothing_t_shirts"));
var root = html.DocumentNode;
var nodes = root.Descendants();
var totalNodes = nodes.Count();
var dt = root.Descendants().Where(n => n.GetAttributeValue("class", "").Equals("cat-name"));

foreach (var x in dt)
{
    foreach (var element in x.Descendants("ul"))
    {
        child_data.Add(new cat_childs(element.InnerText));
    }

    data.Add(new Categories(x.InnerText, child_data));
}

test.DataSource = data;
test.DataBind();

So how can I get the link and the text of the anchor tags inside the <ul>?

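Best answer

If you want to base the iteration on span class='cat-name', then the target ul is a following sibling of that span, not a descendant. You can use SelectNodes() to get the following-sibling elements from the current span, as follows: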

foreach (var x in dt)
{
    foreach (var element in x.SelectNodes("following-sibling::ul/li/a"))
    {
        child_data.Add(new cat_childs(element.InnerText));
    }

    data.Add(new Categories(x.InnerText, child_data));
}
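
The question asks for both the link and the text of each anchor, while the snippet above reads only InnerText. A minimal sketch of reading both is shown here, assuming the HtmlAgilityPack NuGet package is referenced. The class name SubcategoryDemo, the method ReadLinks, and the inline markup are hypothetical and stand in for the real page. SelectNodes() returns null when nothing matches, so the sketch guards against that before iterating.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class SubcategoryDemo
{
    // Hypothetical helper: returns (text, href) pairs for every anchor in the
    // <ul> that follows a <span class='cat-name'>.
    public static List<(string Text, string Href)> ReadLinks(string markup)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(markup);
        var links = new List<(string Text, string Href)>();

        var spans = doc.DocumentNode.SelectNodes("//span[@class='cat-name']");
        if (spans == null) return links; // SelectNodes returns null when nothing matches

        foreach (var span in spans)
        {
            // The <ul> is a sibling of the span, so use the following-sibling axis.
            var anchors = span.SelectNodes("following-sibling::ul/li/a");
            if (anchors == null) continue;

            foreach (var a in anchors)
            {
                var text = HtmlEntity.DeEntitize(a.InnerText).Trim();
                var href = a.GetAttributeValue("href", ""); // "" if the attribute is missing
                links.Add((text, href));
            }
        }
        return links;
    }

    public static void Main()
    {
        // Inline sample trimmed from the structure in the question.
        const string markup = @"<li class='subsubnav' id='new-women-clothing'>
<span class='cat-name'>CLOTHING</span>
<ul>
<li><a href=""/womenswear/womens-just-in"" id=""just-in"">Just In</a></li>
<li><a href=""/womenswear/new-season-coats"" id=""coats"">Coats</a></li>
</ul>
</li>";

        foreach (var (text, href) in ReadLinks(markup))
            Console.WriteLine($"{text} -> {href}");
    }
}
```

The href values on the page are relative paths, so they would still need to be combined with the site's base address before being requested.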

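Update:

It seems the actual problem is that the child_data variable is declared outside the outer loop. That means you keep adding items to the same child_data instance. Try declaring it inside the outer loop, right after foreach (var x in dt){. Alternatively, you can write the whole thing as a LINQ expression, as follows: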

var data = (from x in dt
            let child_data = x.SelectNodes("following-sibling::ul/li/a")
                              .Select(o => new cat_childs(o.InnerText))
                              .ToList()
            select new Categories(x.InnerText, child_data)
           ).ToList();
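
The effect of sharing one child_data instance can be seen without HtmlAgilityPack at all. The following is a small sketch using plain lists. The Category class, the SHOES entry, and the method names are hypothetical stand-ins for the question's Categories type and the scraped data.

```csharp
using System;
using System.Collections.Generic;

public static class SharedListDemo
{
    // Hypothetical stand-in for the question's Categories class.
    public sealed class Category
    {
        public string Name { get; }
        public List<string> Children { get; }
        public Category(string name, List<string> children)
        {
            Name = name;
            Children = children;
        }
    }

    // Hypothetical input: category name plus its subcategory names.
    static readonly (string Name, string[] Items)[] Source =
    {
        ("CLOTHING", new[] { "Coats", "Jackets" }),
        ("SHOES", new[] { "Boots" }),
    };

    // Mirrors the original code: one list is created before the loop,
    // so every Category ends up holding the same, ever-growing list.
    public static List<Category> BuildShared()
    {
        var result = new List<Category>();
        var children = new List<string>();
        foreach (var (name, items) in Source)
        {
            children.AddRange(items);
            result.Add(new Category(name, children));
        }
        return result;
    }

    // Mirrors the suggested fix: a fresh list is created on each iteration.
    public static List<Category> BuildFresh()
    {
        var result = new List<Category>();
        foreach (var (name, items) in Source)
        {
            var children = new List<string>(items);
            result.Add(new Category(name, children));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var c in BuildShared())
            Console.WriteLine($"shared: {c.Name} has {c.Children.Count} children");
        foreach (var c in BuildFresh())
            Console.WriteLine($"fresh:  {c.Name} has {c.Children.Count} children");
    }
}
```

With the shared list, both categories report 3 children because they reference the same list. With a fresh list per iteration, CLOTHING reports 2 and SHOES reports 1.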

Regarding c# - Unable to get subcategories inside <ul> using HtmlAgilityPack in C# ASP.NET, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35885533/
