
xml - R: the correct XPath expression when using XML and xpathSApply

Reposted. Author: 数据小太阳. Updated: 2023-10-29 01:56:35

Suppose I have parsed a website with the following expression:

library(XML)
url.df_1 = htmlTreeParse("http://www.appannie.com/app/android/com.king.candycrushsaga/", useInternalNodes = TRUE)

If I run the code below,

xpathSApply(url.df_1, "//div[@class='app_content_section']/h3", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))

I get the following:

[1] "Description"                      "What's new"
[3] "Permissions"                      "More Apps by King.com All Apps »"
[5] "Customers Also Viewed"            "Customers Also Installed"

Now, I am only interested in the "Customers Also Installed" section. But when I run the code below,

xpathSApply(url.df_1, "//div[@class='app_content_section']/ul/li/a", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))

it returns every app listed under "More Apps by King.com All Apps", "Customers Also Viewed", and "Customers Also Installed".

So I tried:

xpathSApply(url.df_1, "//div[h3='Customers Also Installed']", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))

But that did not work. So I tried:

xpathSApply(url.df_1, "//div[contains(.,'Customers Also Installed')]",xmlValue)

But that did not work either. The output should look like this:

     [,1]
[1,] "Christmas Candy Free\n Daniel Development\n "
[2,] "/app/android/xmas.candy.free/"
     [,2]
[1,] "Jewel Candy Maker\n Nutty Apps\n "
[2,] "/app/android/com.candy.maker.jewel.nuttyapps/"
     [,3]
[1,] "Pogz 2\n Terry Paton\n "
[2,] "/app/android/com.terrypaton.unity.pogz2/"

Any guidance would be much appreciated!

Best answer

Here is one option (you were really close):

xpathSApply(url.df_1,"//div[contains(.,'Customers Also Installed')]/*/li/a",xmlGetAttr,'href')

[1] "/app/android/xmas.candy.free/"
[2] "/app/android/com.candy.maker.jewel.nuttyapps/"
[3] "/app/android/com.terrypaton.unity.pogz2/"
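The live page may have changed since this was written, so here is a minimal, self-contained sketch that runs the same expression against an inline HTML snippet. The markup and the "Viewed One" entry are hypothetical and only imitate the structure described in the question (a div holding an h3 heading followed by a ul of links). The second query is an assumed, stricter variant that is not part of the original answer: it matches on the h3 text itself and returns both the link text and the href, which is closer to the matrix the question asks for.

```r
library(XML)

# Hypothetical markup imitating the page structure described in the question.
html <- '
<html><body>
<div class="app_content_section">
  <h3>Customers Also Viewed</h3>
  <ul>
    <li><a href="/app/android/viewed.one/">Viewed One</a></li>
  </ul>
</div>
<div class="app_content_section">
  <h3> Customers Also Installed </h3>
  <ul>
    <li><a href="/app/android/xmas.candy.free/">Christmas Candy Free</a></li>
    <li><a href="/app/android/com.terrypaton.unity.pogz2/">Pogz 2</a></li>
  </ul>
</div>
</body></html>'

# Parse the string instead of fetching a URL.
doc <- htmlParse(html, asText = TRUE)

# Expression from the answer: pick the div whose text contains the heading,
# then step through any child element (here the ul) down to li/a.
hrefs <- xpathSApply(doc,
                     "//div[contains(.,'Customers Also Installed')]/*/li/a",
                     xmlGetAttr, "href")

# Assumed stricter variant: test the h3 child directly, ignoring surrounding
# whitespace, and return the link text together with its href.
strict <- xpathSApply(doc,
                      "//div[h3[normalize-space()='Customers Also Installed']]/ul/li/a",
                      function(a) c(xmlValue(a), xmlGetAttr(a, "href")))

print(hrefs)
print(strict)
```

Matching on the h3 child rather than on the whole div's text avoids also selecting an enclosing div that happens to contain the same heading somewhere deeper in the tree.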

Regarding "xml - R: the correct XPath expression when using XML and xpathSApply", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15805607/
