
r - How do I unnest a dictionary from XML in R?

Reposted. Author: 行者123. Last updated: 2023-12-03 07:54:56

I am trying to convert this XML into a data frame in R:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml

library(xml2)
library(tidyverse)

fileurl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
xmllist <- as_list(read_xml(fileurl))
xml_df = tibble::as_tibble(xmllist) %>%
  unnest_longer(response)

row_wider = xml_df %>%
  unnest_wider(response)

row_df = row_wider %>%
  unnest(cols = names(.)) %>%
  unnest(cols = names(.)) %>%
  readr::type_convert()

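The problem is that the "location_1" column is a dictionary, and it shows up as NA when I unnest. How can I get each value of this dictionary into its own column? Any help is much appreciated, thanks.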

Best answer

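The requested address data is stored as JSON in an attribute of the XML node, not in the node's text.
Below I extract the attribute, convert the JSON to data frames, and then combine them. The resulting data frame can then be bound to the work performed above.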
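To see why the unnest in the question yields NA, it may help to look at a single node in isolation. The sketch below uses a hypothetical one-row snippet shaped like the restaurants feed (the `snippet`, `doc`, `node`, `json`, and `parsed` names and the sample values are assumptions for illustration, not taken from the real file). It suggests that `location_1` has no text content, and that `as_list()` carries the JSON along only as an R attribute on an empty list, which the tidyr unnest functions do not turn into column values.

```r
library(xml2)
library(jsonlite)

# Hypothetical one-row snippet mimicking the feed's structure (illustrative values)
snippet <- '<response><row><row>
  <name>410</name>
  <location_1 human_address="{&quot;address&quot;:&quot;4509 BELAIR ROAD&quot;,&quot;city&quot;:&quot;Baltimore&quot;,&quot;state&quot;:&quot;MD&quot;,&quot;zip&quot;:&quot;&quot;}" needs_recoding="false"/>
</row></row></response>'

doc  <- read_xml(snippet)
node <- xml_find_first(doc, ".//location_1")

# The element itself is empty: there is no text to unnest
xml_text(node)

# The JSON string lives in the human_address attribute
json <- xml_attr(node, "human_address")

# as_list() keeps XML attributes as R attributes on an (empty) list
attr(as_list(node), "human_address")

# Parsing the JSON gives a named list: address, city, state, zip
parsed <- fromJSON(json)
parsed$city
```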

See the code comments for details.

library(xml2)
library(jsonlite)
library(tidyverse)

# URL of the XML file (same as in the question)
fileurl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"

# read the file as XML
page <- read_xml(fileurl)

# extract the restaurant nodes into a node set
restaurants <- page %>% xml_find_all(".//row/row")
# get the address data, which is stored in the human_address attribute
addresses <- restaurants %>% xml_find_first(".//location_1") %>% xml_attr("human_address")

# addresses is a character vector of JSON strings
# convert each JSON string to a one-row data frame
dfs <- lapply(addresses, function(address) {
  address %>% fromJSON() %>% as.data.frame()
})
# combine all of the data frames
answer <- bind_rows(dfs)

answer

             address      city state zip
1   4509 BELAIR ROAD Baltimore    MD
2      1919 FLEET ST Baltimore    MD
3     2844 HUDSON ST Baltimore    MD
4    3998 ROLAND AVE Baltimore    MD
5 2481 frederick ave Baltimore    MD
6    2722 HARFORD RD Baltimore    MD
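The answer mentions binding the address columns to the rest of the data but does not show that step. One possible way to do it is sketched below on a hypothetical two-row snippet, so it runs without network access. The `snippet`, `fields`, and `combined` names, the choice of the `name` and `zipcode` fields, and the sample values are assumptions for illustration. With the real file, something like `bind_cols(row_df, answer)` might work in the same way, assuming both data frames keep the restaurants in the same order.

```r
library(xml2)
library(jsonlite)
library(dplyr)

# Hypothetical two-row snippet mimicking the feed's structure (illustrative values)
snippet <- '<response><row>
  <row><name>410</name><zipcode>21206</zipcode>
    <location_1 human_address="{&quot;address&quot;:&quot;4509 BELAIR ROAD&quot;,&quot;city&quot;:&quot;Baltimore&quot;,&quot;state&quot;:&quot;MD&quot;,&quot;zip&quot;:&quot;&quot;}" needs_recoding="false"/></row>
  <row><name>1919</name><zipcode>21231</zipcode>
    <location_1 human_address="{&quot;address&quot;:&quot;1919 FLEET ST&quot;,&quot;city&quot;:&quot;Baltimore&quot;,&quot;state&quot;:&quot;MD&quot;,&quot;zip&quot;:&quot;&quot;}" needs_recoding="false"/></row>
</row></response>'

page <- read_xml(snippet)
restaurants <- xml_find_all(page, ".//row/row")

# Plain-text fields: one value per restaurant node
fields <- tibble::tibble(
  name    = restaurants %>% xml_find_first("./name") %>% xml_text(),
  zipcode = restaurants %>% xml_find_first("./zipcode") %>% xml_text()
)

# Address fields: parse each JSON attribute into a one-row data frame, then stack
addresses <- restaurants %>%
  xml_find_first("./location_1") %>%
  xml_attr("human_address") %>%
  lapply(function(a) as.data.frame(fromJSON(a))) %>%
  bind_rows()

# Both tables come from the same node set, so rows line up by position
combined <- bind_cols(fields, addresses)
combined
```

Building both tables from the same `restaurants` node set is what makes a positional `bind_cols()` reasonable here. If the two tables came from different passes over the data, joining on a shared key would be the safer choice.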

Regarding "r - How do I unnest a dictionary from XML in R?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/76267444/
