
r - How to get rid of error: Tibble columns must have compatible sizes?

Reposted. Author: 行者123. Updated: 2023-12-05 05:34:43

A community member helped me write the following code:

library(rvest)
library(tidyverse)

get_articles <- function(n_articles) {
  page <- paste0("https://www.theroot.com/news/criminal-justice",
                 "?startIndex=",
                 n_articles) %>%
    read_html()

  tibble(
    title = page %>%
      html_elements(".aoiLP .js_link") %>%
      html_text2(),
    author = page %>%
      html_elements(".llHfhX .js_link , .permalink-bylineprop") %>%
      html_text2(),
    date = page %>%
      html_elements(".js_meta-time") %>%
      html_text2(),
    url = page %>%
      html_elements(".aoiLP .js_link") %>%
      html_attr("href")
  )
}

df <- map_dfr(seq(0, 200, by = 20), get_articles)

But when I try to run it, I get the following error:

! Tibble columns must have compatible sizes.
• Size 20: Existing data.
• Size 21: Column author.
ℹ Only values of size one are recycled.

I have searched for solutions here but could not make much sense of them. Any help would be appreciated.

Best answer

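Because the author selector in your code returns every author listed on the page, and some articles have more than one author, the function returns more authors than articles. Every column of a data frame or tibble must contain the same number of elements.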

For example, this throws a similar error:

tibble::tibble(url = 1:3, author = 1:4)
#> Error: Tibble columns must have compatible sizes.
#> * Size 3: Existing data.
#> * Size 4: Column `author`.
#> i Only values of size one are recycled.
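
Before building the tibble, it can help to check which selector returns an unexpected number of elements. The following is a minimal sketch using made-up vectors in place of the scraped results; the values and the example.com URLs are assumptions for illustration only, not data from the site.

```r
# Hypothetical stand-ins for the vectors scraped from one listing page
title  <- paste("Article", 1:3)
author <- c("Ann", "Bob", "Cy", "Dee")  # one article has two authors
url    <- paste0("https://example.com/", 1:3)

# Count the elements in each would-be column
sizes <- lengths(list(title = title, author = author, url = url))

# Keep only the columns whose size differs from the title column
mismatched <- sizes[sizes != sizes[["title"]]]
print(mismatched)
```

Here only author is reported, which points to the selector that needs to be handled per article rather than per page.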

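One option is to defer retrieving the author names to a later step, when you read the content of each article. Note that the 10th URL links to a video with no article body, so it returns no content.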

library(rvest)
library(tidyverse)


get_articles <- function(n_articles) {
  page <- paste0("https://www.theroot.com/news/criminal-justice",
                 "?startIndex=",
                 n_articles) %>%
    read_html()

  tibble(
    title = page %>%
      html_elements(".aoiLP .js_link") %>%
      html_text2(),
    date = page %>%
      html_elements(".js_meta-time") %>%
      html_text2(),
    url = page %>%
      html_elements(".aoiLP .js_link") %>%
      html_attr("href")
  )
}

# df <- map_dfr(seq(0, 200, by = 20), get_articles)
df <- map_dfr(0, get_articles) # small example


df %>%
  slice(1:10) %>% # subset 10 rows for the example
  mutate(html = map(url, read_html),
         content = map(html, ~ .x %>%
                         html_elements(".bOfvBY") %>%
                         html_text2() %>%
                         paste(collapse = ",")),
         author = map(html, ~ .x %>%
                        html_elements(".llHfhX .js_link , .permalink-bylineprop") %>%
                        html_text2() %>%
                        # name the elements; the names become column names
                        # (seq_along() also handles a page with no authors)
                        set_names(paste0("author", seq_along(.)))
         )
  ) %>%
  unnest(content) %>%
  unnest_wider(author)
#> # A tibble: 10 x 7
#> title date url html content author1 author2
#> <chr> <chr> <chr> <lis> <chr> <chr> <chr>
#> 1 "US Soldier S~ Today ~ https://www.t~ <xml~ "A US soldier ~ Kalyn W~ <NA>
#> 2 "South Caroli~ Yester~ https://www.t~ <xml~ "On Tuesday, a~ Jessica~ <NA>
#> 3 "Abortion is ~ Tuesda~ https://www.t~ <xml~ "Abortion is o~ Jessica~ <NA>
#> 4 "Pennsylvania~ 9/02/2~ https://www.t~ <xml~ "Pennsylvania ~ Kalyn W~ <NA>
#> 5 "UN Committee~ 9/02/2~ https://www.t~ <xml~ "The devolving~ Jessica~ <NA>
#> 6 "DA Fani Will~ 8/30/2~ https://www.t~ <xml~ "There continu~ Murjani~ <NA>
#> 7 "How to Prote~ 8/30/2~ https://www.t~ <xml~ "The decision ~ Jessica~ <NA>
#> 8 "26 Alleged G~ 8/29/2~ https://www.t~ <xml~ "Twenty-six pe~ Keith R~ <NA>
#> 9 "Judge Angere~ 8/29/2~ https://www.t~ <xml~ "Sullivan Walt~ Kalyn W~ <NA>
#> 10 "Small Town H~ 8/27/2~ https://www.t~ <xml~ "" Kalyn W~ Adriano~

Created on 2022-09-08 by the reprex package (v2.0.0)
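
If one row per article with a single author column is preferred over author1, author2, ..., another possibility is to collapse each article's authors into one string. This is only a sketch, not part of the original answer: it runs offline on two made-up pages built with rvest's minimal_html(), and the class name "byline" and the author names are hypothetical placeholders for the real selectors.

```r
library(rvest)
library(purrr)

# Two made-up article pages; the "byline" class is a hypothetical selector
pages <- list(
  minimal_html('<a class="byline">Ann</a>'),
  minimal_html('<a class="byline">Bob</a><a class="byline">Cy</a>')
)

# One string per page, so the result always has one element per article
authors <- map_chr(pages, function(p) {
  p %>%
    html_elements(".byline") %>%
    html_text2() %>%
    paste(collapse = "; ")
})

print(authors)
```

Because paste(collapse = ) always returns a single string, even for a page with no matching elements, the author vector keeps the same length as the list of articles and can be used as a tibble column directly.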

Regarding "r - How to get rid of error: Tibble columns must have compatible sizes?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73628499/
