
r - Mutate returns data in the wrong order: is this a dplyr bug?

Reposted. Author: 行者123. Updated: 2023-12-04 12:08:05

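I've run into a problem where mutate() in dplyr returns results in the wrong order. My call to mutate uses data from an existing column as input, but the returned results are arranged as if the data had been sorted before the mutate.

My specific problem uses the dataRetrieval package to fetch USGS/NWIS data from the web. In this example, I retrieve station names based on site IDs. In the dataRetrieval package, site IDs are numeric codes stored as character strings.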

library(dataRetrieval)
library(dplyr)

Gauges <- tibble(
  Name = c("Twisp", "Chewuch", "Andrews", "Met@Winthrop", "Met@Twisp", "Met@Pateros", "Met@Goat"),
  ID = c("12448998", "12448000", "12447390", "12448500", "12449500", "12449950", "12447383")
)

## This works correctly with each of the station numbers
readNWISsite(Gauges$ID[1])$station_nm
# [1] "TWISP RIVER NEAR TWISP, WA"

## This does not work correctly
## Order is not right! Station does not correspond with ID !!
Gauges %>%
  mutate(Station = readNWISsite(ID)$station_nm)

# # A tibble: 7 x 3
# Name ID Station
# <chr> <chr> <chr>
# 1 Twisp 12448998 METHOW RIVER ABOVE GOAT CREEK NEAR MAZAMA, WA
# 2 Chewuch 12448000 ANDREWS CREEK NEAR MAZAMA, WA
# 3 Andrews 12447390 CHEWUCH RIVER AT WINTHROP, WA
# 4 Met@Winthrop 12448500 METHOW RIVER AT WINTHROP, WA
# 5 Met@Twisp 12449500 TWISP RIVER NEAR TWISP, WA
# 6 Met@Pateros 12449950 METHOW RIVER AT TWISP, WA
# 7 Met@Goat 12447383 METHOW RIVER NEAR PATEROS, WA

## This works, returning the correct site associated with the gauge number
Gauges %>%
  arrange(ID) %>%
  mutate(Station = readNWISsite(ID)$station_nm)
# # A tibble: 7 x 3
# Name ID Station
# <chr> <chr> <chr>
# 1 Met@Goat 12447383 METHOW RIVER ABOVE GOAT CREEK NEAR MAZAMA, WA
# 2 Andrews 12447390 ANDREWS CREEK NEAR MAZAMA, WA
# 3 Chewuch 12448000 CHEWUCH RIVER AT WINTHROP, WA
# 4 Met@Winthrop 12448500 METHOW RIVER AT WINTHROP, WA
# 5 Twisp 12448998 TWISP RIVER NEAR TWISP, WA
# 6 Met@Twisp 12449500 METHOW RIVER AT TWISP, WA
# 7 Met@Pateros 12449950 METHOW RIVER NEAR PATEROS, WA

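Why does mutate rearrange the data in the middle of the process? Or, what is going on here?

Best answer

To understand what is happening, don't extract only 'station_nm'; get 'site_no' as well: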

library(dplyr)
library(dataRetrieval)
readNWISsite(Gauges$ID)[c('site_no', 'station_nm')]
#site_no station_nm
#1 12447383 METHOW RIVER ABOVE GOAT CREEK NEAR MAZAMA, WA
#2 12447390 ANDREWS CREEK NEAR MAZAMA, WA
#3 12448000 CHEWUCH RIVER AT WINTHROP, WA
#4 12448500 METHOW RIVER AT WINTHROP, WA
#5 12448998 TWISP RIVER NEAR TWISP, WA
#6 12449500 METHOW RIVER AT TWISP, WA
#7 12449950 METHOW RIVER NEAR PATEROS, WA

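Here, readNWISsite() returns its rows sorted in ascending order of 'site_no', not in the order the IDs were passed in. mutate() does not reorder anything; it assigns the returned vector to the rows position by position, so the station names land on the wrong rows unless the data happens to be sorted by 'ID' already. To correct this, we can use rowwise to apply the function to each 'ID' one at a time:

Gauges %>%
  rowwise() %>%
  mutate(Station = readNWISsite(ID)$station_nm)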

Or use map from purrr:

library(purrr)
Gauges %>%
  mutate(Station = map_chr(ID, ~ readNWISsite(.x)$station_nm))
# A tibble: 7 x 3
# Name ID Station
# <chr> <chr> <chr>
#1 Twisp 12448998 TWISP RIVER NEAR TWISP, WA
#2 Chewuch 12448000 CHEWUCH RIVER AT WINTHROP, WA
#3 Andrews 12447390 ANDREWS CREEK NEAR MAZAMA, WA
#4 Met@Winthrop 12448500 METHOW RIVER AT WINTHROP, WA
#5 Met@Twisp 12449500 METHOW RIVER AT TWISP, WA
#6 Met@Pateros 12449950 METHOW RIVER NEAR PATEROS, WA
#7 Met@Goat 12447383 METHOW RIVER ABOVE GOAT CREEK NEAR MAZAMA, WA

Or we can extract both columns and use match() to align 'site_no' with 'ID':

Gauges %>%
  mutate(Station = {
    tmp <- readNWISsite(ID)[c('site_no', 'station_nm')]
    tmp$station_nm[match(ID, tmp$site_no)]
  })
# A tibble: 7 x 3
# Name ID Station
# <chr> <chr> <chr>
#1 Twisp 12448998 TWISP RIVER NEAR TWISP, WA
#2 Chewuch 12448000 CHEWUCH RIVER AT WINTHROP, WA
#3 Andrews 12447390 ANDREWS CREEK NEAR MAZAMA, WA
#4 Met@Winthrop 12448500 METHOW RIVER AT WINTHROP, WA
#5 Met@Twisp 12449500 METHOW RIVER AT TWISP, WA
#6 Met@Pateros 12449950 METHOW RIVER NEAR PATEROS, WA
#7 Met@Goat 12447383 METHOW RIVER ABOVE GOAT CREEK NEAR MAZAMA, WA
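
The behavior can also be reproduced without a network call. The sketch below is only an illustration: fake_site_lookup() is a hypothetical stand-in for readNWISsite(), assumed to return its rows sorted by 'site_no' as in the output shown earlier, and the three sites are a subset of the ones in the question. It uses base R only and compares positional assignment with the two fixes (matching on the key, and one call per ID).

```r
# Hypothetical stand-in for readNWISsite(): returns one row per requested
# site, sorted by site_no rather than in the order the IDs were passed in.
fake_site_lookup <- function(ids) {
  sites <- data.frame(
    site_no    = c("12447383", "12447390", "12448000"),
    station_nm = c("METHOW RIVER ABOVE GOAT CREEK NEAR MAZAMA, WA",
                   "ANDREWS CREEK NEAR MAZAMA, WA",
                   "CHEWUCH RIVER AT WINTHROP, WA"),
    stringsAsFactors = FALSE
  )
  found <- sites[sites$site_no %in% ids, ]
  found[order(found$site_no), ]
}

gauges <- data.frame(
  Name = c("Chewuch", "Andrews", "Met@Goat"),
  ID   = c("12448000", "12447390", "12447383"),
  stringsAsFactors = FALSE
)

# Positional assignment, which is what mutate(Station = f(ID)$station_nm) does:
# the names follow the sorted order, not the order of gauges$ID.
wrong <- fake_site_lookup(gauges$ID)$station_nm

# Fix 1: realign the result to the input order by matching on the key.
res   <- fake_site_lookup(gauges$ID)
right <- res$station_nm[match(gauges$ID, res$site_no)]

# Fix 2: one call per ID; each call returns a single row, so order is kept.
one_by_one <- vapply(
  gauges$ID,
  function(id) fake_site_lookup(id)$station_nm,
  character(1),
  USE.NAMES = FALSE
)

gauges$Station <- right
print(gauges)
```

With the real service, the match() version makes a single request for all IDs, while the per-ID version makes one request per row, so the former is likely the better choice for long lists of sites.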

This post, "r - Mutate returns data in the wrong order: is this a dplyr bug?", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/59567452/
