
r - SQL query on data frames inside a list

Reposted. Author: 行者123. Updated: 2023-12-02 03:21:45

Given the data frames

df1 <- data.frame(CustomerId=c(1:6),Product=c(rep("Toaster",3),rep("Radio",3)))
df2 <- data.frame(CustomerId=c(2,4,6),State=c(rep("Alabama",2),rep("Ohio",1)))

stored in a list

dflist <- c(df1,df2)

How can I run a sqldf query (a join) on these data frames?

Failed attempts:

test <- sqldf("select a.CustomerId, a.Product, b.State from dflist[1] a
inner join dflist[2] b on b.id = a.id")

test <- sqldf("select a.CustomerId, a.Product, b.State from dflist$df1 a
inner join dflist$df2 b on b.CustomerId = a.CustomerId")

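Best answer

If you copy the data.frames in the list into a new environment, you can use sqldf's envir argument. Alternatively, name the elements of the list and use with.

Note the following:

  • I created dflist with list rather than c

Note the difference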

str(c(df1,df2))
##List of 4
## $ CustomerId: int [1:6] 1 2 3 4 5 6
## $ Product : Factor w/ 2 levels "Radio","Toaster": 2 2 2 1 1 1
## $ CustomerId: num [1:3] 2 4 6
## $ State : Factor w/ 2 levels "Alabama","Ohio": 1 1 2

str(list(df1,df2))
##List of 2
## $ :'data.frame': 6 obs. of 2 variables:
## ..$ CustomerId: int [1:6] 1 2 3 4 5 6
## ..$ Product : Factor w/ 2 levels "Radio","Toaster": 2 2 2 1 1 1
## $ :'data.frame': 3 obs. of 2 variables:
## ..$ CustomerId: num [1:3] 2 4 6
## ..$ State : Factor w/ 2 levels "Alabama","Ohio": 1 1 2
  • I have adjusted the sql query to reflect the names in the data.frames (as in your second approach)
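
The difference between c and list noted above can also be checked programmatically. The following is a minimal sketch using only base R; the names flat and nested are introduced here purely for illustration and are not part of the original answer.

```r
# Same data frames as in the question
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

flat   <- c(df1, df2)     # data frames are lists, so c() concatenates their columns
nested <- list(df1, df2)  # list() keeps each data frame as a single element

length(flat)                # 4 (four columns)
length(nested)              # 2 (two data frames)
is.data.frame(flat[[1]])    # FALSE: the first element is a plain column vector
is.data.frame(nested[[1]])  # TRUE
```

Because c(df1, df2) no longer contains any data frame, there is nothing in it for sqldf to treat as a table.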

Naming the data

dflist <- list(df1,df2)
names(dflist) <- c('df1','df2')

Creating a new working environment

library(sqldf)

# create a new environment
e <- new.env()
# assign the elements of dflist to this new environment
for (.x in names(dflist)) {
  assign(x = .x, value = dflist[[.x]], envir = e)
}

# this could also be done using mapply / lapply, e.g.
# invisible(mapply(assign, value = dflist, x = names(dflist), MoreArgs = list(envir = e)))

# run the sql query, looking up df1 and df2 in e
sqldf("select a.CustomerId, a.Product, b.State from df1 a
inner join df2 b on b.CustomerId = a.CustomerId", envir = e)

## CustomerId Product State
## 1 2 Toaster Alabama
## 2 4 Radio Alabama
## 3 6 Radio Ohio
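
As a possible shortcut for the loop above, base R's list2env copies every element of a named list into an environment in one call. This is a sketch under the assumption that dflist is named as shown earlier; it is not part of the original answer, and it only builds the environment (the sqldf call with envir = e stays the same).

```r
# Same data frames as in the question
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

# list2env needs names: each name becomes a variable in the environment
dflist <- list(df1 = df1, df2 = df2)
e <- list2env(dflist, envir = new.env())

ls(e)                                  # "df1" "df2"
identical(get("df1", envir = e), df1)  # TRUE
```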

A simpler approach using with

You can simply use with to evaluate the query locally (importantly, dflist must be a named list)

# this is far simpler!!
with(dflist,sqldf("select a.CustomerId, a.Product, b.State from df1 a
inner join df2 b on b.CustomerId = a.CustomerId"))
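
To see how with exposes the names of a list, and to cross-check the join result without sqldf, the same inner join can be sketched with base R's merge. Note that this swaps in merge for the SQL query purely for illustration; the variable res is a name introduced here.

```r
# Same data frames as in the question
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

dflist <- list(df1 = df1, df2 = df2)

# Inside with(), the list's names (df1, df2) resolve to its elements;
# merge() performs an inner join on the shared key by default
res <- with(dflist, merge(df1, df2, by = "CustomerId"))
res
```

The three rows returned correspond to customers 2, 4 and 6, matching the sqldf output shown earlier.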

Another simple approach using proto

  • Thanks to @G.Grothendieck (see the comments)

This uses the proto package, which is loaded by sqldf

# the list names must match the table names used in the query
dflist <- list(df1 = df1, df2 = df2)
sqldf("select a.CustomerId, a.Product, b.State from df1 a
inner join df2 b on b.CustomerId = a.CustomerId",
envir = as.proto(dflist))

Using data.table

Or you can use data.table, which offers SQL-like approaches (see FAQ 2.16)

library(data.table)
dflist <- list(data.table(df1),data.table(df2))
names(dflist) <- c('df1','df2')
invisible(lapply(dflist, setkeyv, 'CustomerId'))
with(dflist, df1[df2])
## CustomerId Product State
## 1: 2 Toaster Alabama
## 2: 4 Radio Alabama
## 3: 6 Radio Ohio

Regarding r - SQL query on data frames inside a list, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12346027/
