gpt4 book ai didi

r - 使用 `dbplyr` 和 `RPostgres` 取消嵌套 json 列

转载 作者:行者123 更新时间:2023-11-29 12:07:23 27 4
gpt4 key购买 nike

我在 PostgreSQL 数据库的表中有一列包含 json 字符串。

我想在服务器端解除它的嵌套,并且我能够找出执行此操作的 SQL 代码。我面临的挑战是我希望能够将此操作插入到 dbplyr 管道链中,即更新 tbl_lazy 对象并且尚未运行查询,我对 dbplyr 的内部结构了解不够,无法做到这一点。

请看下面的例子:

设置

library("RPostgres")
library("dplyr")

drv <- RPostgres::dbDriver("Postgres")
#### NOT REPRODUCIBLE!!! ####
con <- RPostgres::dbConnect(drv, dbname = mydbname, sslmode = 'require',
host = myhost, port = 5432,
user = user, password = mypassword)
#############################

my_tbl <- tribble(~a, ~bcd,
1, '{"b": "foo1", "c": "bar1", "d": "baz1"}',
2, '{"b": "foo2", "c": "bar2", "d": "baz2"}')

copy_to(con, my_tbl, "my_tbl",
temporary = TRUE)

部分解决方案(无惰性求值)

unnest_json <-function(data, json_col, ...){
# build character vector whose names are cols to be created and values columns
# to be extracted
dots <- sapply(as.list(substitute(list(...)))[-1], as.character)
json_col <- as.character(substitute(json_col))
# json extraction string
query0 <- sprintf("%s::json->'%s' as %s",json_col, dots, names(dots))
# complete query
query <- sprintf("SELECT *, %s FROM (%s) AS PREV",
paste(query0, collapse = ", "),
dbplyr::sql_render(data))
# fetch (when I'd rather update the tbl_lazy object instead)
dbGetQuery(data$src$con, query)
}

con %>%
tbl("my_tbl") %>%
unnest_json(bcd, unnested_b = "b", unnested_c = "c")
# a bcd unnested_b unnested_c
# 1 1 {"b": "foo1", "c": "bar1", "d": "baz1"} "foo1" "bar1"
# 2 2 {"b": "foo2", "c": "bar2", "d": "baz2"} "foo2" "bar2"

想要的功能

我希望能够做到,例如:

con %>%
tbl("my_tbl") %>%
unnest_json(bcd, unnested_b = "b", unnested_c = "c") %>% # not evaluated at this point
select(-bcd) %>%
head(1) %>%
collect()

# a unnested_b unnested_c
# 1 1 "foo1" "bar1"

最佳答案

诀窍是使用函数 sql 调用 tbl

sql 应该被提供包含查询的字符串。

所以函数变成了:

unnest_json <-function(.data,.json_col, ...){
# build character vector whose names are cols to be created and values columns
# to be extracted
dots <- sapply(as.list(substitute(list(...)))[-1], as.character)
.json_col <- as.character(substitute(.json_col))
query0 <- sprintf("%s::json ->>'%s' as %s", .json_col, dots, names(dots))
query <- sprintf("SELECT *, %s FROM (%s) AS PREV",
paste(query0, collapse = ", "),
dbplyr::sql_render(.data))
tbl(.data$src$con, sql(query))
}

我还在查询中将 -> 更改为 ->> 以获得正确的输出。

unnest_json 的输出:

con %>%
tbl("my_tbl") %>%
unnest_json(bcd, unnested_b = "b", unnested_c = "c")
# # Source: SQL [?? x 4]
# # Database: postgres [standtasic@adbsg@adbsg.postgres.database.azure.com:5432/standtasicdb]
# a bcd unnested_b unnested_c
# <dbl> <chr> <chr> <chr>
# 1 1 "{\"b\": \"foo1\", \"c\": \"bar1\", \"d\": \"baz1\"}" foo1 bar1
# 2 2 "{\"b\": \"foo1\", \"c\": \"bar1\", \"d\": \"baz1\"}" foo1 bar1

dbplyr 链中使用并收集:

con %>%
tbl("my_tbl") %>%
unnest_json(bcd, unnested_b = "b", unnested_c = "c") %>%
select(-bcd) %>%
head(1) %>%
collect()
# # A tibble: 1 x 3
# a unnested_b unnested_c
# <dbl> <chr> <chr>
# 1 1 foo1 bar1

关于r - 使用 `dbplyr` 和 `RPostgres` 取消嵌套 json 列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/55179668/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com