gpt4 book ai didi

sql - R:(SQL 风格)%LIKE% 语句

转载 作者:行者123 更新时间:2023-12-01 23:37:56 25 4
gpt4 key购买 nike

我想弄清楚如何在 R 编程语言中使用类似于 %LIKE% 的语句。使用以下 stackoverflow 帖子:How to join (merge) data frames (inner, outer, left, right) ,我能够弄清楚如何在 SQL 中运行基本合并。但出于某种原因,这不适用于 %LIKE% 条件。

例如,如果我创建以下数据:

table_a <- data.frame (

"name" = c("John", "ALex", "ToM", "Kev", "Peter"),
"color" = c("red", "blue", "green", "yellow", "pink")

)

name$table_a = as.factor(name$table_a)

table_b <- data.frame (

"name" = c("Johnathan", "Alexander", "Tomas", "Kevin", "Luke", "Ryan"),
"food" = c("pizza", tacos", "sushi", "cake", "brownies", "burgers")

)

name$table_b = as.factor(name$table_b)

table_c <- data.frame (

"name" = c("Johnatha", "Alexande1", "Toma1", "Kevi1", "Luk1"),
"food" = c("pizza", tacos", "sushi", "cake", "brownies")

)

name$table_c = as.factor(name$table_c)

现在我想做的是,如果 table_a 中的名称包含在 table_b 中的某个位置,则运行“左连接”。 (使用相同的逻辑,也应该可以使用单边的 %LIKE 吗?)

#Left joins

join_1 = merge(x = table_a, y = table_b, by = "%name%", all.x = TRUE)

join_2 = merge(x = table_b, y = table_c, by = "%name", all.x = TRUE)

在常规 SQL 语句中,如果数据行满足 %LIKE% 指定的条件,通常会直接选择数据行。在 R 中可以做同样的事情吗?

# select using %LIKE% (is there a way to override "case sensitivity" ? e.g. %like% "jOn"?)

selected_1 = table_a[name %like% "Jon"|| "Ale" || "Pet"]
selected_2 = table_a[name %like% "Jon"|| "Ale" || "Pet" || color %like% "ye"]

谢谢

最佳答案

您可以使用 fuzzyjoin 非常接近:

library(fuzzyjoin)
regex_right_join(table_b, table_a, by = "name", ignore_case = TRUE)
# Warning: Coercing `pattern` to a plain character vector.
# name.x food name.y color
# 1 Johnathan pizza John red
# 2 Alexander tacos ALex blue
# 3 Tomas sushi ToM green
# 4 Kevin cake Kev yellow
# 5 <NA> <NA> Peter pink

为了明确起见,fuzzyjoin 包将在这里重命名 by= 变量。虽然这看起来很不方便,但我个人发现它没有歧义,使用起来非常安全。尤其是在这里,两者通常不相同(因为根据我们的意图定义,一个是用于测试另一个的“模式”)。

顺便说一句:[name %like% "Jon"|| “啤酒” || "Pet"] 不会如您所愿:它首先根据 name %like% "Jon" 生成一个 logical 向量,然后将该向量作为一个整体元素与 "Ale"(这是一个错误)等进行 OR 运算。您可能正在寻找的是基于正则表达式的 %in% , 这不完全是这样。此外,|| 是单元素 OR,| 是矢量化 OR,我怀疑您想要后者。

对于类似正则表达式的 %in%,您需要这样的东西:

sapply(c("Jon", "Ale", "Pet"), grepl, x = table_a$name)
# Jon Ale Pet
# [1,] FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE
# [3,] FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE
# [5,] FALSE FALSE TRUE

table_a[rowSums(sapply(c("Jon", "Ale", "Pet"), grepl, x = table_a$name)) > 0,]
# name color
# 5 Peter pink
table_a[rowSums(sapply(c("Jon", "Ale", "Pet"), grepl, x = table_a$name)) > 0 |
grepl("ye", table_a$color),]
# name color
# 4 Kev yellow
# 5 Peter pink

可以像这样变成一个小的内联运算符:

`%regexin%` <- function(lhs, rhs) rowSums(sapply(rhs, grepl, x = as.character(lhs))) > 0
`%iregexin%` <- function(lhs, rhs) rowSums(sapply(rhs, grepl, x = as.character(lhs), ignore.case = TRUE)) > 0

table_a$name %regexin% c("Jon", "Ale", "Pet")
# [1] FALSE FALSE FALSE FALSE TRUE
subset(table_a, name %regexin% c("Jon", "Ale", "Pet"))
# name color
# 5 Peter pink

请注意,这对您的加入没有帮助,尽管它可能是某些相关逻辑的一部分。


数据:

table_a <- structure(list(name = structure(c(2L, 1L, 5L, 3L, 4L), .Label = c("ALex", "John", "Kev", "Peter", "ToM"), class = "factor"), color = c("red", "blue", "green", "yellow", "pink")), row.names = c(NA, -5L), class = "data.frame")
table_b <- structure(list(name = structure(c(2L, 1L, 6L, 3L, 4L, 5L), .Label = c("Alexander", "Johnathan", "Kevin", "Luke", "Ryan", "Tomas"), class = "factor"), food = c("pizza", "tacos", "sushi", "cake", "brownies", "burgers")), row.names = c(NA, -6L), class = "data.frame")
table_c <- structure(list(name = structure(c(2L, 1L, 5L, 3L, 4L), .Label = c("Alexande1", "Johnatha", "Kevi1", "Luk1", "Toma1"), class = "factor"), food = c("pizza", "tacos", "sushi", "cake", "brownies")), row.names = c(NA, -5L ), class = "data.frame")

关于sql - R:(SQL 风格)%LIKE% 语句,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/65332533/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com