
r - Closures as a solution to a data-merging idiom

Reposted. Author: 行者123. Last updated: 2023-12-04 15:26:11

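I'm trying to wrap my head around closures, and I think I've found a case where they might help.

I have the following pieces to work with:

  • A set of regular expressions for cleaning state names, housed in a function
  • A data.frame with state names (in the standardized form that the function above produces) and state ID codes, used to link the two (the "merge map")

The idea is that, given some data.frame with sloppy state names (is the capital listed as "Washington, D.C.", "Washington DC", "District of Columbia", etc.?), a single function returns the same data.frame with the state name column removed and only the state ID code remaining. Subsequent merges can then be done consistently.

    I could do this any number of ways, but one that seems particularly elegant is to keep the merge map, the regular expressions, and the code that processes everything inside a closure (following the idea that a closure is a function with data).

    Question 1: Is this a reasonable idea?

    Question 2: If so, how do I do it in R?

    Here is a stupidly simple clean-state-names function that suffices for the example data: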
    cleanStateNames <- function(x) {
      x <- tolower(x)
      x[grepl("columbia", x)] <- "DC"
      x
    }

    Here is some example data that the final function will be run on:
    dat <- structure(list(state = c("Alabama", "Alaska", "Arizona", "Arkansas", 
    "California", "Colorado", "Connecticut", "Delaware", "District of Columbia",
    "Florida"), pop08 = structure(c(29L, 44L, 40L, 18L, 25L, 30L,
    22L, 48L, 36L, 13L), .Label = c("1,050,788", "1,288,198", "1,315,809",
    "1,316,456", "1,523,816", "1,783,432", "1,814,468", "1,984,356",
    "10,003,422", "11,485,910", "12,448,279", "12,901,563", "18,328,340",
    "19,490,297", "2,600,167", "2,736,424", "2,802,134", "2,855,390",
    "2,938,618", "24,326,974", "3,002,555", "3,501,252", "3,642,361",
    "3,790,060", "36,756,666", "4,269,245", "4,410,796", "4,479,800",
    "4,661,900", "4,939,456", "5,220,393", "5,627,967", "5,633,597",
    "5,911,605", "532,668", "591,833", "6,214,888", "6,376,792",
    "6,497,967", "6,500,180", "6,549,224", "621,270", "641,481",
    "686,293", "7,769,089", "8,682,661", "804,194", "873,092", "9,222,414",
    "9,685,744", "967,440"), class = "factor")), .Names = c("state",
    "pop08"), row.names = c(NA, 10L), class = "data.frame")

    And an example merge map (the real one links FIPS codes to states, so it cannot be trivially generated):
    merge_map <- data.frame(state=dat$state, id=seq(10) )

    EDIT: Building on crippledlambda's answer below, here is an attempt at the function:
    prepForMerge <- local({
      merge_map <- structure(list(state = c("alabama", "alaska", "arizona", "arkansas", "california", "colorado", "connecticut", "delaware", "DC", "florida" ), id = 1:10), .Names = c("state", "id"), row.names = c(NA, -10L ), class = "data.frame")
      list(
        replace_merge_map=function(new_merge_map) {
          merge_map <<- new_merge_map
        },
        show_merge_map=function() {
          merge_map
        },
        return_prepped_data.frame=function(dat) {
          dat$state <- cleanStateNames(dat$state)
          dat <- merge(dat,merge_map)
          dat <- subset(dat,select=c(-state))
          dat
        }
      )
    })

    > prepForMerge$return_prepped_data.frame(dat)
    pop08 id
    1 4,661,900 1
    2 686,293 2
    3 6,500,180 3
    4 2,855,390 4
    5 36,756,666 5
    6 4,939,456 6
    7 3,501,252 7
    8 591,833 9
    9 873,092 8
    10 18,328,340 10

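    Two questions remain before I consider this solved:
  • Calling prepForMerge$return_prepped_data.frame(dat) every time is painful. Is there a way to have a default function so that I can just call prepForMerge(dat)? I'm guessing not, given how it is implemented, but perhaps there is at least a convention for a default fxn....
  • How do I avoid mixing data and code in the merge_map definition? Ideally I would clean merge_map elsewhere, then capture it and store it inside the closure.

Best answer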

    I may be missing the point of your question, but here is one way you could use a closure:

    > replaceStateNames <- local({
    + statenames <- c("Alabama", "Alaska", "Arizona", "Arkansas",
    + "California", "Colorado", "Connecticut", "Delaware",
    + "District of Columbia", "Florida")
    + function(patt,newtext) {
    + statenames <- tolower(statenames)
    + statenames[grepl(patt,statenames)] <- newtext
    + statenames
    + }
    + })
    >
    > replaceStateNames("columbia","DC")
    [1] "alabama" "alaska" "arizona" "arkansas" "california"
    [6] "colorado" "connecticut" "delaware" "DC" "florida"
    > replaceStateNames("alaska","palincountry")
    [1] "alabama" "palincountry" "arizona"
    [4] "arkansas" "california" "colorado"
    [7] "connecticut" "delaware" "district of columbia"
    [10] "florida"
    > replaceStateNames("florida","jebbushland")
    [1] "alabama" "alaska" "arizona"
    [4] "arkansas" "california" "colorado"
    [7] "connecticut" "delaware" "district of columbia"
    [10] "jebbushland"
    >

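    More generally, you can replace statenames with your data frame definition and return a function (or a list of functions) that uses this data frame without it being passed as an argument in the function call. An example (note that I use the ignore.case=TRUE argument in grepl):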
    > replaceStateNames <- local({
    + statenames <- c("Alabama", "Alaska", "Arizona", "Arkansas",
    + "California", "Colorado", "Connecticut", "Delaware",
    + "District of Columbia", "Florida")
    + list(justreturn=function(patt,newtext) {
    + statenames[grepl(patt,statenames,ignore.case=TRUE)] <- newtext
    + statenames
    + },reassign=function(patt,newtext) {
    + statenames <<- replace(statenames,grepl(patt,statenames,ignore.case=TRUE),newtext)
    + statenames
    + })
    + })

    Just like the first example:
    > replaceStateNames$justreturn("columbia","DC")
    [1] "Alabama" "Alaska" "Arizona" "Arkansas" "California"
    [6] "Colorado" "Connecticut" "Delaware" "DC" "Florida"

    This just returns the lexically scoped value of statenames, to check that the original value is unchanged:
    > replaceStateNames$justreturn("shouldnotmatch","anythinghere")
    [1] "Alabama" "Alaska" "Arizona"
    [4] "Arkansas" "California" "Colorado"
    [7] "Connecticut" "Delaware" "District of Columbia"
    [10] "Florida"

    Do the same thing, but make the change "permanent":
    > replaceStateNames$reassign("columbia","DC")
    [1] "Alabama" "Alaska" "Arizona" "Arkansas" "California"
    [6] "Colorado" "Connecticut" "Delaware" "DC" "Florida"

    And note that the value of statenames attached to these functions has changed:
    > replaceStateNames$justreturn("shouldnotmatch","anythinghere")
    [1] "Alabama" "Alaska" "Arizona" "Arkansas" "California"
    [6] "Colorado" "Connecticut" "Delaware" "DC" "Florida"

    In any case, you can replace statenames with a data frame, and these simple functions with a "merge map" or whatever other mapping you like.
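As a minimal sketch of that substitution, the closure below holds a small data frame instead of a character vector and returns a single function that maps sloppy names to ID codes. The name lookupStateId and the three-row map are assumptions made up for illustration; the cleaning step copies the cleanStateNames logic from the question.

```r
# Hypothetical example: a closure whose captured data is a data frame.
lookupStateId <- local({
  # Tiny stand-in for the real merge map (made-up subset of rows).
  merge_map <- data.frame(state = c("alabama", "alaska", "DC"),
                          id = c(1L, 2L, 9L),
                          stringsAsFactors = FALSE)
  function(x) {
    # Clean the incoming names the same way cleanStateNames() does.
    x <- tolower(x)
    x[grepl("columbia", x)] <- "DC"
    # match() yields NA for names that are not present in the map.
    merge_map$id[match(x, merge_map$state)]
  }
})

ids <- lookupStateId(c("Alaska", "District of Columbia", "Narnia"))
```

Because local() returns the function itself rather than a list, it can be called directly, and merge_map stays invisible to the caller.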

    EDIT

    Speaking of "merge", is this what you are looking for? An implementation of the first ?merge example using a closure:
    > authors <- data.frame(surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    + nationality = c("US", "Australia", "US", "UK", "Australia"),
    + deceased = c("yes", rep("no", 4)))
    > books <- data.frame(name = I(c("Tukey", "Venables", "Tierney",
    + "Ripley", "Ripley", "McNeil", "R Core")),
    + title = c("Exploratory Data Analysis",
    + "Modern Applied Statistics ...",
    + "LISP-STAT",
    + "Spatial Statistics", "Stochastic Simulation",
    + "Interactive Data Analysis",
    + "An Introduction to R"),
    + other.author = c(NA, "Ripley", NA, NA, NA, NA,
    + "Venables & Smith"))
    >
    > mergewithauthors <- with(list(authors=authors),function(books)
    + merge(authors, books, by.x = "surname", by.y = "name"))
    >
    > mergewithauthors(books)
    surname nationality deceased title other.author
    1 McNeil Australia no Interactive Data Analysis <NA>
    2 Ripley UK no Spatial Statistics <NA>
    3 Ripley UK no Stochastic Simulation <NA>
    4 Tierney US no LISP-STAT <NA>
    5 Tukey US yes Exploratory Data Analysis <NA>
    6 Venables Australia no Modern Applied Statistics ... Ripley
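To see that the data really travels with the function, one can inspect the function's environment. The sketch below uses made-up names (lookup, mergeWithLookup) and a two-row table rather than the authors data, and removes the global copy before calling the function.

```r
# Hypothetical two-row lookup table, for illustration only.
lookup <- data.frame(key = c("a", "b"), value = 1:2,
                     stringsAsFactors = FALSE)

# with() evaluates the function definition in an environment built from
# the list, so that environment becomes the function's enclosure.
mergeWithLookup <- with(list(lookup = lookup),
                        function(x) merge(lookup, x, by = "key"))

rm(lookup)  # the global copy is gone; the captured copy remains

captured <- ls(environment(mergeWithLookup))
res <- mergeWithLookup(data.frame(key = "b", other = "z",
                                  stringsAsFactors = FALSE))
```

Here captured lists the objects stored alongside the function, and the call still succeeds after rm() because the lookup is found in the enclosing environment rather than in the global one.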

    EDIT 2

    To read a file into an object that will be lexically bound, you can do
    fn <- local({
    data <- read.csv("filename.csv")
    function(...) {
    ...
    }
    })

    or
    fn <- with(list(data=read.csv("filename.csv")),
      function(...) {
        ...
      }
    )

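    or
    ## note: local() returns the data frame itself here, so inside the
    ## function its columns (rather than the name 'data') are visible
    fn <- with(local(data <- read.csv("filename.csv")),
      function(...) {
        ...
      }
    )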

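    and so on (I'm assuming function(...) will have something to do with your "merge_map"). You can also use evalq instead of local. To "bring in" objects that reside in the global space (or an enclosing environment), you can simply do the following: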
    globalobj <- value      ## could be from read.csv()
    fn <- local({
    localobj <- globalobj ## if globalobj is not locally defined,
    ## R will look in enclosing environment
    ## in this case, the globalenv()
    function(...) {
    ...
    }
    })

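    Modifying globalobj afterwards will then not change the localobj attached to the function (since almost(?) everything in R follows pass-by-value semantics). You can also use with instead of local, as in the examples above.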
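One way to combine these ideas, and to address the two follow-up questions (a directly callable function, and a merge map prepared elsewhere), is a function factory. This is only a sketch under assumptions: makePrepForMerge, its clean argument and the toy data are hypothetical and not part of the original answer.

```r
# Hypothetical factory: takes a ready-made merge map and a cleaning
# function, and returns a function that can be called as prepForMerge(dat).
makePrepForMerge <- function(merge_map, clean = tolower) {
  # Arguments are evaluated lazily; force() evaluates them now, so the
  # closure keeps the values as they are at creation time.
  force(merge_map)
  force(clean)
  function(dat) {
    dat$state <- clean(dat$state)
    dat <- merge(dat, merge_map, by = "state")
    # Drop the state name column, keeping everything else.
    dat[, setdiff(names(dat), "state"), drop = FALSE]
  }
}

# Made-up merge map, cleaned "elsewhere" and then captured.
my_map <- data.frame(state = c("alabama", "alaska"), id = 1:2,
                     stringsAsFactors = FALSE)
prepForMerge <- makePrepForMerge(my_map)

my_map$id <- c(100L, 200L)  # a later change to the global object

out <- prepForMerge(data.frame(state = c("Alaska", "Alabama"),
                               pop = c(10, 20),
                               stringsAsFactors = FALSE))
```

The later change to my_map does not reach the closure, because the factory captured its own copy when force() ran. Without force(), the argument would only be evaluated on the first call to prepForMerge, after the modification.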

    Regarding "r - Closures as a solution to a data-merging idiom", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7797273/
