
r - Efficiently replace data frame with cumulative frequency

Reposted. Author: 行者123. Updated: 2023-12-04 12:07:16

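I am trying to write a program that takes a large data frame and replaces each column of values with the cumulative frequency of those values (sorted in ascending order). For example, if a column of values is: 5, 8, 3, 5, 4, 3, 8, 5, 5, 1, then the relative and cumulative frequencies are: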

  • 1: rel_freq = 0.1, cum_freq = 0.1
  • 3: rel_freq = 0.2, cum_freq = 0.3
  • 4: rel_freq = 0.1, cum_freq = 0.4
  • 5: rel_freq = 0.4, cum_freq = 0.8
  • 8: rel_freq = 0.2, cum_freq = 1.0

The original column then becomes: 0.8, 1.0, 0.3, 0.8, 0.4, 0.3, 1.0, 0.8, 0.8, 0.1
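The frequency table above can be reproduced with a few lines of base R. This is only a minimal sketch for the example column; the names `rel_freq`, `cum_freq`, and `freq_summary` are made up for illustration and do not come from the original post.

```r
# Example column from the question.
x <- c(5, 8, 3, 5, 4, 3, 8, 5, 5, 1)

# table() counts each distinct value and orders them ascending;
# dividing by the number of rows gives the relative frequency.
rel_freq <- table(x) / length(x)

# Running total of the relative frequencies = cumulative frequency.
cum_freq <- cumsum(rel_freq)

# Collect everything in one small summary table (illustrative name).
freq_summary <- data.frame(
  value    = as.numeric(names(rel_freq)),
  rel_freq = as.vector(rel_freq),
  cum_freq = as.vector(cum_freq)
)
print(freq_summary)
```

Because `table()` sorts the distinct values, the running total comes out in ascending order of value without an explicit sort.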

The following code does this correctly, but it scales poorly because of the nested loops. Any idea how to perform this task more efficiently?
    mydata = read.table(.....)

    totalcols = ncol(mydata)
    totalrows = nrow(mydata)

    for (i in 1:totalcols) {
        freqtable = data.frame(table(mydata[,i])/totalrows) # create freq table
        freqtable$CumSum = cumsum(freqtable$Freq) # calc cumulative freq

        hashtable = new.env(hash=TRUE)
        nrows = nrow(freqtable)

        # store cum freq in hash
        for (x in 1:nrows) {
            dummy = toString(freqtable$Var1[x])
            hashtable[[dummy]] = freqtable$CumSum[x]
        }

        # replace original data with cum freq
        for (j in 1:totalrows) {
            dummy = toString(mydata[j,i])
            mydata[j,i] = hashtable[[dummy]]
        }
    }

Best answer

This handles a single column without a for loop:

    R> x <- c(5, 8, 3, 5, 4, 3, 8, 5, 5, 1)
    R> y <- cumsum(table(x)/length(x))
    R> y[as.character(x)]
      5   8   3   5   4   3   8   5   5   1
    0.8 1.0 0.3 0.8 0.4 0.3 1.0 0.8 0.8 0.1
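The answer covers one column; one possible way to extend it to a whole data frame is to wrap the lookup in a function and apply it per column with `lapply`. The sketch below is not from the original answer: `cum_freq_column` is a hypothetical helper, and the small data frame is made-up data standing in for the asker's `mydata`. It assumes numeric columns without missing values, and that `as.character()` yields the same labels as `table()` does for the values involved (true for the integers here, not verified for arbitrary doubles).

```r
# Hypothetical helper: replace one column's values by their cumulative frequency.
cum_freq_column <- function(col) {
  cum_freq <- cumsum(table(col) / length(col))  # named by distinct value
  unname(cum_freq[as.character(col)])           # look up each row by name
}

# Made-up data standing in for the asker's `mydata`.
mydata <- data.frame(
  a = c(5, 8, 3, 5, 4, 3, 8, 5, 5, 1),
  b = c(2, 2, 1, 1, 2, 2, 1, 2, 2, 2)
)

# Assigning into `result[]` replaces every column but keeps the data frame shape.
result <- mydata
result[] <- lapply(mydata, cum_freq_column)

# Cross-check with base R's empirical CDF, which also gives the
# proportion of values less than or equal to each entry.
check <- mydata
check[] <- lapply(mydata, function(col) ecdf(col)(col))
stopifnot(isTRUE(all.equal(result, check)))
```

The per-row work here is a vectorised name lookup instead of an explicit loop over rows, which is the part of the original code that scaled poorly. `ecdf(col)(col)` may be a simpler alternative when the columns are purely numeric.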

Regarding "r - Efficiently replace data frame with cumulative frequency", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13037111/
