r - 复杂的 reshape-6ren

r - 复杂的 reshape

转载作者：行者123 更新时间：2023-12-02 17:50:09

24

4

我想将我的数据框从长格式 reshape 为宽格式，但我丢失了一些我想保留的数据。对于以下示例:

df <- data.frame(Par1 = unlist(strsplit("AABBCCC","")),
                 Par2 = unlist(strsplit("DDEEFFF","")),
                 ParD = unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh",",")),
                 Type = unlist(strsplit("pre,post,pre,post,pre,post,post",",")),
                 Val = c(10,20,30,40,50,60,70))

   #     Par1 Par2 ParD Type Val
   #   1    A    D  foo  pre  10
   #   2    A    D  bar post  20
   #   3    B    E  baz  pre  30
   #   4    B    E  qux post  40
   #   5    C    F  bla  pre  50
   #   6    C    F  xyz post  60
   #   7    C    F  meh post  70

dfw <- dcast(df,
             formula = Par1 + Par2 ~ Type,
             value.var = "Val",
             fun.aggregate = mean)

 #     Par1 Par2 post pre
 #   1    A    D   20  10
 #   2    B    E   40  30
 #   3    C    F   65  50

这几乎是我需要的，但我想要拥有

某些字段保存来自 ParD 字段的数据(例如，作为单个合并字符串)，
用于聚合的观测值数量。

即我希望生成的 data.frame 如下:

    #     Par1 Par2 post pre Num.pre Num.post ParD
    #   1    A    D   20  10      1      1    foo_bar 
    #   2    B    E   40  30      1      1    baz_qux
    #   3    C    F   65  50      1      2    bla_xyz_meh

如果有任何想法，我将不胜感激。例如，我尝试通过写入 dcast 来解决第二个任务: fun.aggregate=function(x) c(Val=mean(x),Num=length(x)) - 但这会导致一个错误。

最佳答案

迟到了，但这里有另一个使用 data.table 的替代方案:

require(data.table)
dt <- data.table(df, key=c("Par1", "Par2"))
dt[, list(pre=mean(Val[Type == "pre"]), 
          post=mean(Val[Type == "post"]), 
          pre.num=length(Val[Type == "pre"]), 
          post.num=length(Val[Type == "post"]), 
          ParD = paste(ParD, collapse="_")), 
by=list(Par1, Par2)]

#    Par1 Par2 pre post pre.num post.num        ParD
# 1:    A    D  10   20       1        1     foo_bar
# 2:    B    E  30   40       1        1     baz_qux
# 3:    C    F  50   65       1        2 bla_xyz_meh

<小时/>

[来自 Matthew] +1 一些小的改进，以节省重复相同的 ==，并演示 j 内的局部变量。

dt[, list(pre=mean(Val[.pre <- Type=="pre"]),     # save .pre
          post=mean(Val[.post <- Type=="post"]),  # save .post
          pre.num=sum(.pre),                      # reuse .pre
          post.num=sum(.post),                    # reuse .post
          ParD = paste(ParD, collapse="_")), 
by=list(Par1, Par2)]

#    Par1 Par2 pre post pre.num post.num        ParD
# 1:    A    D  10   20       1        1     foo_bar
# 2:    B    E  30   40       1        1     baz_qux
# 3:    C    F  50   65       1        2 bla_xyz_meh

dt[, { .pre <- Type=="pre"                  # or save .pre and .post up front 
       .post <- Type=="post"
       list(pre=mean(Val[.pre]), 
            post=mean(Val[.post]),
            pre.num=sum(.pre),
            post.num=sum(.post), 
            ParD = paste(ParD, collapse="_")) }
, by=list(Par1, Par2)]

#    Par1 Par2 pre post pre.num post.num        ParD
# 1:    A    D  10   20       1        1     foo_bar
# 2:    B    E  30   40       1        1     baz_qux
# 3:    C    F  50   65       1        2 bla_xyz_meh

如果 list 列比 paste 更合适，那么这应该会更快:

dt[, { .pre <- Type=="pre"
       .post <- Type=="post"
       list(pre=mean(Val[.pre]), 
            post=mean(Val[.post]),
            pre.num=sum(.pre),
            post.num=sum(.post), 
            ParD = list(ParD)) }     # list() faster than paste()
, by=list(Par1, Par2)]

#    Par1 Par2 pre post pre.num post.num        ParD
# 1:    A    D  10   20       1        1     foo,bar
# 2:    B    E  30   40       1        1     baz,qux
# 3:    C    F  50   65       1        2 bla,xyz,meh

关于r - 复杂的 reshape ，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/15182888/

24

4

0

文章推荐： .net - 在 IIS Express 中托管 WCF 服务时出现问题

文章推荐： haskell - 通常说来，monad变压器是否由附加条件产生？

reshape - 如何检查APL中的字符串是否被 reshape ？
如何检查字符串是否被 reshape ？示例:“aab”返回 0，因为“a”无法 reshape 为该字符串或任何其他更短的字符串。另一个例子是“aabbaab”返回 1，因为“aabb”可以被 r
reshape - Theano reshape
我无法清楚地理解theano的reshape。我有一个形状的图像矩阵: [batch_size, stack1_size, stack2_size, height, width] ，其中有 s
reshape - 如何检查APL中的字符串是否被 reshape ？
如何检查字符串是否被 reshape ？示例:“aab”返回 0，因为“a”无法 reshape 为该字符串或任何其他更短的字符串。另一个例子是“aabbaab”返回 1，因为“aabb”可以被 r
reshape - 如何像这样使用 python reshape 数据集
这是原始数据 a=[[1,2,3,4,5,6], [7,8,9,10,11,12]] 我想把它转换成这样的格式: b=[[1,2,3,7,8,9], [4,5,6,10,11,12]] a
python - 只是 reshape 和 reshape 和获得转置之间的区别？
我目前正在学习 CS231 作业，我意识到一些令人困惑的事情。在计算梯度时，当我第一次 reshape x 然后得到转置时，我得到了正确的结果。 x_r=x.reshape(x.shape[0],-1
r - 如何使用 reshape 包 reshape 此数据框
这个问题在这里已经有了答案: Reshaping multiple sets of measurement columns (wide format) into single columns (lon
当 reshape 无法猜测时变变量的名称时， reshape r 中的数据
我有一个包含超过 1500 列的宽格式数据集。由于许多变量都是重复的，我想将其 reshape 为长形式。然而，r 抛出一个错误: Error in guess(varying) : Failed
从长到宽 reshape 数据 - 了解 reshape 参数
我有一个长格式的数据框狗，我正在尝试使用 reshape() 函数将其重新格式化为宽格式。目前看起来是这样的: dogid month year trainingtype home scho
python - NumPy 使用 reshape 函数 reshape 数组
这个问题在这里已经有了答案: how to reshape an N length vector to a 3x(N/3) matrix in numpy using reshape (1 个回答)
python - 'numpy.reshape' 和 'ndarray.reshape' 如何等效？
我对 ndarray.reshape 的结构有疑问.我读过 numpy.reshape()和 ndarray.reshape是 python 中用于 reshape 数组的等效命令。据我所知，num
reshape - 在 Stata 中没有唯一的 "j"变量的情况下如何 reshape ？
所以这是我的麻烦:我想将一个长格式的数据文件改成宽格式。但是，我没有唯一的“j”变量；长格式文件中的每条记录都有几个关键变量。例如，我想这样做: | caseid | gender | age |
从 base reshape vs 从具有缺失值的 reshape2 reshape
Whis 这个数据框， df df id parameter visit value sex 1 01 blood V1 1 f 2 01 saliva V
python - reshape numpy 数组的列表，然后 reshape 回来
我有一个列表，其中包含几个不同形状的 numpy 数组。我想将这个数组列表 reshape 为一个 numpy 向量，然后更改向量中的每个元素，然后将其 reshape 回原始数组列表。例如: 输入
Python 使用 np.reshape 按特定顺序 reshape 数组
我有一个形状为 (1800,144) 的数组 (a) 其中 a[0:900,:] 都是实数，后半部分数组 a[900:1800,:] 全部为零。我想把数组的后半部分水平地放在前半部分旁边，然后将它们推
python - 在 Python 中使用 reshape reshape 数组
我有一个如下所示的数组: array([[0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2
python - 为什么 Tensorflow Reshape tf.reshape() 会破坏梯度流？
我正在创建一个 tf.Variable()，然后使用该变量创建一个简单的函数，然后我使用 tf.reshape() 展平原始变量，然后我在函数和展平变量之间使用了 tf.gradients()。为什么
python - 使用 array.reshape(-1, 1) reshape 数组
我有一个名为 data 的数据框，我试图从中识别任何异常价格。数据框头部看起来像: Date Last Price 0 29/12/2017 487.74 1 28/
python - 使用 numpy reshape 数组 - ValueError : cannot reshape array
我有一个 float vec 数组，我想对其进行 reshape vec.shape >>> (3,) len(vec[0]) # all 3 rows of vec have 150 columns
python - 在不使用 reshape 的情况下 reshape n 维数组的 View
tl;dr 我可以在不使用 numpy.reshape 的情况下将 numpy 数组的 View 从 5x5x5x3x3x3 reshape 为 125x1x1x3x3x3 吗？我想对一个体积(大小
reshape() function to make wide to long data(RESHAPE()函数使数据变宽变长)
set.seed(123)data <- data.frame(ID = 1:10, weight_hus = rnorm(10, 0, 1),

首页

博学

6Ren·AI

商城

r - 复杂的 reshape