Replacing NAs in R: works on a practice dataset, but gives warning messages when applied to real data

Reposted. Author: 行者123. Updated: 2023-12-04 02:43:51

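I have a dataset in R that looks like, and has already been reshaped in the same way as, the following example. The aim is to convert the NA values into something else (e.g. "FALSE" or "0") that can then be used to create new columns.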

# Toy long-format data: one ODB6 group ("a") with one FBGN ID per species
ortho.test <- data.frame(ODB6 = rep("a", 10))
ortho.test$FBGN <- c("FBgn0132258","FBgn0131535","FBgn0138769","FBgn01561235","FBgn0316645","FBgn874916","FBgn5758641","FBgn5279946","FBgn67543154","FBgn2451645")
ortho.test$Species <- c("DROME","DROSI","DROSE","DROAN","DROYA","DROPS","DROPE","DROVI","DROGR","DROWI")

# Reshape to wide format: one FBGN.<species> column per species
ortho <- reshape(ortho.test, direction = "wide", idvar = "ODB6", timevar = "Species")
# Blank out one column to simulate a missing value
ortho$FBGN.DROME <- NA
is.na(ortho)

It returns a logical matrix telling me that everything except FBGN.DROME is FALSE, with the following str() output:
> str(ortho)
'data.frame': 1 obs. of 11 variables:
$ ODB6 : Factor w/ 1 level "a": 1
$ FBGN.DROME: logi NA
$ FBGN.DROSI: chr "FBgn0131535"
$ FBGN.DROSE: chr "FBgn0138769"
$ FBGN.DROAN: chr "FBgn01561235"
$ FBGN.DROYA: chr "FBgn0316645"
$ FBGN.DROPS: chr "FBgn874916"
$ FBGN.DROPE: chr "FBgn5758641"
$ FBGN.DROVI: chr "FBgn5279946"
$ FBGN.DROGR: chr "FBgn67543154"
$ FBGN.DROWI: chr "FBgn2451645"
- attr(*, "reshapeWide")=List of 5
..$ v.names: NULL
..$ timevar: chr "Species"
..$ idvar : chr "ODB6"
..$ times : chr "DROME" "DROSI" "DROSE" "DROAN" ...
..$ varying: chr [1, 1:10] "FBGN.DROME" "FBGN.DROSI" "FBGN.DROSE" "FBGN.DROAN" ...

I change my NAs to 0:
ortho[is.na(ortho)]<-0
is.na(ortho)

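It now returns a matrix telling me that everything is FALSE. Success, because now I can use ifelse() to create a column showing which rows do not have a 0 or FALSE (or whatever text label I used to replace NA) in any column...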
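The flag column described above can be sketched as follows. This is a minimal sketch, assuming the NAs are replaced by the text placeholder "0"; the toy IDs and the column name `all_present` are hypothetical.

```r
# Hypothetical toy data in the same wide shape, with character ID columns
ortho <- data.frame(
  ODB6       = c("g1", "g2", "g3"),
  FBGN.DROME = c("FBgn01", NA, "FBgn03"),
  FBGN.DROSI = c("FBgn11", "FBgn12", NA),
  stringsAsFactors = FALSE
)

# Replace NAs with a text placeholder, as in the test above
ortho[is.na(ortho)] <- "0"

# Count placeholders per row; flag rows that contain none
ortho$all_present <- ifelse(rowSums(ortho == "0") == 0, "yes", "no")
ortho$all_present  # "yes" "no" "no"
```

Alternatively, calling `complete.cases(ortho)` before the replacement gives the same row-wise TRUE/FALSE flag without needing a placeholder at all.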

However, when I apply this to the full data frame, the NAs are not converted and I get the following warnings:
> ortho[is.na(ortho)]<-0
There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(62938L, ... :
invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(67667L, ... :
invalid factor level, NAs generated
3: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(122384L, ... :
invalid factor level, NAs generated
4: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(136498L, ... :
invalid factor level, NAs generated
5: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(84764L, ... :
invalid factor level, NAs generated
6: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(162734L, ... :
invalid factor level, NAs generated
7: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(33586L, ... :
invalid factor level, NAs generated
8: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(38959L, ... :
invalid factor level, NAs generated
9: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(149363L, ... :
invalid factor level, NAs generated
10: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(846L, ... :
invalid factor level, NAs generated
11: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(98228L, ... :
invalid factor level, NAs generated
12: In `[<-.factor`(`*tmp*`, thisvar, value = structure(c(110267L, ... :
invalid factor level, NAs generated

Here is the str() output:
> str(ortho)
'data.frame': 17217 obs. of 13 variables:
$ ODB6 : Factor w/ 17217 levels "EOG60023J","EOG60023K",..: 1 2 3 4 5 6 7 8 9 10 ...
$ FBGN.DROGR: Factor w/ 164289 levels "FBgn0000008",..: 62938 54687 54705 56261 52591 58895 52161 52477 59180 53404 ...
$ FBGN.DROMO: Factor w/ 164289 levels "FBgn0000008",..: 67667 65117 65951 66506 68291 71722 73134 68667 72523 76080 ...
$ FBGN.DROVI: Factor w/ 164289 levels "FBgn0000008",..: 122384 121133 120018 121674 NA 125620 123754 123969 127130 130755 ...
$ FBGN.DROWI: Factor w/ 164289 levels "FBgn0000008",..: 136498 136809 139642 137108 NA 141689 136363 137237 135869 132801 ...
$ FBGN.DROPE: Factor w/ 164289 levels "FBgn0000008",..: 84764 78121 81229 80829 85509 82276 79001 80267 77133 87679 ...
$ FBGN.DROPS: Factor w/ 164289 levels "FBgn0000008",..: 162734 158625 162203 158653 158028 22427 158179 13830 19898 160874 ...
$ FBGN.DROAN: Factor w/ 164289 levels "FBgn0000008",..: 33586 35261 35694 23649 33601 25796 33808 33861 25917 29992 ...
$ FBGN.DROER: Factor w/ 164289 levels "FBgn0000008",..: 38959 41203 40738 39865 38807 46087 38821 44982 47952 38091 ...
$ FBGN.DROYA: Factor w/ 164289 levels "FBgn0000008",..: 149363 153417 153106 152243 149654 147146 149664 149482 147635 144838 ...
$ FBGN.DROME: Factor w/ 164289 levels "FBgn0000008",..: 846 7219 6958 162946 525 1892 125 3510 163839 10111 ...
$ FBGN.DROSE: Factor w/ 164289 levels "FBgn0000008",..: 98228 94438 94153 102953 98068 95380 98082 92553 93497 95950 ...
$ FBGN.DROSI: Factor w/ 164289 levels "FBgn0000008",..: 110267 108223 107983 107246 110164 117494 116973 110504 106459 NA ...
- attr(*, "reshapeWide")=List of 5
..$ v.names: NULL
..$ timevar: chr "Species"
..$ idvar : chr "ODB6"
..$ times : Factor w/ 12 levels "DROAN","DROER",..: 3 5 10 11 6 7 1 2 12 4 ...
..$ varying: chr [1, 1:12] "FBGN.DROGR" "FBGN.DROMO" "FBGN.DROVI" "FBGN.DROWI" ...
>
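To see at a glance which columns differ from the test data (where the FBGN columns are character), the class of every column can be listed. A minimal sketch, using a small hypothetical stand-in for `ortho` whose column names are taken from the str() output above and whose IDs are made up:

```r
# Hypothetical stand-in for the real 'ortho' (IDs are illustrative only)
ortho <- data.frame(
  ODB6       = factor(c("EOG60023J", "EOG60023K")),
  FBGN.DROGR = factor(c("FBgn0000008", NA)),
  FBGN.DROME = c(NA, "FBgn0000009"),
  stringsAsFactors = FALSE
)

# First class of each column; "factor" columns only accept existing levels
col_classes <- vapply(ortho, function(col) class(col)[1], character(1))
col_classes

# Names of the factor columns
factor_cols <- names(ortho)[vapply(ortho, is.factor, logical(1))]
factor_cols  # "ODB6" "FBGN.DROGR"
```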

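Can you help me get the main data frame to behave like the test one? (PS: I know I'll get the "this is a duplicate, read the help pages and search properly" response, but I have searched; that is how I found out how to replace NAs, and I didn't find anyone with the same problem.)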

Best answer

You have a factor problem. If you look at your real dataset, you'll notice

Factor w/ 164289 levels .....

For example,
R> x = factor(c("A", "B"))
R> x[x=="A"] = 0
Warning message:
In `[<-.factor`(`*tmp*`, x == "A", value = 0) :
invalid factor level, NAs generated
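The same failure happens on a whole data frame, one warning per factor column that contains an NA. A minimal sketch, assuming (as the real str() output shows) that every ID column is a factor; the toy values are hypothetical:

```r
# Hypothetical toy data: every column is a factor, like the real data
df <- data.frame(
  ODB6   = factor(c("EOG1", "EOG2")),
  FBGN.A = factor(c("FBgn01", NA)),
  FBGN.B = factor(c(NA, "FBgn12"))
)

# Record whether the replacement warns, instead of printing the warnings
warned <- FALSE
withCallingHandlers(
  df[is.na(df)] <- 0,
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

warned     # TRUE: "0" is not one of the factor levels
anyNA(df)  # TRUE: the NAs were left in place
```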

You need to add 0 as a level, so something like:
x = factor(x, levels=c(levels(x), 0))
x[is.na(x)] = 0

should do the trick. However, a better strategy is to change the way you read in your data. For example,
read.table(filename, stringsAsFactors=FALSE)
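The level fix above is written for a single factor `x`, while the real data frame has 13 factor columns, so it has to be applied column by column. A minimal sketch of both routes on hypothetical toy data: adding "0" as a level to every factor column, or converting the factor columns to character (which is what `stringsAsFactors=FALSE` would have given at read time):

```r
# Hypothetical toy data with factor columns, as in the real str() output
df <- data.frame(
  ODB6   = factor(c("EOG1", "EOG2")),
  FBGN.A = factor(c("FBgn01", NA)),
  FBGN.B = factor(c(NA, "FBgn12"))
)

# Option 1: keep the factors, but add "0" as a level to each factor column
df1 <- df
df1[] <- lapply(df1, function(col) {
  if (is.factor(col)) factor(col, levels = union(levels(col), "0")) else col
})
df1[is.na(df1)] <- "0"  # no warning: "0" is now a valid level

# Option 2: convert factor columns to character, then replace
df2 <- df
is_fac <- vapply(df2, is.factor, logical(1))
df2[is_fac] <- lapply(df2[is_fac], as.character)
df2[is.na(df2)] <- "0"
```

Option 2 is usually the simpler choice for ID columns like these, which are labels rather than categories. Note that since R 4.0.0, `stringsAsFactors` defaults to `FALSE` in both `data.frame()` and `read.table()`, so newer R versions read such columns as character unless told otherwise.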

A similar question, "Replacing NAs in R: works on a practice dataset, but gives warning messages when applied to real data", can be found on Stack Overflow: https://stackoverflow.com/questions/15341211/
