
r - How to coerce a data.frame into a sparse matrix in R

Reposted. Author: 行者123. Updated: 2023-12-05 04:35:06

I am trying to follow the example here: cui2vecWorkflow, by creating a matrix similar to the one here, term_cooccurrence_matrix.rda, which has the following structure:

> cooc<-get(load('~/development/cui2vec/vignettes/term_cooccurrence_matrix.rda'))
> str(cooc)
Formal class 'dsCMatrix' [package "Matrix"] with 7 slots
..@ i : int [1:2366] 0 1 2 0 1 2 3 4 3 5 ...
..@ p : int [1:101] 0 1 2 3 7 8 10 17 19 27 ...
..@ Dim : int [1:2] 100 100
..@ Dimnames:List of 2
.. ..$ : chr [1:100] "C0016875" "C0162770" "C0024730" "C0038689" ...
.. ..$ : chr [1:100] "C0016875" "C0162770" "C0024730" "C0038689" ...
..@ x : num [1:2366] 412 6286 8280 118 110 ...
..@ uplo : chr "U"
..@ factors : list()

My data frame looks like this:

> test
CUI1 CUI2 Count
1 C0000699 C3894683 2
2 C0000699 C0101725 1
3 C0000699 C1882413 3
4 C0000699 C0245531 3
5 C0000699 C0068475 2
6 C0000699 C0538927 3
7 C0000699 C0724693 1
8 C0000699 C0216784 2
9 C0000699 C2248020 1
10 C0000699 C0069449 3
...

But when I read it in and convert it to a matrix, it obviously does not have the same structure:

> mat <- as.matrix(test)
> str(mat)
chr [1:1000000, 1:3] "C0000699" "C0000699" "C0000699" "C0000699" "C0000699" "C0000699" "C0000699" "C0000699" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:3] "CUI1" "CUI2" "Count"

Then I take the next step and coerce the matrix mat to a sparse matrix:

> mat <- as(mat,  "sparseMatrix")
> str(mat)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:3000000] 0 1 2 3 4 5 6 7 8 9 ...
..@ p : int [1:4] 0 1000000 2000000 3000000
..@ Dim : int [1:2] 1000000 3
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "CUI1" "CUI2" "Count"
..@ x : num [1:3000000] NA NA NA NA NA NA NA NA NA NA ...
..@ factors : list()

But the structure looks wrong.
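The NA values in x and the 1000000 x 3 shape can be reproduced on a small scale. The sketch below assumes a three-row stand-in for test (values copied from the first rows shown above) and uses as.numeric to mimic the numeric coercion that the warning in the next step refers to; it is an illustration, not the exact code path inside Matrix.

```r
# Three-row stand-in for the question's data frame (first rows shown above)
test <- data.frame(
  CUI1  = c("C0000699", "C0000699", "C0000699"),
  CUI2  = c("C3894683", "C0101725", "C1882413"),
  Count = c(2, 1, 3)
)

mat <- as.matrix(test)
typeof(mat)  # "character": one non-numeric column makes the whole matrix character
dim(mat)     # one row per pair and one column per variable, not one row/column per CUI

# Coercing the character matrix to numeric turns every CUI string into NA
num <- suppressWarnings(as.numeric(mat))
n_na <- sum(is.na(num))
n_na         # 6: the two CUI columns are lost, only the counts survive
```

In other words, as.matrix keeps the long "one row per pair" layout, whereas a co-occurrence matrix needs the CUIs as row and column names and the counts as the stored values.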

Trying this, I get an error:

> mat <- as(mat,  "dsCMatrix")
Error in asMethod(object) :
not a symmetric matrix; consider forceSymmetric() or symmpart()
In addition: Warning message:
In storage.mode(from) <- "double" : NAs introduced by coercion

So I try this:

> mat <- as(forceSymmetric(mat),  "dsCMatrix")
Error in forceSymmetric(mat) :
invalid class 'NA' to dup_mMatrix_as_geMatrix

(I have not found any examples of how to construct a matrix of class structure("dsCMatrix", package = "Matrix") from a data.frame, so I am just improvising.)

It looks like Dim, Dimnames, and the values of x are all defined incorrectly.

Best answer

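Following user20650's comment, first coerce the CUI* columns to factors with the same levels, then create a sparse matrix with xtabs, and then make it symmetric with forceSymmetric.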

txt <- '
CUI1 CUI2 Count
1 C0000699 C3894683 2
2 C0000699 C0101725 1
3 C0000699 C1882413 3
4 C0000699 C0245531 3
5 C0000699 C0068475 2
6 C0000699 C0538927 3
7 C0000699 C0724693 1
8 C0000699 C0216784 2
9 C0000699 C2248020 1
10 C0000699 C0069449 3
'
test <- read.table(textConnection(txt), header = TRUE)

library(Matrix)

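# Use one shared set of levels so that rows and columns cover the same CUIs
levls <- Reduce(union, test[1:2])
test[1:2] <- lapply(test[1:2], factor, levels = levls)
# Cross-tabulate the counts into a sparse matrix
res <- xtabs(Count ~ CUI1 + CUI2, data = test, sparse = TRUE)
# Mirror the upper triangle to obtain a symmetric dsCMatrix
res <- forceSymmetric(res)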
class(res)
#> [1] "dsCMatrix"
#> attr(,"package")
#> [1] "Matrix"
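
One point worth checking on real data: for a non-symmetric input, forceSymmetric by default keeps the upper triangle and overwrites the lower one, which is not the same as adding the transpose. The sketch below uses a hypothetical 2 x 2 matrix (the labels "A" and "B" and the values 5 and 7 are invented for illustration) to show the difference.

```r
library(Matrix)

# Hypothetical matrix with different values on either side of the diagonal
m <- sparseMatrix(i = c(1, 2), j = c(2, 1), x = c(5, 7),
                  dimnames = list(c("A", "B"), c("A", "B")))

# Default uplo = "U": the upper-triangle value 5 replaces the 7 below the diagonal
fs <- forceSymmetric(m)["B", "A"]

# Adding the transpose sums both directions instead: 5 + 7
tr <- (m + t(m))["B", "A"]

c(forceSymmetric = fs, transpose_sum = tr)
```

If each pair appears only once in the data and always in the upper triangle, as in the ten rows above, both approaches give the same off-diagonal values. Note that m + t(m) also doubles any diagonal entries and needs a final forceSymmetric call to get the dsCMatrix class.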

Created on 2022-02-13 by the reprex package (v2.0.1)
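
As an alternative sketch (not part of the original answer), the same kind of matrix can be built directly with Matrix::sparseMatrix and symmetric = TRUE. The names cuis, i, j, and res2 are chosen here for illustration, and ordering each pair with pmin/pmax is an assumption about how reversed pairs should be treated: a pair listed in both directions has its counts summed.

```r
library(Matrix)

# First three rows of the question's data frame
test <- data.frame(
  CUI1  = c("C0000699", "C0000699", "C0000699"),
  CUI2  = c("C3894683", "C0101725", "C1882413"),
  Count = c(2, 1, 3)
)

# One shared set of labels so rows and columns refer to the same CUIs
cuis <- union(test$CUI1, test$CUI2)
i <- match(test$CUI1, cuis)
j <- match(test$CUI2, cuis)

# symmetric = TRUE expects entries from a single triangle, so store each
# pair as (smaller index, larger index); duplicated pairs are summed
res2 <- sparseMatrix(
  i = pmin(i, j), j = pmax(i, j), x = test$Count,
  dims = c(length(cuis), length(cuis)),
  dimnames = list(cuis, cuis),
  symmetric = TRUE
)
class(res2)
```

This avoids the intermediate factor conversion and scales with the number of rows in the data frame, since only the listed pairs are stored.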

Regarding "r - How to coerce a data.frame into a sparse matrix in R", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/71102912/
