c++ - Rcpp function fills matrix with different values

Reposted by: 行者123  Updated: 2023-12-02 09:59:23

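I am building a routine that instantiates a NumericMatrix and fills it with Sørensen-Dice similarity coefficients (a similarity matrix). The matrix dimensions vary with the number of elements being processed. Typically more than 100 individual elements are compared at a time, so the matrix is usually 100+ by 100+. What I have built so far creates the matrix, computes the coefficients, and writes the computed values into the matrix. However, when I run the function repeatedly, the values in the matrix change from run to run. That is not the expected behavior, because the data being compared are neither changed nor reordered between runs. I also get similarities greater than 1, which should never happen. I have four functions: one finds the numerator of the coefficient, one finds the denominator, one combines the numerator and denominator functions to compute the coefficient, and the fourth places the coefficients into the matrix.
Here is the C++ code: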
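For reference, the standard Sørensen-Dice coefficient of two sets A and B is given below. Reading the posted functions against it is an interpretation, not something the post states: the numerator function appears to approximate the intersection size by counting band pairs that agree within 1%, and the denominator function supplies the two set sizes as the counts of non-NA bands.

```latex
D(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad 0 \le D(A, B) \le 1
```

For true sets the value cannot exceed 1, so a result above 1 suggests that the match count is larger than it should be.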

// function to calculate the denominator of the dice coefficient
int diceDenomcpp(NumericVector val1, NumericVector val2){
  
  
  int val1Len = na_omit(val1).size();
  int val2Len = na_omit(val2).size();
  int bands = 0;
  
  
  bands = val1Len + val2Len;
  // return the computed total data points within both arrays
  
  
  return bands;
}

//######################################################################
//######################################################################
//######################################################################

// function to calculate the numerator for the dice coefficient
int diceNumcpp(NumericVector iso1, NumericVector iso2){
  
  // declare and initialize vectors with the element band data
  // remove any NA values within each vector
  NumericVector is1 = na_omit(iso1);
  NumericVector is2 = na_omit(iso2);
  
  // declare and initialize some counter variables
  int n = 0;
  int m = 0;
  int match = 0;
  
  // loop through the first element's first datum and check for matching datum
  // with the second element then continue to loop through each datum within each element 
  while (n<=is1.size()){
    if (m>=is2.size()){
      n++;
      m=0;
    }
    // if a suitable match is found, increment the match variable
    if((fabs(is1[n]-is2[m])/is1[n])<0.01 && (fabs(is1[n]-is2[m])/is2[m])<0.01){
      match++;
      
    }
    m++;
  }
  return match;
}

//########################################################################
//########################################################################
//########################################################################

// function to put the coefficient together
double diceCoefcpp(NumericVector val1, NumericVector val2){
  
  NumericVector is1 = clone(val1);
  NumericVector is2 = clone(val2);
  double dVal;
  double num = 2*diceNumcpp(is1, is2);
  double denom = diceDenomcpp(is1, is2);
  
  dVal = num/denom;
  
  return dVal;
  
}

//#######################################################################
//#######################################################################
//#######################################################################


// function to build the similarity matrix with the coefficients

NumericMatrix simMatGencpp(NumericMatrix df){
  
  // clone the input data frame
  NumericMatrix rapdDat = clone(df);

  // create a data frame for the output 
  NumericMatrix simMat(rapdDat.nrow(),rapdDat.nrow());
    std::fill(simMat.begin(), simMat.end(), NumericVector::get_na());
  
  // declare and initialize the iterator
  int i = 0;

  // declare and initialize the column counter
  int col = 0;  
  
  // declare and initialize the isolate counter
  int iso = 0;
  
  //simMat(_,0)=rapdDat(_,0);
  
  while (iso < rapdDat.nrow()){
    if (iso+i > rapdDat.nrow()){
      col++;
      i=0;
      iso++;
    }
    if (iso+i < rapdDat.nrow()){
      simMat(iso+i, col) = diceCoefcpp(rapdDat(iso,_), rapdDat(iso+i,_));
      
    }
    i++;
  }
  
  
  //Rcout << "SimMatrix:" << simMat << "\n";
  
  return simMat;
}
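
The while loop in simMatGencpp() appears to visit the lower triangle of the matrix, diagonal included, one column at a time. A minimal sketch of the same traversal with two nested for loops follows, in plain C++ without Rcpp. The helper name `lowerTriangleCells` is hypothetical and not part of the original post.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): lists the (row, col)
// cells of the lower triangle of an n x n matrix, diagonal included,
// column by column. Each cell is where one coefficient would be stored.
std::vector<std::pair<std::size_t, std::size_t>> lowerTriangleCells(std::size_t n) {
    std::vector<std::pair<std::size_t, std::size_t>> cells;
    cells.reserve(n * (n + 1) / 2);
    for (std::size_t col = 0; col < n; ++col) {
        // Start at the diagonal so no pair of rows is compared twice.
        for (std::size_t row = col; row < n; ++row) {
            cells.emplace_back(row, col);
        }
    }
    return cells;
}
```

With explicit loop bounds, neither index can reach n, and the number of filled cells is n(n+1)/2.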

Here is a sample of the input data:

sampleData

    band1  band2  band3  band4  band5  band6
1   593.05 578.04 439.01     NA     NA     NA
2   589.07 567.03     NA     NA     NA     NA
3   591.04 575.10 438.12     NA     NA     NA
4   591.04     NA     NA     NA     NA     NA
5   588.08 573.18     NA     NA     NA     NA
6   591.04 576.09 552.10     NA     NA     NA
7  1805.00 949.00 639.19 589.07 576.09 440.06
8   952.00 588.08 574.14 550.04     NA     NA
9  1718.00 576.09 425.01     NA     NA     NA
10 1708.00 577.05 425.01     NA     NA     NA

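If the dataset is small enough, simMatGencpp() produces the same output every time. Once the dataset gets larger, the values start to change between runs.
I tried running diceNumcpp(), diceDenomcpp(), and diceCoefcpp() independently on individual elements and got consistent, expected output every time. As soon as I use simMatGencpp(), the output becomes scrambled again. So I tried looping each individual function, as below.
Example: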

for(i in 1:100){
  print(diceNumcpp(sampleData[7,], sampleData[3,]))
}

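The expected output above is 3, but sometimes it is 4. Each time I run this loop, the point at which a 4 appears varies: sometimes on the second iteration, sometimes on the 14th, sometimes never, or three times in a row.
My first thought was that, since garbage collection does not quite happen in C++, earlier function calls might leave old vectors in memory, because the name of the output object does not change between runs. But this post says that any objects created within the scope of a function call are destroyed when the function exits.
When I write the same solution in R code only, the runtime is poor, but it consistently returns a matrix or example vector with the same values.
I am at a loss. Any help or light anyone could shed on this would be greatly appreciated!
Thank you for your help.
Update 2020-08-19
I hope this gives those more fluent in C++ some insight into what is happening. I have some sample data, similar to what is shown above, that is 187 rows long, which means the similarity matrix for these data contains 17578 elements. I have been using this code and the sample data to compare the R version of this solution with the C++ version: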

# create the similarity matrix with the R-solution to compare iteratively
# with another R-solution similarity matrix
simMat1 <- simMatGen(isoMat)
resultsR <- c()
for(i in 1:100){
  
  simMat2 <- simMatGen(isoMat)

  # check for any mis-matched elements in each matrix
  resultsR[[i]]<-length(which(simMat1 == simMat2)==TRUE)

  #######################################################################
  # every time this runs I get the expected number of true values 17578
  # and check this by subtracting the mean(resultsR) from the expected 
  # number of true values of 17578 
}

mean(resultsR)

Now, when I run the same process with the C++ version, things change drastically and quickly. I just tried this on both 64-bit and 32-bit R-3.6.0.

simMat1 <- simMatGen(isoMat)
isoMat <- as.matrix(isoMat)
resultscpp <- c()
for(i in 1:10000){
  
  simMat2 <- simMatGencpp(isoMat)
  resultscpp[[i]]<-length(which(simMat1 == simMat2)==TRUE)

  ############  64 bit R  ##############
  # first iteration length(which(simMat1 == simMat2)==TRUE)-17578 equals 2
  # second iteration 740 elements differ: length(which(simMat1 == simMat2)==TRUE)-17578 equals 740 
  # third iteration 1142 elements differ
  # after 100 iterations the average difference is 2487.7 elements
  # after 10000 iterations the average difference is 2625.91 elements
  
  ############  32 bit R  ##############
  # first iteration difference = 1
  # second iteration difference = 694
  # 100 iterations difference = 2520.94
  # 10000 iterations difference = 2665.04
}

mean(resultscpp)

Here is the sessionInfo():

R version 3.6.0 (2019-04-26)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5        rstudioapi_0.10   magrittr_1.5      usethis_1.5.0     devtools_2.1.0    pkgload_1.0.2     R6_2.4.0          rlang_0.4.4      
 [9] tools_3.6.0       pkgbuild_1.0.3    sessioninfo_1.1.1 cli_1.1.0         withr_2.1.2       remotes_2.1.0     assertthat_0.2.1  digest_0.6.20    
[17] rprojroot_1.3-2   crayon_1.3.4      processx_3.3.1    callr_3.2.0       fs_1.3.1          ps_1.3.0          testthat_2.3.1    memoise_1.1.0    
[25] glue_1.3.1        compiler_3.6.0    desc_1.2.0        backports_1.1.5   prettyunits_1.0.2

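Best answer

I made a rookie C++ mistake here.
In diceNumcpp(), I did not include any check to keep from accidentally referencing out-of-bounds elements of the arrays.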

// if a suitable match is found, increment the match variable
    if((fabs(is1[n]-is2[m])/is1[n])<0.01 && (fabs(is1[n]-is2[m])/is2[m])<0.01){
      match++;
}

Changed to:

// if a suitable match is found, increment the match variable
    if(n<=(is1.size()-1) && (m<=is2.size()-1)){ // <- here need to make sure it stays inbounds 
     if((fabs(is1[n]-is2[m])/is1[n])<0.01 && (fabs(is1[n]-is2[m])/is2[m])<0.01){
       match++;
     }
    }
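
To see which indices the original while loop in diceNumcpp() reaches, its control flow can be replayed without touching any data. The sketch below is a diagnostic under that assumption. The names `traceOriginalLoop` and `countOutOfBounds` are hypothetical and not part of the original post.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical diagnostic (not from the original post): replays the control
// flow of the original while loop for vectors of length len1 and len2 and
// records every (n, m) index pair the comparison would have read.
// No vector is actually indexed, so the trace itself is safe.
std::vector<std::pair<long, long>> traceOriginalLoop(long len1, long len2) {
    std::vector<std::pair<long, long>> visited;
    long n = 0;
    long m = 0;
    while (n <= len1) {            // same condition as the original: n may equal len1
        if (m >= len2) {
            ++n;                   // n can become len1 or len1 + 1 here
            m = 0;
        }
        visited.emplace_back(n, m); // the original compares is1[n] with is2[m] here
        ++m;
    }
    return visited;
}

// Counts the recorded index pairs that lie outside either vector.
std::size_t countOutOfBounds(const std::vector<std::pair<long, long>>& visited,
                             long len1, long len2) {
    std::size_t count = 0;
    for (const auto& idx : visited) {
        if (idx.first >= len1 || idx.second >= len2) {
            ++count;
        }
    }
    return count;
}
```

For lengths 6 and 3, as in rows 7 and 3 of the sample data, the trace contains 18 valid pairs followed by four pairs with n equal to 6 or 7. Reading past the end of a vector is undefined behavior, so whatever happens to sit in that memory can occasionally pass the 1% test, which would be consistent with the intermittent 4.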

After 1000 runs, I get the correct result every time.
Learn something new every day.
Cheers.
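
An alternative to guarding the indices inside the while loop is to iterate with range-based for loops, which cannot step outside the vectors. The following is a minimal sketch in plain C++ rather than Rcpp, with several assumptions: missing values are represented as NaN, band values are positive, and the function names `dropMissing`, `countMatches`, and `diceCoefficient` are hypothetical stand-ins for the posted functions.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Removes NaN entries; stands in for Rcpp's na_omit() in this sketch.
std::vector<double> dropMissing(const std::vector<double>& bands) {
    std::vector<double> kept;
    for (double value : bands) {
        if (!std::isnan(value)) {
            kept.push_back(value);
        }
    }
    return kept;
}

// Counts band pairs that agree within 1% relative to both values,
// the same test as in the posted numerator function.
// Assumes positive band values, since each value is used as a divisor.
int countMatches(const std::vector<double>& a, const std::vector<double>& b) {
    int matches = 0;
    for (double x : a) {
        for (double y : b) {
            const double diff = std::fabs(x - y);
            if (diff / x < 0.01 && diff / y < 0.01) {
                ++matches;
            }
        }
    }
    return matches;
}

// Combines numerator and denominator: 2 * matches / (count1 + count2).
// Returning NaN when both inputs are empty is an assumption of this sketch.
double diceCoefficient(const std::vector<double>& raw1,
                       const std::vector<double>& raw2) {
    const std::vector<double> a = dropMissing(raw1);
    const std::vector<double> b = dropMissing(raw2);
    const std::size_t total = a.size() + b.size();
    if (total == 0) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return 2.0 * countMatches(a, b) / static_cast<double>(total);
}
```

Applied to rows 7 and 3 of the sample data, `countMatches` finds the three pairs (589.07, 591.04), (576.09, 575.10), and (440.06, 438.12), so the coefficient is 2 * 3 / (6 + 3).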

This question about "c++ - Rcpp function fills matrix with different values" comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/63460134/
