c++ - Armadillo C++ : Sorting a vector in terms of two other vectors

Reposted by: 搜寻专家 · Updated: 2023-10-31 00:27:19

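My question concerns a sorting exercise that I can do easily (though possibly slowly) in R, and that I would like to do in C++ to speed up my code.

Consider three vectors of the same size, a, b, and c. In R, the following command sorts the vector a first by b and then, in case of ties, further by c.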

a <- a[order(b, c)]

Example:

a<-c(1,2,3,4,5)
b<-c(1,2,1,2,1)
c<-c(5,4,3,2,1)

> a[order(b,c)]
[1] 5 3 1 4 2

Is there an efficient way to do this in C++ using Armadillo vectors?

Best answer

We can write the following C++ solution, which I have in a file SO_answer.cpp:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace arma;

// [[Rcpp::export]]
vec arma_sort(vec x, vec y, vec z) {
    // Order the elements of x by sorting y and z;
    // we order by y unless there's a tie, then order by z.
    // First create a vector of indices
    uvec idx = regspace<uvec>(0, x.size() - 1);
    // Then sort that vector by the values of y and z
    std::sort(idx.begin(), idx.end(), [&](int i, int j){
        if ( y[i] == y[j] ) {
            return z[i] < z[j];
        }
        return y[i] < y[j];
    });
    // And return x in that order
    return x(idx);
}

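What we do is take advantage of the fact that std::sort() lets you sort with a custom comparator. We use a comparator that compares elements of z only when the corresponding elements of y are equal; otherwise it compares the values of y.¹ We can then compile the file and test the function in R: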

library(Rcpp)
sourceCpp("SO_answer.cpp")

set.seed(1234)
x <- sample(1:10)
y <- sample(1:10)
z <- sample(1:10)

y[sample(1:10, 1)] <- 1 # create a tie

all.equal(x[order(y, z)], c(arma_sort(x, y, z))) # check against R
# [1] TRUE # Good
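
The comparator described above is a lexicographic comparison on the pair (y, z). As a minimal sketch of the same idea, assuming plain std::vector<double> inputs instead of Armadillo vectors, the comparison can be written with std::tie. The function name sort_by_two_keys is hypothetical and does not appear in the answer; std::stable_sort is used here so that elements tied on both keys keep their original order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <tuple>
#include <vector>

// Hypothetical helper (not from the answer): returns x reordered by y,
// with ties in y broken by z, in the spirit of R's x[order(y, z)].
std::vector<double> sort_by_two_keys(const std::vector<double>& x,
                                     const std::vector<double>& y,
                                     const std::vector<double>& z) {
    assert(x.size() == y.size() && y.size() == z.size());

    // Index vector 0, 1, ..., n-1 (also fine when n == 0).
    std::vector<std::size_t> idx(x.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    // std::tie builds tuples of references; tuple operator< compares the
    // y values first and looks at z only when the y values are equal.
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t i, std::size_t j) {
                         return std::tie(y[i], z[i]) < std::tie(y[j], z[j]);
                     });

    // Gather x in the sorted index order.
    std::vector<double> out;
    out.reserve(x.size());
    for (std::size_t i : idx) out.push_back(x[i]);
    return out;
}
```

Called with the vectors from the question, sort_by_two_keys({1, 2, 3, 4, 5}, {1, 2, 1, 2, 1}, {5, 4, 3, 2, 1}) gives 5 3 1 4 2, the same order as a[order(b,c)] above.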

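Of course, we also have to consider whether this actually gives you any performance gain, which is the whole reason you are doing this. Let's benchmark: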

library(microbenchmark)
microbenchmark(r = x[order(y, z)],
               arma = arma_sort(x, y, z),
               times = 1e4)

Unit: microseconds
 expr    min    lq      mean median    uq      max neval cld
    r 36.040 37.23 39.386160  37.64 38.32 3316.286 10000   b
 arma  5.055  6.07  7.155676   7.00  7.53  107.230 10000  a

On my machine this looks like a speedup of roughly 5-6x for small vectors, but that advantage does not hold up as you scale:

x <- sample(1:100)
y <- sample(1:100)
z <- sample(1:100)

y[sample(1:100, 10)] <- 1 # create some ties

all.equal(x[order(y, z)], c(arma_sort(x, y, z))) # check against R
# [1] TRUE # Good

microbenchmark(r = x[order(y, z)],
               arma = arma_sort(x, y, z),
               times = 1e4)

Unit: microseconds
 expr   min     lq     mean median     uq      max neval cld
    r 44.50 46.360 48.01275 46.930 47.755  294.051 10000   b
 arma 10.76 12.045 16.30033 13.015 13.715 5262.132 10000  a 

x <- sample(1:1000)
y <- sample(1:1000)
z <- sample(1:1000)

y[sample(1:100, 10)] <- 1 # create some ties

all.equal(x[order(y, z)], c(arma_sort(x, y, z))) # check against R
# [1] TRUE # Good

microbenchmark(r = x[order(y, z)],
               arma = arma_sort(x, y, z),
               times = 1e4)

Unit: microseconds
 expr     min       lq     mean   median       uq      max neval cld
    r 113.765 118.7950 125.7387 120.5075 122.4475 3373.696 10000   b
 arma  82.690  91.3925 104.0755  95.2350  99.4325 6040.162 10000  a

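It is still faster, but by less than 2x once you are working with vectors of length 1000. This is probably why F. Privé said this operation should be fast enough in R. While moving to C++ with Rcpp can bring large performance gains, how much you gain depends heavily on context, as Dirk Eddelbuettel has noted many times in answers to various questions here.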

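¹ Note that normally I would recommend using sort() or sort_index() to sort Armadillo vectors (see the Armadillo documentation). If you want to sort one vec by the values of a second vec, you can use x(arma::sort_index(y)), as I pointed out in my answer to a related question. You can even use stable_sort_index() to keep tied elements in their original order. However, I could not figure out how to use these functions to solve the specific problem you posed here.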
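
One possible way to get a two-key order out of single-key stable sorts is to sort by the secondary key first and then stable-sort by the primary key, so that ties in the primary key keep the order produced by the first pass. The sketch below illustrates this with std::stable_sort on plain std::vector<double> inputs; the name sort_two_pass is hypothetical. With Armadillo, the analogous calls would presumably be sort_index() on z followed by stable_sort_index() on y taken in that order, but that mapping is an assumption and is not tested here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: two-pass variant of x[order(y, z)] built only
// from single-key stable sorts.
std::vector<double> sort_two_pass(const std::vector<double>& x,
                                  const std::vector<double>& y,
                                  const std::vector<double>& z) {
    assert(x.size() == y.size() && y.size() == z.size());

    std::vector<std::size_t> idx(x.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    // Pass 1: order the indices by the secondary key z.
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t i, std::size_t j) { return z[i] < z[j]; });

    // Pass 2: stable sort by the primary key y. Indices with equal y
    // keep the z order established in pass 1.
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t i, std::size_t j) { return y[i] < y[j]; });

    // Gather x in the final index order.
    std::vector<double> out;
    out.reserve(x.size());
    for (std::size_t i : idx) out.push_back(x[i]);
    return out;
}
```

This does two sorts instead of one, so the single custom comparator in arma_sort is likely the cheaper choice; the two-pass form is mainly useful when only single-key stable sorts are available.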

Regarding c++ - Armadillo C++ : Sorting a vector in terms of two other vectors, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49554871/
