gpt4 book ai didi

c++ - 如何有效地将唯一对象从一个 vector 复制到另一个 vector (由相同对象的子集组成)?

转载 作者:塔克拉玛干 更新时间:2023-11-03 05:32:02 25 4
gpt4 key购买 nike

如何有效地将对象(或一系列对象)从 vector A 复制到 vector B,

其中 vector B 已经包含与 vector A 中相同的某些对象,

以便从 vector A 复制的对象没有在 vector B 中列出?

我有一个图存储为 std::vector<MinTreeEdge>minTreeInput 中的边 vector .

我有一个根据这个图创建的最小生成树,存储在 std::vector<MinTreeEdge>minTreeOutput 中.

我正在尝试将一定数量的边随机添加回 minTreeOutput .为此,我想从 minTreeInput 复制元素回minTreeOutput直到后者包含所需数量的边。当然,复制过来的每个边缘对象都必须尚未存储 minTreeOutput .此图中不能有重复边。

以下是我到目前为止的想法。它有效,但它确实很长,而且我知道循环必须运行多次,具体取决于图形和树。我想知道如何正确执行此操作:

    // Edge class
struct MinTreeEdge
{
// For std::unique() between objects
bool operator==(MinTreeEdge const &rhs) const noexcept
{
return lhs == rhs.lhs;
}
int lhs;

int node1ID;
int node2ID;
int weight;
......
};

......

// The usage
int currentSize = minTreeOutput.size();
int targetSize = currentSize + numberOfEdgesToReturn;
int sizeDistance = targetSize - currentSize;
while(sizeDistance != 0)
{
//Probably really inefficient

for(std::vector<MinTreeEdge>::iterator it = minTreeInput.begin(); it != minTreeInput.begin()+sizeDistance; ++it)
minTreeOutput.push_back(*it);

std::vector<MinTreeEdge>::iterator mto_it;
mto_it = std::unique (minTreeOutput.begin(), minTreeOutput.end());

currentSize = minTreeOutput.size();
sizeDistance = targetSize - currentSize;
}

或者,有没有办法只列出 minTreeInput 中的所有边? (图)minTreeOutput中(树)而不必检查前者与后者中的每个单独元素?

最佳答案

How can I efficiently copy objects (or a range of objects) from vector A into vector B, where vector B already contains certain objects identical to those from vector A, so that no objects copied from vector A are already listed in vector B?

如果我正确理解了这个问题,这可以解释为“我如何创建两个 vector 的集合并集?”。

答案:使用std::set_union

set_union 其中 MinTreeEdge 复制成本低

请注意,要使其正常工作,需要对两个 vector 进行排序。正如您已经提到的那样,这是出于效率原因。

#include <vector>
#include <algorithm>
#include <cassert>
#include <iterator>

struct MinTreeEdge
{
// For std::unique() between objects
bool operator==(MinTreeEdge const &rhs) const noexcept
{
return lhs == rhs.lhs;
}
int lhs;

int node1ID;
int node2ID;
int weight;
};

struct lower_lhs
{
bool operator()(const MinTreeEdge& l, const MinTreeEdge& r) const noexcept
{
return l.lhs < r.lhs;
}
};

std::vector<MinTreeEdge> merge(std::vector<MinTreeEdge> a,
std::vector<MinTreeEdge> b)
{
// let's pessimistically assume that the inputs are not sorted
// we could simply assert that they are if the caller is aware of
// the requirement

std::sort(a.begin(), a.end(), lower_lhs());
std::sort(b.begin(), b.end(), lower_lhs());

// alternatively...
// assert(std::is_sorted(a.begin(), a.end(), lower_lhs()));
// assert(std::is_sorted(b.begin(), b.end(), lower_lhs()));

// optional step if the inputs are not already `unique`
a.erase(std::unique(a.begin(), a.end()), a.end());
b.erase(std::unique(b.begin(), b.end()), b.end());

std::vector<MinTreeEdge> result;
result.reserve(a.size() + b.size());

std::set_union(a.begin(), a.end(),
b.begin(), b.end(),
std::back_inserter(result),
lower_lhs());

return result;
}

int main()
{
// example use case

auto a = std::vector<MinTreeEdge>{};
auto b = std::vector<MinTreeEdge>{};

b = merge(std::move(a), std::move(b));
}

set_union 其中 MinTreeEdge 的复制成本很高

已经提到了集合来实现这一点。可以公平地说,如果:

  1. MinTreeEdge 复制成本高 并且,
  2. 很多

然后我们可以期望看到使用 unordered_set 的性能优势。但是,如果对象复制成本很高,那么我们可能希望通过引用将它们存储在我们的临时集中。

我可能会这样做:

// utility class which converts unary and binary operations on
// a reference_wrapper into unary and binary operations on the
// referred-to objects
template<class unary, class binary>
struct reference_as_object
{
template<class U>
decltype(auto) operator()(const std::reference_wrapper<U>& l) const {
return _unary(l.get());
}

template<class U, class V>
decltype(auto) operator()(const std::reference_wrapper<U>& l,
const std::reference_wrapper<V>& r) const {
return _binary(l.get(), r.get());
}

unary _unary;
binary _binary;
};

// utility to help prevent typos when defining a set of references
template<class K, class H, class C> using unordered_reference_set =
std::unordered_set<
std::reference_wrapper<K>,
reference_as_object<H, C>,
reference_as_object<H, C>
>;

// define unary and binary operations for our set. This way we can
// avoid polluting MinTreeEdge with artificial relational operators

struct mte_hash
{
std::size_t operator()(const MinTreeEdge& mte) const
{
return std::hash<int>()(mte.lhs);
}
};

struct mte_equal
{
bool operator()(MinTreeEdge const& l, MinTreeEdge const& r) const
{
return l.lhs == r.lhs;
}
};

// merge function. arguments by value since we will be moving
// *expensive to copy* objects out of them, and the vectors themselves
// can be *moved* into our function very cheaply

std::vector<MinTreeEdge> merge2(std::vector<MinTreeEdge> a,
std::vector<MinTreeEdge> b)
{
using temp_map_type = unordered_reference_set<MinTreeEdge, mte_hash, mte_equal>;

// build a set of references to existing objects in b
temp_map_type tmap;
tmap.reserve(b.capacity());

// b first, since the requirements mentioned 'already in B'
for (auto& ob : b) { tmap.insert(ob); }

// now add missing references in a
for (auto& oa : a) { tmap.insert(oa); }

// now build the result, moving objects from a and b as required
std::vector<MinTreeEdge> result;
result.reserve(tmap.size());

for (auto r : tmap) {
result.push_back(std::move(r.get()));
}

return result;

// a and b now have elements which are valid but in an undefined state
// The elements which are defined are the duplicates we don't need
// on summary, they are of no use to us so we drop them.
}

Trimmings - MinTreeEdge 的复制成本很高,但移动起来很便宜

假设我们想坚持使用 vector 方法(我们几乎总是应该这样做),但是 MinTreeEdge 的复制成本有点高。假设它使用 pimpl 惯用语来实现内部多态性,这将不可避免地意味着在复制时分配内存。但是,假设它可以廉价移动。我们还假设不能期望调用者在将数据发送给我们之前对数据进行排序或唯一化。

我们仍然可以使用标准算法和 vector 实现良好的效率:

std::vector<MinTreeEdge> merge(std::vector<MinTreeEdge> a,
std::vector<MinTreeEdge> b)
{
// sorts a range if not already sorted
// @return a reference to the range
auto maybe_sort = [] (auto& c) -> decltype(auto)
{
auto begin = std::begin(c);
auto end = std::end(c);
if (not std::is_sorted(begin, end, lower_lhs()))
std::sort(begin, end, lower_lhs());
return c;
};

// uniqueify a range, returning the new 'end' of
// valid data
// @pre c is sorted
// @return result of std::unique(...)
auto unique = [](auto& c) -> decltype(auto)
{
auto begin = std::begin(c);
auto end = std::end(c);
return std::unique(begin, end);
};

// turn an iterator into a move-iterator
auto mm = [](auto iter) { return std::make_move_iterator(iter); };


std::vector<MinTreeEdge> result;
result.reserve(a.size() + b.size());

// create a set_union from two input containers.
// @post a and b shall be in a valid but undefined state

std::set_union(mm(a.begin()), mm(unique(maybe_sort(a))),
mm(b.begin()), mm(unique(maybe_sort(b))),
std::back_inserter(result),
lower_lhs());

return result;
}

如果提供一个自由函数 void swap(MinTreeEdge& l, MinTreeEdge& r) nothrow 那么这个函数将需要恰好 N 次移动,其中 N 是结果集的大小。由于在 pimpl 类中,移动只是指针交换,因此该算法仍然有效。

关于c++ - 如何有效地将唯一对象从一个 vector 复制到另一个 vector (由相同对象的子集组成)?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39067933/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com