
c++ - std::vector differences

Reposted. Author: IT老高. Updated: 2023-10-28 22:20:47

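How do I determine the differences between two vectors?

I have vector<int> v1 and vector<int> v2.

What I am looking for is a vector<int> vDifferences that contains only the elements that are in v1 only or in v2 only.

Is there a standard way to do this?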

Best answer

Here is the complete and correct answer. Before the set_symmetric_difference algorithm can be used, the source ranges must be sorted:

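#include <algorithm> // sort, set_symmetric_difference
#include <iterator>  // back_inserter
#include <vector>

using namespace std; // For brevity, don't do this in your own code...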

vector<int> v1;
vector<int> v2;

// ... Populate v1 and v2

// For the set_symmetric_difference algorithm to work,
// the source ranges must be ordered!
vector<int> sortedV1(v1);
vector<int> sortedV2(v2);

sort(sortedV1.begin(),sortedV1.end());
sort(sortedV2.begin(),sortedV2.end());

// Now that we have sorted ranges (i.e., containers), find the differences
vector<int> vDifferences;

set_symmetric_difference(
sortedV1.begin(),
sortedV1.end(),
sortedV2.begin(),
sortedV2.end(),
back_inserter(vDifferences));

// ... do something with the differences
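
As a small worked illustration of the approach above, the sketch below wraps the same steps in two helpers. The names symmetric_difference and symmetric_difference_sorted are hypothetical and not part of the original answer. One detail worth knowing when the inputs contain duplicates: a value that appears m times in one range and n times in the other ends up in the output |m - n| times.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: symmetric difference of two ranges that are
// ALREADY sorted. The assertions document the algorithm's precondition.
std::vector<int> symmetric_difference_sorted(const std::vector<int>& a,
                                             const std::vector<int>& b)
{
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    std::vector<int> result;
    std::set_symmetric_difference(a.begin(), a.end(),
                                  b.begin(), b.end(),
                                  std::back_inserter(result));
    return result;
}

// Hypothetical helper: takes the inputs by value, so the callers'
// containers keep their original order, then sorts the copies.
std::vector<int> symmetric_difference(std::vector<int> a, std::vector<int> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return symmetric_difference_sorted(a, b);
}
```

For instance, symmetric_difference({5, 1, 3, 3}, {3, 4, 1}) should yield {3, 4, 5}: the value 1 cancels out, one of the two 3s survives, and 4 and 5 each appear in only one input.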

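It should be noted that sorting is an expensive operation (i.e., O(n log n) for common STL implementations). Especially when one or both of the containers are very large (i.e., millions of integers or more), a different algorithm that uses hash tables may be preferable on the grounds of algorithmic complexity. Here is a high-level description of that algorithm: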

  1. Load each container into a hash table.
  2. If the two containers differ in size, the hash table corresponding to the smaller one will be used for traversal in Step 3. Otherwise, the first of the two hash tables will be used.
  3. Traverse the hash table chosen in Step 2, checking to see if each item is present in both hash tables. If it is, remove it from both of them. The reason that the smaller hash table is preferred for traversal is because hash table lookups are on the average O(1) regardless of container size. Therefore, the time to traverse is a linear function of n (i.e., O(n)), where n is the size of the hash table being traversed.
  4. Take the union of the remaining items in the hash tables and store the result in a difference container.

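C++11 gives us some of the capability needed for such a solution by standardizing the unordered_multiset container. I have also used the new meaning of the auto keyword for explicit initializations, to make the following hash-table-based solution more concise: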

#include <unordered_set> // unordered_multiset
#include <vector>

using namespace std; // For brevity, don't do this in your own code...

// The remove_common_items function template removes some and / or all of the
// items that appear in both of the multisets that are passed to it. It uses the
// items in the first multiset as the criteria for the multi-presence test.
template <typename tVal>
void remove_common_items(unordered_multiset<tVal> &ms1,
unordered_multiset<tVal> &ms2)
{
// Go through the first hash table
for (auto cims1=ms1.cbegin();cims1!=ms1.cend();)
{
// Find the current item in the second hash table
auto cims2=ms2.find(*cims1);

// Is it present?
if (cims2!=ms2.end())
{
// If so, remove it from both hash tables
cims1=ms1.erase(cims1);
ms2.erase(cims2);
}
else // If not
++cims1; // Move on to the next item
}
}

int main()
{
vector<int> v1;
vector<int> v2;

// ... Populate v1 and v2

// Create two hash tables that contain the values
// from their respective initial containers
unordered_multiset<int> ms1(v1.begin(),v1.end());
unordered_multiset<int> ms2(v2.begin(),v2.end());

// Remove common items from both containers based on the smallest
if (v1.size()<=v2.size())
remove_common_items(ms1,ms2);
else
remove_common_items(ms2,ms1);

// Create a vector of the union of the remaining items
vector<int> vDifferences(ms1.begin(),ms1.end());

vDifferences.insert(vDifferences.end(),ms2.begin(),ms2.end());

// ... do something with the differences
}

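To determine which solution is better for a particular situation, profiling both algorithms would be a wise course of action. Although the hash-table-based solution is O(n), it requires more code and does more work per duplicate found (i.e., the hash table removals). It also (sadly) uses a custom differencing function rather than a standard STL algorithm.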
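
One rough way to compare the two approaches is a small timing harness like the sketch below. It is only an assumption about how one might measure this, not part of the original answer: the names diff_sorted, diff_hashed, check_agreement, and time_us are hypothetical, and a single wall-clock measurement is noisy, so a dedicated profiler or repeated runs on representative data would give more reliable numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <iterator>
#include <unordered_set>
#include <vector>

// Sort-based approach: sort copies, then use the standard algorithm.
std::vector<int> diff_sorted(std::vector<int> a, std::vector<int> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    std::vector<int> result;
    std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
                                  std::back_inserter(result));
    return result;
}

// Hash-based approach: cancel common items pairwise, traversing the
// smaller table, then concatenate whatever remains in both tables.
std::vector<int> diff_hashed(const std::vector<int>& a,
                             const std::vector<int>& b)
{
    std::unordered_multiset<int> ms1(a.begin(), a.end());
    std::unordered_multiset<int> ms2(b.begin(), b.end());
    auto& small = ms1.size() <= ms2.size() ? ms1 : ms2;
    auto& large = ms1.size() <= ms2.size() ? ms2 : ms1;

    for (auto it = small.begin(); it != small.end();) {
        auto hit = large.find(*it);
        if (hit != large.end()) {
            it = small.erase(it); // erase returns the next iterator
            large.erase(hit);
        } else {
            ++it;
        }
    }

    std::vector<int> result(ms1.begin(), ms1.end());
    result.insert(result.end(), ms2.begin(), ms2.end());
    return result;
}

// Sanity check to run before trusting any timing: both approaches must
// produce the same items once the hash-based result is sorted.
void check_agreement(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> expected = diff_sorted(a, b);
    std::vector<int> actual = diff_hashed(a, b);
    std::sort(actual.begin(), actual.end());
    assert(expected == actual);
}

// Wall-clock duration of one call to f, in microseconds.
template <typename F>
long long time_us(F&& f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(
               stop - start).count();
}
```

A call such as time_us([&] { diff_sorted(v1, v2); }) could then be compared against time_us([&] { diff_hashed(v1, v2); }) for inputs of the size expected in practice, ideally in an optimized build and averaged over several runs.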

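It should be noted that both solutions present the differences in an order that is most likely quite different from the order in which the elements appeared in the original containers. This can be worked around by using a variation of the hash table solution. A high-level description follows (it differs from the preceding solution only in Step 4):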

  1. Load each container into a hash table.
  2. If the two containers differ in size, the smaller hash table will be used for traversal in Step 3. Otherwise, the first of the two will be used.
  3. Traverse the hash table chosen in Step 2, checking to see if each item is present in both hash tables. If it is, remove it from both of them.
  4. To form the difference container, traverse the original containers in order (i.e., the first container before the second). Look up each item from each container in its respective hash table. If it is found, the item is to be added to the difference container and removed from its hash table. Items not present in the respective hash tables will be skipped. Thus, only the items that are present in the hash tables will wind up in the difference container and their order of appearance will remain the same as it was in the original containers, because those containers dictate the order of the final traversal.

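To preserve the original order, Step 4 becomes more expensive than in the previous solution, especially when the number of removed items is high. This is because:

  1. Every item is tested a second time, through a presence test in its respective hash table, to decide whether it is eligible to appear in the difference container.
  2. The hash tables have their remaining items removed as the difference container is formed, as part of the difference test in Item 1.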
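
The order-preserving variation described above could be sketched as follows. This is a minimal illustration under stated assumptions rather than the original author's code: remove_common_items mirrors the function from the answer, while append_survivors and ordered_difference are hypothetical names introduced here for Step 4 and for the overall routine.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Steps 1-3: cancel items that appear in both multisets, one pair at a
// time, traversing the first multiset.
template <typename T>
void remove_common_items(std::unordered_multiset<T>& ms1,
                         std::unordered_multiset<T>& ms2)
{
    for (auto it1 = ms1.begin(); it1 != ms1.end();) {
        auto it2 = ms2.find(*it1);
        if (it2 != ms2.end()) {
            it1 = ms1.erase(it1); // erase returns the next iterator
            ms2.erase(it2);
        } else {
            ++it1;
        }
    }
}

// Step 4 (hypothetical helper): walk the original container in order and
// keep each item that is still present in its hash table, erasing one
// occurrence from the table so it is not emitted more often than it
// survived.
template <typename T>
void append_survivors(const std::vector<T>& original,
                      std::unordered_multiset<T>& remaining,
                      std::vector<T>& out)
{
    for (const T& item : original) {
        auto it = remaining.find(item);
        if (it != remaining.end()) {
            out.push_back(item);
            remaining.erase(it);
        }
    }
}

// Hypothetical wrapper tying Steps 1-4 together.
template <typename T>
std::vector<T> ordered_difference(const std::vector<T>& v1,
                                  const std::vector<T>& v2)
{
    std::unordered_multiset<T> ms1(v1.begin(), v1.end());
    std::unordered_multiset<T> ms2(v2.begin(), v2.end());

    // Traverse the smaller table when removing common items.
    if (ms1.size() <= ms2.size())
        remove_common_items(ms1, ms2);
    else
        remove_common_items(ms2, ms1);

    std::vector<T> differences;
    append_survivors(v1, ms1, differences); // first container first
    append_survivors(v2, ms2, differences);

    // Every surviving item came from its own original container, so the
    // ordered traversal must have consumed both tables completely.
    assert(ms1.empty() && ms2.empty());
    return differences;
}
```

With v1 = {9, 1, 3, 3, 7} and v2 = {3, 8, 1}, this sketch should produce {9, 3, 7, 8}: the surviving items from v1 in their original order, followed by those from v2. When a value has duplicates, the earliest occurrences in the original container are the ones that are kept.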

For more on c++ - std::vector differences, see the similar question on Stack Overflow: https://stackoverflow.com/questions/7771796/
