
string - How do I find common elements across cell arrays of strings?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:17:52

I want to find the common elements across multiple (>= 2) cell arrays of strings.

A related question is here, and its answer suggests the function intersect(), but that only works with 2 inputs.

In my case I have more than two cell arrays, and I want the subset that is common to all of them. Here is an example of what I want to achieve:

c1 = {'a','b','c','d'}
c2 = {'b','c','d'}
c3 = {'c','d'}
c_common = my_fun({c1,c2,c3});

In the end I want c_common = {'c','d'}, because only these two strings appear in every input.

How can I do this in MATLAB?

Thanks in advance.

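P.S. I also need the indices into each input, but I can probably work those out myself from the output c_common, so they are not required in an answer. If anyone wants to tackle that as well, my actual output would look like this: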

[c_common, indices] = my_fun({c1,c2,c3});

In this case, indices = {[3,4], [2,3], [1,2]}

Thanks,

Best answer

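This post presents a vectorized approach that uses unique and accumarray to get both the common strings and their indices. It works even when the strings within each cell array are not sorted, and the indices it returns correspond to each string's position in its own cell array. The strings within each cell array must be unique, however. See the sample input and output section* for a run of such a case. Here is the implementation: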

C = {c1,c2,c3};  % Add more cell arrays here

% Get unique strings and ID each of the strings based on their uniqueness
[unqC,~,unqID] = unique([C{:}]);

% Count the occurrences of each ID. An ID whose count equals the number of
% cell arrays in C is present in every cell array (strings are unique
% within each array), so those IDs are the ones to select
match_ID = find(accumarray(unqID(:),1)==numel(C));
common_str = unqC(match_ID)

% ------------ Additional work to get indices ----------------

N_str = numel(common_str);

% Store matches as a logical array to be used at later stages
matches = ismember(unqID,match_ID);

% Use ismember to find all those indices in unqID and subtract group
% lengths from them to give us the indices within each cell array
clens = [0 cumsum(cellfun('length',C(1:end-1)))];
match_index = reshape(find(matches),N_str,[]);

% Sort match_index along each column based on the respective unqID elements
[m,n] = size(match_index);
[~,sidx] = sort(reshape(unqID(matches),N_str,[]),1);
sorted_match_index = match_index(bsxfun(@plus,sidx,(0:n-1)*m));

% Subtract cumulative group lengths to get indices within each cell array
common_idx = bsxfun(@minus,sorted_match_index,clens).'

Note that in the step that computes match_ID, accumarray(unqID(:),1) can be replaced with histc(unqID,1:max(unqID)). histcounts is another option.
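
To illustrate the counting alternatives mentioned above, here is a minimal sketch that compares accumarray with histcounts on the sample data. The bin edges 0.5:1:nID+0.5 and the variable names counts_acc, counts_hc, and nID are assumptions made for this illustration. They are not part of the original answer.

```matlab
% Sketch: two ways to count how often each unique string ID occurs.
c1 = {'a','b','c','d'};
c2 = {'b','c','a','d'};
c3 = {'c','d','a'};
C = {c1,c2,c3};

[unqC,~,unqID] = unique([C{:}]);
nID = numel(unqC);                       % number of distinct strings

% Variant from the answer: accumulate a 1 for every occurrence of an ID
counts_acc = accumarray(unqID(:),1);

% Assumed histcounts variant: one bin per integer ID, edges at half-integers
counts_hc = histcounts(unqID, 0.5:1:nID+0.5);

% Force both results to row vectors before comparing them
counts_acc = counts_acc(:).';
counts_hc  = counts_hc(:).';

% Strings whose count equals the number of cell arrays are common to all
common_str = unqC(counts_acc == numel(C));
```

Both variants should give the same counts. The accumarray form has the advantage that it does not need explicit bin edges.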

*Sample input and output:

c1 = 
'a' 'b' 'c' 'd'
c2 =
'b' 'c' 'a' 'd'
c3 =
'c' 'd' 'a'
common_str =
'a' 'c' 'd'
common_idx =
1 3 4
3 2 4
3 1 2
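
As a cross-check on the sample above, the same result can be reproduced with a plain loop over intersect and ismember. This is a different, non-vectorized technique from the one in the answer. It is only a sketch, and the names common and idx are hypothetical.

```matlab
% Sketch: loop-based cross-check using intersect and ismember.
c1 = {'a','b','c','d'};
c2 = {'b','c','a','d'};
c3 = {'c','d','a'};
C = {c1,c2,c3};

% Fold intersect over all cell arrays; the result is sorted and unique
common = unique(C{1});
for k = 2:numel(C)
    common = intersect(common, C{k});
end

% Row k holds the positions of the common strings inside C{k}
idx = zeros(numel(C), numel(common));
for k = 1:numel(C)
    [~, idx(k,:)] = ismember(common, C{k});
end
```

Here common corresponds to common_str and idx to common_idx in the sample output. The loop is easier to follow but makes one intersect call per cell array, so the vectorized version may be preferable when there are many inputs.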

Regarding "string - How do I find common elements across cell arrays of strings?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37911544/
