
python - How to merge rows in pandas DataFrames when the cell values in a particular column are the same

Reposted. Author: 太空宇宙. Updated: 2023-11-03 11:13:45

I have a dataframe as follows:

df

   KEY    NAME   ID_LOCATION                                               _GEOM
0  61196  name1  [(u'-88.121429', u'41.887726')]                           [[[lon00,lat00],[lon01, lat01]]]
1  61197  name2  [(u'-75.161934', u'38.725163')]                           [[[lon10,lat10], [lon11,lat11],...]]
2  61199  name3  [(u'-88.121429', u'41.887726'), (-77.681931, 37.548851)]  [[[lon20, lat20],[lon21, lat21]]]

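where ID_LOCATION is a list of tuples. How can I group by ID_LOCATION so that, whenever two rows share a matching (lon, lat) pair, those rows are merged and the values of the other columns are joined, separated by commas?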

expected_output_df

   KEY          NAME         ID_LOCATION                                               _GEOM
0  61196,61199  name1,name3  [(u'-85.121429', u'40.887726'), (-77.681931, 37.548851)]  [[[lon00, lat00],[lon01, lat01],[lon20, lat20],[lon21, lat21]]]
1  61197        name2        [(u'-72.161934', u'35.725163')]                           [[[lon10,lat10], [lon11,lat11],...]]

I tried the following, but it does not work and fails with the error "unhashable type: 'list'":

def f(x):
    return pd.Series(dict(KEY='{%s}' % ', '.join(x['KEY']),
                          NAME='{%s}' % ', '.join(x['NAME']),
                          ID_LOCATION='{%s}' % ', '.join(x['ID_LOCATION']),
                          _GEOM='{%s}' % ', '.join(x['_GEOM']))
                     )
df = df.groupby('ID_LOCATION').apply(f)
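
The error most likely arises because groupby has to hash every group key, and Python lists cannot be hashed. Here is a minimal sketch of that behaviour in plain Python, using the coordinates of rows 0 and 2 (the variable names are illustrative, not from the question). It also shows that converting the lists to tuples would remove the error but not solve the problem, because grouping by equality keeps apart rows that merely share one point:

```python
# Coordinates of rows 0 and 2 from the question's dataframe.
row0 = [('-88.121429', '41.887726')]
row2 = [('-88.121429', '41.887726'), ('-77.681931', '37.548851')]

# Lists are mutable, so Python refuses to hash them.
try:
    hash(row0)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Tuples of tuples are hashable, so they could serve as group keys ...
key0, key2 = tuple(row0), tuple(row2)
tuple_is_hashable = isinstance(hash(key0), int)

# ... but the two keys are not equal, so an equality-based grouping
# would still keep rows 0 and 2 apart even though they share a point.
keys_equal = key0 == key2
share_a_point = not set(row0).isdisjoint(row2)

print(list_is_hashable, tuple_is_hashable, keys_equal, share_a_point)
```

So the rows have to be matched by overlap rather than by equality, which a plain groupby on the column cannot express.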

Best answer

I think this should work.

First, convert everything to lists of the same type (so that sum will append things together).

import itertools
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[['61196'], ['name1'], [('-88.121429', '41.887726')]], [['61197'], ['name2'], [('-75.161934', '38.725163')]], [['61199'], ['name3'], [('-88.121429', '41.887726'), ('-77.681931', '37.548851')]]],
    columns=['KEY', 'NAME', 'id_loc']
)
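
The reason for wrapping every cell in a list is that adding two lists concatenates them, so summing the rows of a group collects their values. A small plain-Python sketch of that idea (the names keys, locs and the merged_* variables are only for illustration):

```python
from functools import reduce
import operator

# KEY and id_loc cells of rows 0 and 2, each wrapped in a list.
keys = [['61196'], ['61199']]
locs = [
    [('-88.121429', '41.887726')],
    [('-88.121429', '41.887726'), ('-77.681931', '37.548851')],
]

# The built-in sum needs an explicit empty-list start value for lists.
merged_keys = sum(keys, [])

# reduce with operator.add does the same without a start value.
merged_locs = reduce(operator.add, locs)

# The concatenated list can then be turned into the comma-separated form.
joined_keys = ','.join(merged_keys)

print(joined_keys)
print(merged_locs)
```

Note that the shared coordinate appears twice in merged_locs, which is why a de-duplication step is still needed after merging.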

Then get the pairwise combinations of rows (by id_loc), i.e. the pairs of rows that should be merged together.

# Loop through all pairwise combinations of rows (we need the index, so loop over range() instead of the raw values).
to_merge = []  # list of index pairs: rows to merge together.
for i, j in itertools.combinations(range(len(df['id_loc'].values)), 2):
    a = df['id_loc'].values[i]
    b = df['id_loc'].values[j]

    # Check for shared elements.
    if not set(a).isdisjoint(b):
        # Shared elements found.
        to_merge.append([i, j])
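
The same pairwise check can be sketched as a standalone function that works on a plain list of coordinate lists, which makes it easy to try without a dataframe. The function name overlapping_pairs is hypothetical:

```python
import itertools


def overlapping_pairs(locations):
    """Return index pairs (i, j) with i < j whose coordinate lists share a point."""
    pairs = []
    # enumerate() keeps each list's row index attached while combining.
    for (i, a), (j, b) in itertools.combinations(enumerate(locations), 2):
        if not set(a).isdisjoint(b):
            pairs.append((i, j))
    return pairs


locations = [
    [('-88.121429', '41.887726')],
    [('-75.161934', '38.725163')],
    [('-88.121429', '41.887726'), ('-77.681931', '37.548851')],
]
pairs = overlapping_pairs(locations)
print(pairs)
```

This compares every row with every other row, so the cost grows quadratically with the number of rows. That is fine for small frames but may be slow for large ones.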

Now handle the case where 3 or more rows belong together, i.e. to_merge = [[1, 2], [2, 3]] should become to_merge = [[1, 2, 3]].

def find_intersection(m_list):
    # Repeatedly fuse any two sets that share an element until none overlap.
    for i, v in enumerate(m_list):
        for j, k in enumerate(m_list[i+1:], i+1):
            if v & k:
                m_list[i] = v.union(m_list.pop(j))
                return find_intersection(m_list)
    return m_list

to_merge = [set(i) for i in to_merge if i]
to_merge = find_intersection(to_merge)
to_merge = [sorted(x) for x in to_merge]  # sorted, so idx_list[0] is the first row of each group

(From this answer.)
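
As an alternative to the recursive approach, the chained pairs can be treated as edges of a graph and grouped with a small union-find structure. This is only a sketch of that swapped-in technique, not the method from the linked answer, and merge_groups is a hypothetical name:

```python
def merge_groups(pairs):
    """Merge index pairs that share a member into sorted groups."""
    parent = {}

    def find(x):
        # Follow parent links to the group's root, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link the two roots of every pair so they end up in one group.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each root.
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


print(merge_groups([[1, 2], [2, 3]]))
```

Unlike the recursive version, this does not restart the scan after each merge and does not recurse, so it should cope better with long chains of overlapping rows.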

Loop through and sum up all the rows that need to be merged (and drop the pre-merge rows).

for idx_list in to_merge:
    df.iloc[idx_list[0], :] = df.iloc[idx_list, :].sum()
    df.iloc[idx_list[1:], :] = np.nan

df = df.dropna()
df['id_loc'] = df['id_loc'].apply(lambda x: list(set(x)))  # shared coords would be duped.
print(df)
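
For comparison, here is a self-contained sketch that builds a new dataframe from the merged groups instead of assigning back into the original one. It assumes KEY and NAME hold plain strings rather than one-element lists, leaves out the _GEOM column for brevity, and keeps the first occurrence of each coordinate so that the order is preserved. The names group_of, rows and result are illustrative:

```python
import itertools

import pandas as pd

df = pd.DataFrame({
    'KEY': ['61196', '61197', '61199'],
    'NAME': ['name1', 'name2', 'name3'],
    'id_loc': [
        [('-88.121429', '41.887726')],
        [('-75.161934', '38.725163')],
        [('-88.121429', '41.887726'), ('-77.681931', '37.548851')],
    ],
})
locs = df['id_loc'].tolist()

# Every row starts in its own group; rows sharing a point get the same label.
group_of = list(range(len(df)))
for i, j in itertools.combinations(range(len(df)), 2):
    if not set(locs[i]).isdisjoint(locs[j]):
        new, old = sorted((group_of[i], group_of[j]))
        group_of = [new if g == old else g for g in group_of]

rows = []
for label in sorted(set(group_of)):
    members = [i for i, g in enumerate(group_of) if g == label]
    part = df.iloc[members]
    # Concatenate the coordinate lists, skipping points already seen.
    merged_locs = []
    for loc_list in part['id_loc']:
        for point in loc_list:
            if point not in merged_locs:
                merged_locs.append(point)
    rows.append({
        'KEY': ','.join(part['KEY']),
        'NAME': ','.join(part['NAME']),
        'id_loc': merged_locs,
    })

result = pd.DataFrame(rows)
print(result)
```

Building the result row by row avoids writing list-valued cells into an existing frame, at the cost of a little more code.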

Regarding python - How to merge rows in pandas DataFrames when the cell values in a particular column are the same, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55798614/
