
python-3.x - Keep only unique words in Pandas DataFrame rows

Reposted. Author: 行者123. Updated: 2023-12-04 02:02:30

DataFrame:

> df
> type(df)
pandas.core.frame.DataFrame

ID        Property Type                                 Amenities
1952043   Apartment, Villa, Apartment                   Park, Jogging Track, Park
1918916   Bungalow, Cottage House, Cottage, Bungalow    Garden, Play Ground

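How can I keep only the unique comma-separated values in each row of the DataFrame? In this case, "Cottage House" and "Cottage" must not be treated as the same value. The check has to run on every column of the DataFrame.

Desired output: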

ID        Property Type                        Amenities
1952043   Apartment, Villa                     Park, Jogging Track
1918916   Bungalow, Cottage House, Cottage     Garden, Play Ground

Best answer

First, I wrote a function that does what you want for a single string. Second, I applied that function to every string in each column.

import numpy as np
import pandas as pd

df = pd.DataFrame([['Apartment, Villa, Apartment',
                    'Park, Jogging Track, Park'],
                   ['Bungalow, Cottage House, Cottage, Bungalow',
                    'Garden, Play Ground']],
                  columns=['Property Type', 'Amenities'])

def drop_duplicates(row):
    # Split the string on ', ', drop duplicates, and join back.
    # Note: np.unique returns the unique values sorted, so the original order is not kept.
    words = row.split(', ')
    return ', '.join(np.unique(words).tolist())

# drop_duplicates is applied to all rows of df.
df['Property Type'] = df['Property Type'].apply(drop_duplicates)
df['Amenities'] = df['Amenities'].apply(drop_duplicates)
print(df)
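
One caveat with np.unique is that it returns its result sorted, so 'Park, Jogging Track, Park' comes back as 'Jogging Track, Park' rather than in the first-seen order shown in the question's desired output. If order matters, a minimal sketch along the following lines might work. The helper name unique_in_order is hypothetical, and the loop assumes every column of df holds comma-separated strings (an ID column, if present, would need to be skipped).

```python
import pandas as pd

df = pd.DataFrame(
    [['Apartment, Villa, Apartment', 'Park, Jogging Track, Park'],
     ['Bungalow, Cottage House, Cottage, Bungalow', 'Garden, Play Ground']],
    columns=['Property Type', 'Amenities'])

def unique_in_order(cell):
    # Hypothetical helper: split on commas, strip surrounding whitespace,
    # and keep only the first occurrence of each item.
    items = [item.strip() for item in cell.split(',')]
    # dict.fromkeys preserves insertion order (Python 3.7+), so duplicates
    # are dropped without reordering. Whole items are compared, so
    # 'Cottage House' and 'Cottage' remain distinct.
    return ', '.join(dict.fromkeys(items))

# Assumption: every column in df contains comma-separated strings.
for col in df.columns:
    df[col] = df[col].apply(unique_in_order)

print(df)
```

Because the comparison is on whole comma-separated items rather than individual words, this keeps "Cottage House" and "Cottage" as separate entries, as the question requires.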

Regarding "python-3.x - Keep only unique words in Pandas DataFrame rows", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46182723/
