
Python + Pandas + DataFrame + CSV: Code removes all rows from a dataframe instead of specified ones

Reposted. Author: 太空宇宙. Updated: 2023-11-04 00:10:23

I wrote code to drop every row that has NaN in the category_id column. That code works and removes those rows:

# Remove rows of the dataframe that have NaN values in the 'category_id' column

#data = data[np.isfinite(data['category_id'])]
data = data[data['category_id'].notnull()]

print(data['category_id'].shape)
data.to_csv('dataset.csv', encoding='utf-8', index=False)
print(type(data['category_id']))

Output:

(778,)
<class 'pandas.core.series.Series'>

Next, I wrote code to keep only the rows whose category_id is one of the values in a list:

# Select rows of the dataset whose 'category_id' column has a value from the list

category_ids = [19, 22, 2, 30, 23]
data = data[data.category_id.isin(category_ids)]
print(data.shape)

data.to_csv('dataset.csv', encoding='utf-8', index=False)

Output:

(0, 164)

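So it produces an empty dataframe and an empty CSV. Why?

Best answer

The problem is that the values in the category_id column are strings, not integers, so none of them match the integers in the list.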

print(data.category_id.dtype)
object
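
The mismatch can be reproduced with a minimal sketch. The four-row frame below is made up for illustration (the values and the `name` column are assumptions, not the asker's data), and the column is forced to `object` dtype to mirror the output above:

```python
import pandas as pd

# Hypothetical data: category_id stored as strings, as in the question
data = pd.DataFrame({
    "category_id": pd.Series(["19", "22", "7", "2"], dtype=object),
    "name": ["a", "b", "c", "d"],
})

category_ids = [19, 22, 2, 30, 23]

# The integer 19 is not equal to the string '19', so nothing matches
empty = data[data["category_id"].isin(category_ids)]

# Comparing against string versions of the same ids matches as expected
matched = data[data["category_id"].isin([str(c) for c in category_ids])]

print(empty.shape)    # (0, 2)
print(matched.shape)  # (3, 2)
```
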

So you need to convert the values in the list to strings:

category_ids = ['19', '22', '2', '30', '23']
data = data[data.category_id.isin(category_ids)]

Or convert the column to integers with Series.astype:

category_ids = [19, 22, 2, 30, 23]
data = data[data.category_id.astype(int).isin(category_ids)]
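
Note that `astype(int)` raises a `ValueError` if any entry in the column cannot be parsed as an integer. One possible alternative, sketched here on made-up data (the `'abc'` entry is an assumption added to show the failure case), is `pd.to_numeric` with `errors='coerce'`, which turns unparseable entries into NaN so they simply fail the `isin` test:

```python
import pandas as pd

# Hypothetical data: string ids plus one entry that is not a number
data = pd.DataFrame({
    "category_id": pd.Series(["19", "22", "abc", "2"], dtype=object),
})

category_ids = [19, 22, 2, 30, 23]

# Unparseable values become NaN instead of raising an error
numeric_ids = pd.to_numeric(data["category_id"], errors="coerce")

# NaN is not in the list, so the 'abc' row is dropped along with non-matching ids
filtered = data[numeric_ids.isin(category_ids)]

print(filtered.shape)  # (3, 1)
```

The original column is left as strings here; only the boolean mask is built from the numeric copy.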

Regarding "Python + Pandas + DataFrame + CSV: Code removes all rows from a dataframe instead of specified ones", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52664730/
