python - Conditionally change values in a Pandas DF with multi-level columns

Reposted. Author: 行者123. Updated: 2023-11-30 22:57:20

Given the following DF with multi-level columns:

import numpy as np
import pandas as pd

arrays = [['foo', 'foo', 'bar', 'bar'],
          ['A', 'B', 'C', 'D']]
tuples = list(zip(*arrays))
columnValues = pd.MultiIndex.from_tuples(tuples)
df = pd.DataFrame(np.random.rand(6, 4), columns=columnValues)
df['txt'] = 'aaa'
print(df)

Output:

        foo                 bar            txt
          A         B         C         D
0  0.080029  0.710943  0.157265  0.774827  aaa
1  0.276949  0.923369  0.550799  0.758707  aaa
2  0.416714  0.440659  0.835736  0.130818  aaa
3  0.935763  0.908967  0.502363  0.677957  aaa
4  0.191245  0.291017  0.014355  0.762976  aaa
5  0.365464  0.286350  0.450263  0.509556  aaa

Question: how can I efficiently change the values in the foo sub-columns to 100 where they are < 0.5, for a huge DF?

The following works:

In [41]: df.foo < 0.5
Out[41]:
A B
0 True False
1 True False
2 True True
3 False False
4 True True
5 True True

In [42]: df.foo[df.foo < 0.5]
Out[42]:
A B
0 0.080029 NaN
1 0.276949 NaN
2 0.416714 0.440659
3 NaN NaN
4 0.191245 0.291017
5 0.365464 0.286350

But if I try to change the values, I get a warning:

In [45]: df.foo[df.foo < 0.5] = 100
C:\Users\USER\AppData\Local\Programs\Python35\Scripts\ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

If I try using the .loc indexer:

In [46]: df.foo.loc[df.foo < 0.5] = 100
...
ValueError: cannot copy sequence with size 2 to array axis with dimension 6

df.foo.loc[df.foo < 0.5, 'foo'] = 100 gives the same error.

If I try:

df.loc[df.foo < 0.5, 'foo']

I get:

KeyError: 'None of [       A      B\n0   True  False\n1   True  False\n2   True   True\n3  False  False\n4   True   True\n5   True   True] are in the [index]' 
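The KeyError above is consistent with .loc expecting a one-dimensional boolean indexer for the rows, while df.foo < 0.5 is a two-dimensional boolean frame. A minimal sketch of one way to stay with .loc, assuming it is acceptable to loop over the sub-columns and address each one by its full (level 0, level 1) key. The small fixed-value frame below is made up for illustration and omits the txt column:

```python
import pandas as pd

# Small deterministic frame with the same column layout as in the question
# (illustrative values, not the question's random data).
columns = pd.MultiIndex.from_tuples(
    [("foo", "A"), ("foo", "B"), ("bar", "C"), ("bar", "D")]
)
df = pd.DataFrame(
    [[0.1, 0.7, 0.2, 0.8],
     [0.9, 0.3, 0.6, 0.4]],
    columns=columns,
)

# Handle one sub-column at a time: the row indexer is then a 1-D boolean
# Series, and the column is addressed by its full tuple key.
for sub in df["foo"].columns:
    key = ("foo", sub)
    df.loc[df[key] < 0.5, key] = 100
```

Because the assignment goes through df.loc directly rather than through df.foo, it writes to df itself and not to an intermediate object.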

Solutions - timeit comparison on a 10M-row DF:

In [19]: %timeit df.foo.applymap(lambda x: x if x >= 0.5 else 100)
1 loop, best of 3: 29.4 s per loop

In [20]: %timeit df.foo[df.foo >= 0.5].fillna(100)
1 loop, best of 3: 1.55 s per loop

John Galt:

In [21]: %timeit df.foo.where(df.foo < 0.5, 100)
1 loop, best of 3: 1.12 s per loop

B. M.:

In [5]: %timeit u=df['foo'].values;u[u<.5]=100
1 loop, best of 3: 628 ms per loop
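The NumPy timing above modifies the array returned by .values; whether that change reaches df depends on whether pandas hands back a view or a copy, which is not guaranteed. A hedged sketch of a NumPy-based variant that writes the result back explicitly, assuming the question's intent (values < 0.5 become 100). The seed and the small frame size are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("foo", "A"), ("foo", "B"), ("bar", "C"), ("bar", "D")]
)
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
df = pd.DataFrame(rng.random((6, 4)), columns=columns)

foo = df["foo"]
values = foo.to_numpy()

# Compute the replacement on the raw array, leaving `values` untouched.
replaced = np.where(values < 0.5, 100.0, values)

# Write the result back under the original labels so that df is updated
# regardless of whether `values` was a view or a copy.
df["foo"] = pd.DataFrame(replaced, index=foo.index, columns=foo.columns)
```

The write-back step adds some cost on top of the raw array operation, so timings for this variant would need to be measured separately.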

Best answer

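Here is one way using where: df['foo'] = df['foo'].where(df['foo'] < 0.5, 100). Note that where keeps the values for which the condition is True and replaces the rest, so as written this sets the values >= 0.5 to 100, which is what the output further down shows. To replace the values < 0.5 instead, as the question asks, flip the condition to df['foo'] >= 0.5 or use mask with the original condition.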

In [96]: df
Out[96]:
        foo                 bar            txt
          A         B         C         D
0  0.255309  0.237892  0.491065  0.930555  aaa
1  0.859998  0.008269  0.376213  0.984806  aaa
2  0.479928  0.761266  0.993970  0.266486  aaa
3  0.078284  0.009748  0.461687  0.653085  aaa
4  0.923293  0.642398  0.629140  0.561777  aaa
5  0.936824  0.526626  0.413250  0.732074  aaa

In [97]: df['foo'] = df['foo'].where(df['foo'] < 0.5, 100)

In [98]: df
Out[98]:
          foo                   bar            txt
            A           B         C         D
0    0.255309    0.237892  0.491065  0.930555  aaa
1  100.000000    0.008269  0.376213  0.984806  aaa
2    0.479928  100.000000  0.993970  0.266486  aaa
3    0.078284    0.009748  0.461687  0.653085  aaa
4  100.000000  100.000000  0.629140  0.561777  aaa
5  100.000000  100.000000  0.413250  0.732074  aaa
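For the direction the question asks about (values < 0.5 become 100), a minimal sketch using mask, which replaces values where the condition is True. The fixed-value frame is made up for illustration and omits the txt column:

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("foo", "A"), ("foo", "B"), ("bar", "C"), ("bar", "D")]
)
df = pd.DataFrame(
    [[0.25, 0.75, 0.10, 0.90],
     [0.50, 0.49, 0.30, 0.60]],
    columns=columns,
)

foo = df["foo"]

# mask() replaces entries where the condition is True and keeps the rest.
masked = foo.mask(foo < 0.5, 100)

# With no NaN values present, where() with the inverted condition
# produces the same result.
via_where = foo.where(foo >= 0.5, 100)

# Assign the sub-frame back under the top-level key.
df["foo"] = masked
```

Only the foo sub-columns are touched; the bar columns keep their values, including those below 0.5.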

Regarding "python - Conditionally change values in a Pandas DF with multi-level columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36700207/
