
python - Is there a query method or similar for pandas Series (pandas.Series.query())?

Reposted. Author: 太空狗. Last updated: 2023-10-29 16:56:33

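The pandas.DataFrame.query() method is very useful for (pre/post-)filtering data when loading or plotting. It is especially handy for method chaining.

I often find myself wanting to apply the same logic to a pandas.Series, e.g. after calling a method such as df.value_counts, which returns a pandas.Series.

Example

Assume there is a huge table with the columns Player, Game, Points, and I want to plot a histogram of the players who scored 3 points more than 14 times. I first have to aggregate per player (groupby -> agg), which returns a Series of ~1000 players and their totals. Applying the .query logic, it would look something like this: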

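import random
import pandas as pd

df = pd.DataFrame({
    'Points': [random.choice([1, 3]) for x in range(100)],
    'Player': [random.choice(["A", "B", "C"]) for x in range(100)]})

# Wished-for syntax: Series has no .query method, so this does not run
(df
 .query("Points == 3")
 .Player.value_counts()
 .query("> 14")
 .hist())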

The only solution I have found forces me to make an unnecessary assignment and break the method chain:

points_series = (df
                 .query("Points == 3")
                 .groupby("Player").size())
points_series[points_series > 100].hist()

Method chaining and the query method help keep the code legible, whereas filtering by subsetting gets messy quite quickly.

# just to make my point :)
series_bestplayers_under_100[series_prefiltered_under_100 > 0].shape

Please help me out of my dilemma! Thanks

Best answer

If I understand correctly, you can add query("Points > 100"):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Points': [50, 20, 38, 90, 0, np.inf],
                   'Player': ['a', 'a', 'a', 's', 's', 's']})

print (df)
Player Points
0 a 50.000000
1 a 20.000000
2 a 38.000000
3 s 90.000000
4 s 0.000000
5 s inf

points_series = df.query("Points < inf").groupby("Player").agg({"Points": "sum"})['Points']
print (points_series)
a = points_series[points_series > 100]
print (a)
Player
a 108.0
Name: Points, dtype: float64


# Parentheses are required so the chain can span several lines
points_series = (df.query("Points < inf")
                   .groupby("Player")
                   .agg({"Points": "sum"})
                   .query("Points > 100"))

print (points_series)
Points
Player
a 108.0
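
The chain above stays queryable because agg with a dict returns a DataFrame rather than a Series. A minimal sketch of that point, using a reduced copy of the data above (the names summed and as_series are chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Points": [50, 20, 38, 90, 0],
    "Player": ["a", "a", "a", "s", "s"],
})

summed = (
    df.groupby("Player")
      .agg({"Points": "sum"})   # dict form returns a DataFrame
      .query("Points > 100")    # so DataFrame.query is still available
)
as_series = summed["Points"]    # select the column to get a Series back

print(type(summed).__name__, type(as_series).__name__)
```

If a Series is needed for the next step in the chain, selecting the column at the end is usually enough.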

Another solution is Selection By Callable:

# The callable passed to .loc receives the intermediate Series
points_series = (df.query("Points < inf")
                   .groupby("Player")
                   .agg({"Points": "sum"})['Points']
                   .loc[lambda x: x > 100])

print (points_series)
Player
a 108.0
Name: Points, dtype: float64
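
Selection by callable gives the same result as the two-step boolean mask from the question; the callable simply receives the Series being indexed, so no intermediate variable is needed. A small sketch, assuming the per-player totals from the example above are already available as a Series (built by hand here):

```python
import pandas as pd

# Per-player totals, as produced by the groupby/sum step above
totals = pd.Series({"a": 108.0, "s": 90.0}, name="Points")

masked = totals[totals > 100]              # needs the name `totals` twice
chained = totals.loc[lambda s: s > 100]    # callable receives the Series
bracket = totals[lambda s: s > 100]        # plain [] accepts a callable too

print(chained.equals(masked), bracket.equals(masked))
```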

Edited answer for the edited question:

np.random.seed(1234)
df = pd.DataFrame({
'Points': [np.random.choice([1,3]) for x in range(100)],
'Player': [np.random.choice(["A","B","C"]) for x in range(100)]})

print (df.query("Points == 3").Player.value_counts().loc[lambda x: x > 15])
C 19
B 16
Name: Player, dtype: int64

print (df.query("Points == 3").groupby("Player").size().loc[lambda x: x > 15])
Player
B 16
C 19
dtype: int64
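
If a string expression is preferred over a lambda, one possible workaround is to wrap the Series in a one-column DataFrame and evaluate the expression there. The sketch below is not part of pandas: query_series is a hypothetical helper, and the column name "value" is an arbitrary choice for illustration.

```python
import pandas as pd

def query_series(s, expr, name="value"):
    """Hypothetical helper: filter a Series with a query-style expression.

    The Series is exposed to the expression under the column name `name`.
    """
    mask = s.to_frame(name).eval(expr)   # boolean Series from the expression
    return s[mask.to_numpy()]            # positional mask keeps index and name

df = pd.DataFrame({
    "Points": [3, 3, 3, 1, 3, 1, 3, 3],
    "Player": ["A", "A", "A", "A", "B", "B", "C", "C"],
})

result = (
    df.query("Points == 3")
      .groupby("Player")
      .size()
      .pipe(query_series, "value >= 2")  # stays inside the method chain
)
print(result)
```

Series.pipe passes the intermediate Series as the first argument, so the helper fits into a chain without an extra assignment.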

For more on "python - Is there a query method or similar for pandas Series (pandas.Series.query())?", see the related question on Stack Overflow: https://stackoverflow.com/questions/40171498/
