
Python - Pandas indexing and selection

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:06:23

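I am trying to get pandas to select the range of rows under "ClosePrice" from the structured csv below and store it in a dataframe. The file contains many identifiers, but I only want to go through the file for the identifiers in the list below. The number of rows is not always the same.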

list = ['ABC0123', 'DEF0123']

> Column 1 Column 2 Column 3 Column 4 Column 5 Column 6 Column 7
> "Date" 20170101 "Identifier" ABC0123
> "OpenPrice" 500 "Currency" USD
> "ClosePrice" 550 "foo" bar
> foo foo foo foo foo foo foo
> foo foo foo foo foo foo foo
> foo foo foo foo foo foo foo
> "Date" 20170101 "Identifier" SOMEOTHER
> ...
> ...
> ...
> "Date" 20170101 "Identifier" DEF0123
> "OpenPrice" 600 "Currency" USD
> "ClosePrice" 650 "foo" bar
> foo foo foo foo foo foo foo
> foo foo foo foo foo foo foo
> foo foo foo foo foo foo foo
> foo foo foo foo foo foo foo
> foo foo foo foo foo foo foo
> foo foo foo foo foo foo foo
> foo foo foo foo foo foo foo
> foo foo foo foo foo foo foo
> foo foo foo foo foo foo foo

I get the first row of each table I am interested in with a for-i loop and:

df.iloc[df[df['Column 4'].isin(list)].index + 3,:]

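This goes to the top-left cell with the "foo" value and selects the whole row, but I am trying to figure out how to select the rows below that starting point and stop before the next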

"Date"   20170101 "Identifier"   SOMEOTHER

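One approach I am considering is to check the len of the cell value under the last row in Column 5, which would be 0, but I cannot reproduce this logic in a script. Other approaches are very welcome.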

Best answer

First, do not use list as a variable name, because it masks the built-in function.
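As a small illustration of why the name matters, the sketch below rebinds list inside a function (the function name shadowed is only for this example) and shows that the built-in can no longer be called from that scope:

```python
def shadowed():
    # Inside this function the name `list` refers to a list object,
    # so the built-in type is hidden.
    list = ['ABC0123', 'DEF0123']
    try:
        return list('abc')
    except TypeError as exc:
        return str(exc)

message = shadowed()      # the TypeError text, e.g. "... is not callable"

# Outside the function the built-in is untouched.
chars = list('abc')
print(message)
print(chars)
```

Using a neutral name such as L, as the code that follows does, avoids the problem entirely.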

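Create a helper column g that labels every group with a unique number using cumsum. Then get all groups that contain a value from L, and select all their rows with another isin: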

L = ['ABC0123', 'DEF0123']
df['g'] = df['Column 1'].eq('Date').cumsum()
vals = df.loc[df['Column 4'].isin(L), 'g']
df = df[df['g'].isin(vals)]
print (df)
Column 1 Column 2 Column 3 Column 4 Column 5 Column 6 Column 7 g
0 Date 20170101 Identifier ABC0123 NaN NaN NaN 1
1 OpenPrice 500 Currency USD NaN NaN NaN 1
2 ClosePrice 550 foo bar NaN NaN NaN 1
3 foo foo foo foo foo foo foo 1
4 foo foo foo foo foo foo foo 1
5 foo foo foo foo foo foo foo 1
9 Date 20170101 Identifier DEF0123 NaN NaN NaN 3
10 OpenPrice 600 Currency USD NaN NaN NaN 3
11 ClosePrice 650 foo bar NaN NaN NaN 3
12 foo foo foo foo foo foo foo 3
13 foo foo foo foo foo foo foo 3

Finally (if necessary), remove the g column:

df = df.drop('g', axis=1)
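The grouping step can also be tried in isolation. The following is a minimal sketch on made-up data (the SOMEOTHER prices and the number of foo rows are invented for illustration, and the names block_id, wanted and selected are not from the original). It keeps the group labels in a separate Series, so there is no helper column to drop afterwards:

```python
import pandas as pd

cols = [f'Column {i}' for i in range(1, 8)]
rows = [
    ['Date', '20170101', 'Identifier', 'ABC0123', None, None, None],
    ['OpenPrice', '500', 'Currency', 'USD', None, None, None],
    ['ClosePrice', '550', 'foo', 'bar', None, None, None],
    ['foo'] * 7,
    ['foo'] * 7,
    ['Date', '20170101', 'Identifier', 'SOMEOTHER', None, None, None],
    ['OpenPrice', '700', 'Currency', 'USD', None, None, None],
    ['ClosePrice', '750', 'foo', 'bar', None, None, None],
    ['foo'] * 7,
    ['Date', '20170101', 'Identifier', 'DEF0123', None, None, None],
    ['OpenPrice', '600', 'Currency', 'USD', None, None, None],
    ['ClosePrice', '650', 'foo', 'bar', None, None, None],
    ['foo'] * 7,
    ['foo'] * 7,
    ['foo'] * 7,
]
df = pd.DataFrame(rows, columns=cols)
L = ['ABC0123', 'DEF0123']

# Every "Date" row starts a new block; cumsum turns the boolean
# markers into a running block number (1, 1, ..., 2, 2, ..., 3, ...).
block_id = df['Column 1'].eq('Date').cumsum()

# Block numbers whose identifier cell is one of the wanted values.
wanted = block_id[df['Column 4'].isin(L)]

# Keep every row that belongs to a wanted block.
selected = df[block_id.isin(wanted)]
print(selected)
```

The SOMEOTHER block (rows 5 to 8) is dropped as a whole, regardless of how many rows each block contains.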

A similar solution using the index:

L = ['ABC0123', 'DEF0123']
df.index = df['Column 1'].eq('Date').cumsum()
vals = df.index[df['Column 4'].isin(L)]
df = df.loc[vals].reset_index(drop=True)
print (df)
Column 1 Column 2 Column 3 Column 4 Column 5 Column 6 Column 7
0 Date 20170101 Identifier ABC0123 NaN NaN NaN
1 OpenPrice 500 Currency USD NaN NaN NaN
2 ClosePrice 550 foo bar NaN NaN NaN
3 foo foo foo foo foo foo foo
4 foo foo foo foo foo foo foo
5 foo foo foo foo foo foo foo
6 Date 20170101 Identifier DEF0123 NaN NaN NaN
7 OpenPrice 600 Currency USD NaN NaN NaN
8 ClosePrice 650 foo bar NaN NaN NaN
9 foo foo foo foo foo foo foo
10 foo foo foo foo foo foo foo

Edit:

L1 = ['Date','OpenPrice','ClosePrice']
L = ['ABC0123', 'DEF0123']

#if necessary filter rows by L1
df = df[df['Column 1'].isin(L1)]
df['g'] = df['Column 1'].eq('Date').cumsum()
vals = df.loc[df['Column 4'].isin(L), 'g']
df = df[df['g'].isin(vals)]
print (df)
Column 1 Column 2 Column 3 Column 4 Column 5 Column 6 Column 7 g
0 Date 20170101 Identifier ABC0123 NaN NaN NaN 1
1 OpenPrice 500 Currency USD NaN NaN NaN 1
2 ClosePrice 550 foo bar NaN NaN NaN 1
9 Date 20170101 Identifier DEF0123 NaN NaN NaN 3
10 OpenPrice 600 Currency USD NaN NaN NaN 3
11 ClosePrice 650 foo bar NaN NaN NaN 3

To work with each group, you can use groupby with a flexible apply:

def f(x):
    # x is the sub-DataFrame of one group
    print (x)
    #some another code
    return x

df1 = df.groupby('g').apply(f)
print (df1)
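As a hedged illustration of what such a per-group function might do, the sketch below counts the data rows of each block and reads its identifier. The toy data and the names HEADER_LABELS, count_data_rows, data_rows and identifiers are assumptions made for this example; it applies the function to a single column so the result is one value per group:

```python
import pandas as pd

# Toy data with two blocks (2 and 3 data rows); values are made up.
df = pd.DataFrame({
    'Column 1': ['Date', 'OpenPrice', 'ClosePrice', 'foo', 'foo',
                 'Date', 'OpenPrice', 'ClosePrice', 'foo', 'foo', 'foo'],
    'Column 4': ['ABC0123', 'USD', 'bar', 'foo', 'foo',
                 'DEF0123', 'USD', 'bar', 'foo', 'foo', 'foo'],
})
df['g'] = df['Column 1'].eq('Date').cumsum()

HEADER_LABELS = ['Date', 'OpenPrice', 'ClosePrice']

def count_data_rows(labels):
    # labels is the 'Column 1' Series of one block; count the rows
    # that are not one of the three header rows.
    return int((~labels.isin(HEADER_LABELS)).sum())

# One result per block, indexed by the block number g.
data_rows = df.groupby('g')['Column 1'].apply(count_data_rows)

# The identifier sits in the first row of each block.
identifiers = df.groupby('g')['Column 4'].first()

print(data_rows)
print(identifiers)
```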

Edit:

Final code, using the real data:

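L1 = ["Date", "OpenPrice", "ClosePrice"]
g = 1
# L is the list of identifiers (named L rather than list, as advised above)
for i in L:
    # start a new group at every row whose identifier is in L
    df['g'] = df['Column 4'].isin(L).cumsum()
    vals = df.loc[df['Column 4'].isin(L), 'g']
    df = df[df['g'].isin(vals)]
    # rows of group g that are not one of the header rows in L1
    dfFinal = df.loc[(df['g'] == g) & ~df['Column 1'].isin(L1)]
    g = g + 1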
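The loop above overwrites dfFinal on every pass. One possible alternative, sketched here on made-up data and not taken from the original answer, collects the data rows of each wanted identifier into a dictionary. The names ident, mask and tables are hypothetical, and blocks are delimited by "Date" rows as in the answer's first solution:

```python
import pandas as pd

cols = [f'Column {i}' for i in range(1, 8)]
rows = [
    ['Date', '20170101', 'Identifier', 'ABC0123', None, None, None],
    ['OpenPrice', '500', 'Currency', 'USD', None, None, None],
    ['ClosePrice', '550', 'foo', 'bar', None, None, None],
    ['foo'] * 7,
    ['foo'] * 7,
    ['Date', '20170101', 'Identifier', 'SOMEOTHER', None, None, None],
    ['OpenPrice', '700', 'Currency', 'USD', None, None, None],
    ['ClosePrice', '750', 'foo', 'bar', None, None, None],
    ['foo'] * 7,
    ['Date', '20170101', 'Identifier', 'DEF0123', None, None, None],
    ['OpenPrice', '600', 'Currency', 'USD', None, None, None],
    ['ClosePrice', '650', 'foo', 'bar', None, None, None],
    ['foo'] * 7,
    ['foo'] * 7,
    ['foo'] * 7,
]
df = pd.DataFrame(rows, columns=cols)
L = ['ABC0123', 'DEF0123']
L1 = ['Date', 'OpenPrice', 'ClosePrice']

# Take the identifier from each "Date" row and carry it forward,
# so every row knows which block it belongs to.
ident = df['Column 4'].where(df['Column 1'].eq('Date')).ffill()

# Rows of wanted identifiers that are not one of the header rows.
mask = ident.isin(L) & ~df['Column 1'].isin(L1)

# One DataFrame of data rows per identifier.
tables = {
    name: part.reset_index(drop=True)
    for name, part in df[mask].groupby(ident[mask])
}
print(tables['DEF0123'])
```

Each entry of tables then holds only the rows under "ClosePrice" for that identifier, however many there are.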

Regarding Python - Pandas indexing and selection, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44617966/
