
python - Pandas - Conditional calculation based on shifted values of two other columns

Reposted. Author: 行者123. Updated: 2023-11-28 18:30:39

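I'm sure this is a simple problem, but it has been bugging me for far too long, so I'd really appreciate some guidance.

I want to add a column to a DataFrame based on the values of two other columns.

I want to check whether the stock equals the stock in the previous row and whether the date equals the date in the previous row.

I'm after a running count. I tried the following: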

df['DayCount']=np.where(df['ticker'] ==df['ticker'].shift()) & np.where(df['trade_date']==df['trade_date'].shift() ,  1, 0)

df['DayCount'] = df.where(df['ticker'] ==df['ticker'].shift() &    df['trade_date']==df['trade_date'].shift(),1,0)

Sample input:

Stock, Date, Time, Price 
IBM, 2014-09-01, 12:30:01, 50.5
IBM, 2014-09-01, 12:30:02, 50.7
IBM, 2014-09-01, 12:30:03, 50.9
IBM, 2014-09-02, 09:57:02, 52.7
IBM, 2014-09-02, 09:57:03, 52.9
AAPL, 2014-11-02, 09:57:02, 520.31
AAPL, 2014-11-02, 09:57:03, 520.92

And the desired output:

Stock, Date, Time, Price, DayCount
IBM, 2014-09-01, 12:30:01, 50.5, 1
IBM, 2014-09-01, 12:30:02, 50.7, 2
IBM, 2014-09-01, 12:30:03, 50.9, 3
IBM, 2014-09-02, 09:57:02, 52.7, 1
IBM, 2014-09-02, 09:57:03, 52.9, 2
AAPL, 2014-11-02, 09:57:02, 520.31, 1
AAPL, 2014-11-02, 09:57:03, 520.92, 2

I get errors like this:

TypeError: unsupported operand type(s) for &: 'str' and 'bool'

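After that I want to apply a cumulative count.

First, and most important to me: how do you write the initial statement so that it compares across multiple columns?

Second, how would you add the cumulative count?

Many thanks for your help.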

Expanding on the original post, here is a follow-up question. Suppose the dataset is now slightly different:

Stock, Date, Time, Price, BidOffer
IBM, 2014-09-01, 12:30:01, 50.5, bid
IBM, 2014-09-01, 12:30:02, 50.7, offer
IBM, 2014-09-01, 12:30:03, 50.9, bid
IBM, 2014-09-02, 09:57:02, 52.7, bid
IBM, 2014-09-02, 09:57:03, 52.9, bid
AAPL, 2014-11-02, 09:57:02, 520.31, offer
AAPL, 2014-11-02, 09:57:03, 520.92, offer

We are looking at how many times in a row the stock traded at the bid or at the offer, so the output would be:

Stock, Date, Time, Price, BidOffer, Count
IBM, 2014-09-01, 12:30:01, 50.5, bid, 1
IBM, 2014-09-01, 12:30:02, 50.7, offer, 1
IBM, 2014-09-01, 12:30:03, 50.9, bid, 1
IBM, 2014-09-02, 09:57:02, 52.7, bid, 1
IBM, 2014-09-02, 09:57:03, 52.9, bid, 2
AAPL, 2014-11-02, 09:57:02, 520.31, offer, 1
AAPL, 2014-11-02, 09:57:03, 520.92, offer, 2

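The grouping is really by stock and date; the time is only used to determine the order. Any help with this extension is much appreciated.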

Best answer

UPDATE 3: "We are looking at how many times in a row the stock traded at the bid or at the offer"

In [112]: g = df.groupby(['Stock','Date'])

In [113]: df['Count'] = g['BidOffer'].apply(lambda x: (x == x.shift()).cumsum()) + 1

In [114]: df
Out[114]:
   Stock        Date      Time   Price  BidOffer  Count
0    IBM  2014-09-01  12:30:01   50.50       bid      1
1    IBM  2014-09-01  12:30:02   50.70     offer      1
2    IBM  2014-09-01  12:30:03   50.90       bid      1
3    IBM  2014-09-02  09:57:02   52.70       bid      1
4    IBM  2014-09-02  09:57:03   52.90       bid      2
5   AAPL  2014-11-02  09:57:02  520.31     offer      1
6   AAPL  2014-11-02  09:57:03  520.92     offer      2
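One caveat worth noting: the running sum above counts every repeat within a (Stock, Date) group, so a sequence such as bid, bid, offer, bid, bid would come out as 1, 2, 2, 2, 3 rather than restarting at each change. If a count that restarts with every new run is what is wanted, one possible sketch is below. The helper name consecutive_count and the intermediate names are assumptions made for illustration, not part of the original answer. It also avoids apply, whose returned index can differ between pandas versions.

```python
import pandas as pd


def consecutive_count(df, keys, col):
    """Hypothetical helper: 1-based position of each row inside its run of
    equal `col` values, restarting at every change and at every new group."""
    # Previous value of `col` within the same group (missing on a group's first row)
    prev = df.groupby(keys)[col].shift()
    # A new run starts whenever the value differs from the previous one
    new_run = df[col] != prev
    # Running total of run starts gives every run its own id
    run_id = new_run.cumsum()
    # Number the rows inside each run, starting from 1
    return df.groupby(run_id).cumcount() + 1


# Sample data from the question
df = pd.DataFrame({
    "Stock": ["IBM", "IBM", "IBM", "IBM", "IBM", "AAPL", "AAPL"],
    "Date": ["2014-09-01", "2014-09-01", "2014-09-01",
             "2014-09-02", "2014-09-02", "2014-11-02", "2014-11-02"],
    "BidOffer": ["bid", "offer", "bid", "bid", "bid", "offer", "offer"],
})
df["Count"] = consecutive_count(df, ["Stock", "Date"], "BidOffer")

# A made-up sequence where a run is interrupted and then resumes
extra = pd.DataFrame({
    "Stock": ["IBM"] * 5,
    "Date": ["2014-09-01"] * 5,
    "BidOffer": ["bid", "bid", "offer", "bid", "bid"],
})
extra["Count"] = consecutive_count(extra, ["Stock", "Date"], "BidOffer")
```

On the question's sample this gives the same Count column as shown above; on the made-up sequence it gives 1, 2, 1, 1, 2. Both assume the rows are already ordered by time within each stock and date.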

UPDATE 2:

In [515]: df['DayCount'] = df.groupby(['Stock', 'Date', 'BidOffer']).cumcount() + 1

In [516]: df
Out[516]:
   Stock        Date      Time   Price  BidOffer  DayCount
0    IBM  2014-09-01  12:30:01   50.50       bid         1
1    IBM  2014-09-01  12:30:02   50.70     offer         1
2    IBM  2014-09-01  12:30:03   50.90       bid         2
3    IBM  2014-09-02  09:57:02   52.70       bid         1
4    IBM  2014-09-02  09:57:03   52.90       bid         2
5   AAPL  2014-11-02  09:57:02  520.31     offer         1
6   AAPL  2014-11-02  09:57:03  520.92     offer         2

UPDATE:

In [489]: df['DayCount'] = df.groupby(['Stock', df.Datetime.dt.date]).cumcount() + 1

In [490]: df
Out[490]:
   Stock             Datetime   Price  DayCount
0    IBM  2014-09-01 12:30:01   50.50         1
1    IBM  2014-09-01 12:30:02   50.70         2
2    IBM  2014-09-01 12:30:03   50.90         3
3    IBM  2014-09-02 09:57:02   52.70         1
4    IBM  2014-09-02 09:57:03   52.90         2
5   AAPL  2014-11-02 09:57:02  520.31         1
6   AAPL  2014-11-02 09:57:03  520.92         2
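The session above assumes a single Datetime column already exists. As a self-contained sketch, assuming the data starts out with separate Date and Time string columns as in the question, the column can be built with pd.to_datetime before grouping. The DataFrame construction here is illustrative only.

```python
import pandas as pd

# Sample data from the question, with Date and Time as plain strings
df = pd.DataFrame({
    "Stock": ["IBM", "IBM", "IBM", "IBM", "IBM", "AAPL", "AAPL"],
    "Date": ["2014-09-01", "2014-09-01", "2014-09-01",
             "2014-09-02", "2014-09-02", "2014-11-02", "2014-11-02"],
    "Time": ["12:30:01", "12:30:02", "12:30:03",
             "09:57:02", "09:57:03", "09:57:02", "09:57:03"],
    "Price": [50.5, 50.7, 50.9, 52.7, 52.9, 520.31, 520.92],
})

# Combine the two string columns into one datetime column
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# cumcount() numbers the rows of each (Stock, calendar day) group from 0,
# in order of appearance, so add 1 for a 1-based running count
df["DayCount"] = df.groupby(["Stock", df["Datetime"].dt.date]).cumcount() + 1
```

Because cumcount follows the existing row order within each group, the rows should already be sorted by time for the count to be meaningful.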

Answer to the original question:

df['DayCount']=np.where(
(df['ticker']==df['ticker'].shift())
&
(df['trade_date']==df['trade_date'].shift()),
1,
0
)

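The only thing missing from your second solution was the parentheses: np.where( (...) & (...), 1, 0). In Python, & binds more tightly than ==, so without parentheses the & is applied to the two columns on either side of it before any comparison is made, which is likely where the TypeError above comes from.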
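As a runnable sketch of the multi-column comparison, using the ticker and trade_date column names from the question and the sample values: naming each comparison first makes the parenthesisation hard to get wrong. The variable names and the SameAsPrev column are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ticker": ["IBM", "IBM", "IBM", "IBM", "IBM", "AAPL", "AAPL"],
    "trade_date": ["2014-09-01", "2014-09-01", "2014-09-01",
                   "2014-09-02", "2014-09-02", "2014-11-02", "2014-11-02"],
})

# Each comparison yields a boolean Series; the first row compares against
# a missing value and is therefore False
same_ticker = df["ticker"] == df["ticker"].shift()
same_date = df["trade_date"] == df["trade_date"].shift()

# Combine the two boolean Series element-wise, then map True/False to 1/0
df["SameAsPrev"] = np.where(same_ticker & same_date, 1, 0)
same_as_prev = df["SameAsPrev"].tolist()
```

This produces a flag marking rows that repeat the previous row's ticker and date, not the running count itself; the groupby and cumcount approach shown earlier in the answer gives the count directly.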

Regarding python - Pandas - Conditional calculation based on shifted values of two other columns, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37764741/
