gpt4 book ai didi

python - 我如何过滤 Pandas 数据框并根据其他列和其他条件仅保留行

转载 作者:太空宇宙 更新时间:2023-11-03 11:40:04 24 4
gpt4 key购买 nike

下面是我作为示例的数据框:

+--------------+-------+-------------+--------------+----------+-----------+
| ID | Part | RequestFrom | QTYRequested | Location | QTYOnHand |
+--------------+-------+-------------+--------------+----------+-----------+
| PartACity 1 | PartA | City 1 | 1 | LocA | 2 |
| PartACity 2 | PartA | City 2 | 1 | LocA | 2 |
| PartACity 3 | PartA | City 3 | 1 | LocA | 2 |
| PartACity 4 | PartA | City 4 | 1 | LocA | 2 |
| PartACity 5 | PartA | City 5 | 1 | LocA | 2 |
| PartACity 6 | PartA | City 6 | 1 | LocA | 2 |
| PartACity 7 | PartA | City 7 | 1 | LocA | 2 |
| PartACity 8 | PartA | City 8 | 1 | LocA | 2 |
| PartACity 9 | PartA | City 9 | 1 | LocA | 2 |
| PartACity 10 | PartA | City 10 | 1 | LocA | 2 |
| PartACity 1 | PartA | City 1 | 1 | LocB | 3 |
| PartACity 2 | PartA | City 2 | 1 | LocB | 3 |
| PartACity 3 | PartA | City 3 | 1 | LocB | 3 |
| PartACity 4 | PartA | City 4 | 1 | LocB | 3 |
| PartACity 5 | PartA | City 5 | 1 | LocB | 3 |
| PartACity 6 | PartA | City 6 | 1 | LocB | 3 |
| PartACity 7 | PartA | City 7 | 1 | LocB | 3 |
| PartACity 8 | PartA | City 8 | 1 | LocB | 3 |
| PartACity 9 | PartA | City 9 | 1 | LocB | 3 |
| PartACity 10 | PartA | City 10 | 1 | LocB | 3 |
| PartACity 1 | PartA | City 1 | 1 | LocC | 4 |
| PartACity 2 | PartA | City 2 | 1 | LocC | 4 |
| PartACity 3 | PartA | City 3 | 1 | LocC | 4 |
| PartACity 4 | PartA | City 4 | 1 | LocC | 4 |
| PartACity 5 | PartA | City 5 | 1 | LocC | 4 |
| PartACity 6 | PartA | City 6 | 1 | LocC | 4 |
| PartACity 7 | PartA | City 7 | 1 | LocC | 4 |
| PartACity 8 | PartA | City 8 | 1 | LocC | 4 |
| PartACity 9 | PartA | City 9 | 1 | LocC | 4 |
| PartACity 10 | PartA | City 10 | 1 | LocC | 4 |
+--------------+-------+-------------+--------------+----------+-----------+

我想把上面的数据框变成这样:

+-------------+-------+-------------+--------------+----------+-----------+
| ID | Part | RequestFrom | QTYRequested | Location | QTYOnHand |
+-------------+-------+-------------+--------------+----------+-----------+
| PartACity 1 | PartA | City 1 | 1 | LocA | 2 |
| PartACity 2 | PartA | City 2 | 1 | LocA | 2 |
| PartACity 3 | PartA | City 3 | 1 | LocB | 3 |
| PartACity 4 | PartA | City 4 | 1 | LocB | 3 |
| PartACity 5 | PartA | City 5 | 1 | LocB | 3 |
| PartACity 6 | PartA | City 6 | 1 | LocC | 4 |
| PartACity 7 | PartA | City 7 | 1 | LocC | 4 |
| PartACity 8 | PartA | City 8 | 1 | LocC | 4 |
| PartACity 9 | PartA | City 9 | 1 | LocC | 4 |
+-------------+-------+-------------+--------------+----------+-----------+

如您所见,总的 QTYOnHand 为 9,但我们有 10 个未完成的 A 部分请求。

我想找到一个更好的方式来分配数量。

由于 LocA 只有两个数量的 PartA,所以我们只保留前两行。

LocB 有 3 个 PartA,接下来的 3 个分配给 LocB。

LocC有4个PartA,接下来的4个会分配给LocC。

任何帮助将不胜感激!!!

最佳答案

Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:24:40) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import pandas as pd
>>> df = pd.DataFrame({
'ID' : ['PartACity 1', 'PartACity 2', 'PartACity 3', 'PartACity 4', 'PartACity 5', 'PartACity 6', 'PartACity 7', 'PartACity 8', 'PartACity 9', 'PartACity 10', 'PartACity 1', 'PartACity 2', 'PartACity 3', 'PartACity 4', 'PartACity 5', 'PartACity 6', 'PartACity 7', 'PartACity 8', 'PartACity 9', 'PartACity 10', 'PartACity 1', 'PartACity 2', 'PartACity 3', 'PartACity 4', 'PartACity 5', 'PartACity 6', 'PartACity 7', 'PartACity 8', 'PartACity 9', 'PartACity 10'],
'Part' : ['PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA', 'PartA'],
'RequestFrom': ['City 1', 'City 2', 'City 3', 'City 4', 'City 5', 'City 6', 'City 7', 'City 8', 'City 9', 'City 10', 'City 1', 'City 2', 'City 3', 'City 4', 'City 5', 'City 6', 'City 7', 'City 8', 'City 9', 'City 10', 'City 1', 'City 2', 'City 3', 'City 4', 'City 5', 'City 6', 'City 7', 'City 8', 'City 9', 'City 10'],
'QTYRequested': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
'Location': ['LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocA', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocB', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC', 'LocC'],
'QTYOnHand': [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
})
>>> print(df)
ID Location ... QTYRequested RequestFrom
0 PartACity 1 LocA ... 1 City 1
1 PartACity 2 LocA ... 1 City 2
2 PartACity 3 LocA ... 1 City 3
3 PartACity 4 LocA ... 1 City 4
4 PartACity 5 LocA ... 1 City 5
5 PartACity 6 LocA ... 1 City 6
6 PartACity 7 LocA ... 1 City 7
7 PartACity 8 LocA ... 1 City 8
8 PartACity 9 LocA ... 1 City 9
9 PartACity 10 LocA ... 1 City 10
10 PartACity 1 LocB ... 1 City 1
11 PartACity 2 LocB ... 1 City 2
12 PartACity 3 LocB ... 1 City 3
13 PartACity 4 LocB ... 1 City 4
14 PartACity 5 LocB ... 1 City 5
15 PartACity 6 LocB ... 1 City 6
16 PartACity 7 LocB ... 1 City 7
17 PartACity 8 LocB ... 1 City 8
18 PartACity 9 LocB ... 1 City 9
19 PartACity 10 LocB ... 1 City 10
20 PartACity 1 LocC ... 1 City 1
21 PartACity 2 LocC ... 1 City 2
22 PartACity 3 LocC ... 1 City 3
23 PartACity 4 LocC ... 1 City 4
24 PartACity 5 LocC ... 1 City 5
25 PartACity 6 LocC ... 1 City 6
26 PartACity 7 LocC ... 1 City 7
27 PartACity 8 LocC ... 1 City 8
28 PartACity 9 LocC ... 1 City 9
29 PartACity 10 LocC ... 1 City 10

[30 rows x 6 columns]

Duplicate df as temp_df to aggregate the quantity on hand and keep track of the quantity left for each location by creating a new column QTYLeft:

>>> temp_df = df
>>> temp_df = temp_df.groupby('Location').agg({'QTYOnHand':'first'})
>>> temp_df = temp_df.reset_index()
>>> temp_df['QTYLeft'] =temp_df['QTYOnHand']
>>> print(temp_df)
Location QTYOnHand QTYLeft
0 LocA 2 2
1 LocB 3 3
2 LocC 4 4

Group df by ID, Part, RequestFrom:

>>> df = df.groupby(['ID', 'Part', 'RequestFrom']).first()
>>> df = df.reset_index()
>>> print(df)
ID Part ... QTYOnHand QTYRequested
0 PartACity 1 PartA ... 2 1
1 PartACity 10 PartA ... 2 1
2 PartACity 2 PartA ... 2 1
3 PartACity 3 PartA ... 2 1
4 PartACity 4 PartA ... 2 1
5 PartACity 5 PartA ... 2 1
6 PartACity 6 PartA ... 2 1
7 PartACity 7 PartA ... 2 1
8 PartACity 8 PartA ... 2 1
9 PartACity 9 PartA ... 2 1

[10 rows x 6 columns]

Values in ID column are strings and thus cannot be used as an index to sort according to ascending numbers, thus we create a new temporary index called temp_index first, sort the df in ascending order, then remove said index:

>>> df = df.assign(temp_index=[int(float(i.split(' ')[-1])) for i in df['ID']])
>>> df = df.sort_values(by='temp_index')
>>> print(df)
ID Part ... QTYRequested temp_index
0 PartACity 1 PartA ... 1 1
2 PartACity 2 PartA ... 1 2
3 PartACity 3 PartA ... 1 3
4 PartACity 4 PartA ... 1 4
5 PartACity 5 PartA ... 1 5
6 PartACity 6 PartA ... 1 6
7 PartACity 7 PartA ... 1 7
8 PartACity 8 PartA ... 1 8
9 PartACity 9 PartA ... 1 9
1 PartACity 10 PartA ... 1 10

[10 rows x 7 columns]
>>> del df['temp_index']

Create a new user-defined function (UDF) and apply it to allocate the available quantity per location, with the smaller indexes being allocated first as per your question:

>>> def allocate_qty(row):
global temp_df
try:
temp_df = temp_df[(temp_df['QTYLeft'] != 0)]
avail_qty = temp_df['QTYOnHand'].values[0]
avail_location = temp_df['Location'].values[0]
temp_df['QTYLeft'].values[0] = temp_df['QTYLeft'].values[0] - row['QTYRequested']
return avail_location, avail_qty
except:
return 'Not Allocated', 0


>>> df['Location'], df['QTYOnHand'] = zip(*df.apply(allocate_qty, axis=1))
>>> print(df)
ID Part ... QTYOnHand QTYRequested
0 PartACity 1 PartA ... 2 1
2 PartACity 2 PartA ... 2 1
3 PartACity 3 PartA ... 3 1
4 PartACity 4 PartA ... 3 1
5 PartACity 5 PartA ... 3 1
6 PartACity 6 PartA ... 4 1
7 PartACity 7 PartA ... 4 1
8 PartACity 8 PartA ... 4 1
9 PartACity 9 PartA ... 4 1
1 PartACity 10 PartA ... 0 1

[10 rows x 6 columns]

Filter out rows which did not manage to be allocated the resources:

>>> df = df[(df['Location'] != 'Not Allocated')]
>>> print(df)
ID Part ... QTYOnHand QTYRequested
0 PartACity 1 PartA ... 2 1
2 PartACity 2 PartA ... 2 1
3 PartACity 3 PartA ... 3 1
4 PartACity 4 PartA ... 3 1
5 PartACity 5 PartA ... 3 1
6 PartACity 6 PartA ... 4 1
7 PartACity 7 PartA ... 4 1
8 PartACity 8 PartA ... 4 1
9 PartACity 9 PartA ... 4 1

[9 rows x 6 columns]

希望这对您有所帮助!

关于python - 我如何过滤 Pandas 数据框并根据其他列和其他条件仅保留行,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/51341162/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com