gpt4 book ai didi

python - 如何使用python重新排列数据框的行?

转载 作者:行者123 更新时间:2023-12-02 00:33:21 25 4
gpt4 key购买 nike

我有一个数据框,其中数据位于另一列中,我想从该列中获取这些日期并创建一个日期列并存储它们。这是我的示例数据。

df=[['Monday, 13 January 2020','',''],['Task 1',13588,'Jack'],['','','Address 1'],['','','City 1'],['Task 2',13589,'Ammie'],['','','Address 2'],['','','City'],['Task 3',13589,'Amanda'],['','','Address 3'],['','','City 3'],['Tuesday, 14 January 2020','',''],['Task 4',13587,'Chelsea'],['','','Address 4'],['','','City 4'],['Task 5','13586','Ibrahim'],['','','Address 5'],['','','City 5'],['Task 6',13585,'Kate'],['','','Address 6'],['','','City 6']]

df=pd.DataFrame(df)
df.columns = ['Task','ID','Supervisor']
df=df.replace(np.nan,'')
df

Task ID Supervisor
0 Monday, 13 January 2020
1 Task 1 13588 Jack
2 Address 1
3 City 1
4 Task 2 13589 Ammie
5 Address 2
6 City
7 Task 3 13589 Amanda
8 Address 3
9 City 3
10 Tuesday, 14 January 2020
11 Task 4 13587 Chelsea
12 Address 4
13 City 4
14 Task 5 13586 Ibrahim
15 Address 5
16 City 5
17 Task 6 13585 Kate
18 Address 6
19 City 6

我想得到以下输出。

    Date                    Task    ID      Supervisor
0 Monday, 13 January 2020 Task 1 13588 Jack Address 1 City 1
1 Monday, 13 January 2020 Task 2 13589 Ammie Address 2 City
2 Monday, 13 January 2020 Task 3 13589 Amanda Address 3 City 3
3 Tuesday, 14 January 2020 Task 4 13587 Chelsea Address 4 City 4
4 Tuesday, 14 January 2020 Task 5 13586 Ibrahim Address 5 City 5
5 Tuesday, 14 January 2020 Task 6 13585 Kate Address 6 City 6

这是我的尝试。

def rowMerger(a,b):
try:
rule1 = lambda x: x not in ['']
u = a.loc[a.iloc[:,0].apply(rule1) & a.iloc[:,1].apply(rule1) & a.iloc[:,2].apply(rule1)].index
print(u)
findMergerindexs = list(u)
findMergerindexs.sort()
a = pd.DataFrame(a)
tabcolumns = pd.DataFrame(a.columns)
totalcolumns = len(tabcolumns)
b = pd.DataFrame(columns = list(tabcolumns))
if (len(findMergerindexs) > 0):
for m in range(len(findMergerindexs))
if not (m == (len(findMergerindexs)-1)):
startLoop = findMergerindexs[m]
endLoop = findMergerindexs[m+1]
else:
startLoop = findMergerindexs[m]
endLoop = len(a)
listValues = []
for i in range(totalcolumns):
value = ' '
for n in range(startLoop,endLoop):
value = value + ' ' + str(a.iloc[n,i])
listValues.insert(i,(value.strip()))
b = b.append(pd.Series(listValues),ignore_index = True)
else:
print("File is not having a row for merging instances - Please check the file manually for instance - ")
return b
except:
print("Error - While merging the rows")
return b

这段代码给出了下面的输出。

rowMerger(df,0)
0 1 2
0 Task 1 13588 Jack Address 1 City 1
1 Task 2 13589 Ammie Address 2 City
2 Task 3 Tuesday, 14 January 2020 13589 Amanda Address 3 City 3
3 Task 4 13587 Chelsea Address 4 City 4
4 Task 5 13586 Ibrahim Address 5 City 5
5 Task 6 13585 Kate Address 6 City 6

但问题是这段代码只会合并行。不确定如何复制所需输出中所示的各行中的日期并将其放在不同的列中。谁能帮我实现这个目标吗?

最佳答案

您可以尝试以下操作:

task_mask = df.Task.str.match("Task\s+\d")
df.assign(Task = df.Task[task_mask],
Date = pd.Series(np.where(~task_mask, df["Task"], np.NaN)).shift()) \
.replace("", np.NaN) \
.dropna(how='all') \
.ffill() \
.groupby(["Task", "ID", "Date"]).agg({"Supervisor": lambda x: " ".join(x)}) \
.reset_index()

输出

#      Task     ID                      Date                Supervisor
# 0 Task 1 13588 Monday, 13 January 2020 Jack Address 1 City 1
# 1 Task 2 13589 Monday, 13 January 2020 Ammie Address 2 City
# 2 Task 3 13589 Monday, 13 January 2020 Amanda Address 3 City 3
# 3 Task 4 13587 Tuesday, 14 January 2020 Chelsea Address 4 City 4
# 4 Task 5 13586 Tuesday, 14 January 2020 Ibrahim Address 5 City 5
# 5 Task 6 13585 Tuesday, 14 January 2020 Kate Address 6 City 6

说明:

  1. 过滤任务列:日期任务ID

    • 一种解决方案是使用正则表达式来匹配任务 IDpandas.Series.str.match做工作。使用的正则表达式非常简单:"Task\s+\d" 表示 Task + 任意空格 + 数字。
task_mask = df.Task.str.match("Task\s+\d")
  • 从此掩码中,我们可以提取日期任务。可以使用 df.Task[task_mask]

  • task_mask 轻松访问这些任务
  • 日期提取起来要困难一些。

    • 我们使用np.where设置 Task 值或 NaN
    • 然后,我们将此数组转换为 pd.Series
    • 最后,我们使用 shift 将所有值移动 1 。通过移动行,我们可以轻松删除第 5 步中的 NaN 行。
  • pd.Series(np.where(~task_mask, df["Task"], np.NaN)).shift()
  • 使用 replace 将所有空字符串替换为 NaN

  • 使用 dropna 删除空行(例如,只有 Date 的旧行)与 how="all"

  • 使用 ffill 用之前的非 NaN 值填充所有 NaN

  • “任务”、“ID”a 和“日期”进行分组并使用 agg 聚合行。聚合函数基于str.join : lambda x: "".join(x)

  • 使用reset_indexgroupby重置索引.

  • 希望这是清楚的!


    代码+插图

    # Create dataframe
    data = [['Monday, 13 January 2020', '', ''], ['Task 1', 13588, 'Jack'], ['', '', 'Address 1'], ['', '', 'City 1'], ['Task 2', 13589, 'Ammie'], ['', '', 'Address 2'], ['', '', 'City'], ['Task 3', 13589, 'Amanda'], ['', '', 'Address 3'], ['', '', 'City 3'], [
    'Tuesday, 14 January 2020', '', ''], ['Task 4', 13587, 'Chelsea'], ['', '', 'Address 4'], ['', '', 'City 4'], ['Task 5', '13586', 'Ibrahim'], ['', '', 'Address 5'], ['', '', 'City 5'], ['Task 6', 13585, 'Kate'], ['', '', 'Address 6'], ['', '', 'City 6']]
    df = pd.DataFrame(data)
    df.columns = ['Task', 'ID', 'Supervisor']
    print(df)

    # Step 1
    task_mask = df.Task.str.match("Task\s+\d")
    print(task_mask)
    # 0 False
    # 1 True
    # 2 False
    # 3 False
    # 4 True
    # 5 False
    # 6 False
    # 7 True
    # 8 False
    # 9 False
    # 10 False
    # 11 True
    # 12 False
    # 13 False
    # 14 True
    # 15 False
    # 16 False
    # 17 True
    # 18 False
    # 19 False
    # Name: Task, dtype: bool

    # Step 2
    print(df.Task[task_mask])
    # 1 Task 1
    # 4 Task 2
    # 7 Task 3
    # 11 Task 4
    # 14 Task 5
    # 17 Task 6
    # Name: Task, dtype: object

    # Step 3
    print(pd.Series(np.where(~task_mask, df["Task"], np.NaN)).shift())
    # 0 NaN
    # 1 Monday, 13 January 2020
    # 2 NaN
    # 3
    # 4
    # 5 NaN
    # 6
    # 7
    # 8 NaN
    # 9
    # 10
    # 11 Tuesday, 14 January 2020
    # 12 NaN
    # 13
    # 14
    # 15 NaN
    # 16
    # 17
    # 18 NaN
    # 19
    # dtype: object

    # Step 4
    print(df.assign(Task=df.Task[task_mask],
    Date=pd.Series(np.where(~task_mask, df["Task"], np.NaN)).shift())
    .replace("", np.NaN))
    # Task ID Supervisor Date
    # 0 NaN NaN NaN NaN
    # 1 Task 1 13588 Jack Monday, 13 January 2020
    # 2 NaN NaN Address 1 NaN
    # 3 NaN NaN City 1 NaN
    # 4 Task 2 13589 Ammie NaN
    # 5 NaN NaN Address 2 NaN
    # 6 NaN NaN City NaN
    # 7 Task 3 13589 Amanda NaN
    # 8 NaN NaN Address 3 NaN
    # 9 NaN NaN City 3 NaN
    # 10 NaN NaN NaN NaN
    # 11 Task 4 13587 Chelsea Tuesday, 14 January 2020
    # 12 NaN NaN Address 4 NaN
    # 13 NaN NaN City 4 NaN
    # 14 Task 5 13586 Ibrahim NaN
    # 15 NaN NaN Address 5 NaN
    # 16 NaN NaN City 5 NaN
    # 17 Task 6 13585 Kate NaN
    # 18 NaN NaN Address 6 NaN
    # 19 NaN NaN City 6 NaN

    # Step 5:
    print(df.assign(Task = df.Task[task_mask],
    Date = pd.Series(np.where(~task_mask, df["Task"], np.NaN)).shift()) \
    .replace("", np.NaN) \
    .dropna(how='all'))
    # Task ID Supervisor Date
    # 1 Task 1 13588 Jack Monday, 13 January 2020
    # 2 NaN NaN Address 1 NaN
    # 3 NaN NaN City 1 NaN
    # 4 Task 2 13589 Ammie NaN
    # 5 NaN NaN Address 2 NaN
    # 6 NaN NaN City NaN
    # 7 Task 3 13589 Amanda NaN
    # 8 NaN NaN Address 3 NaN
    # 9 NaN NaN City 3 NaN
    # 11 Task 4 13587 Chelsea Tuesday, 14 January 2020
    # 12 NaN NaN Address 4 NaN
    # 13 NaN NaN City 4 NaN
    # 14 Task 5 13586 Ibrahim NaN
    # 15 NaN NaN Address 5 NaN
    # 16 NaN NaN City 5 NaN
    # 17 Task 6 13585 Kate NaN
    # 18 NaN NaN Address 6 NaN
    # 19 NaN NaN City 6 NaN

    # Step 6:
    print(df.assign(Task = df.Task[task_mask],
    Date = pd.Series(np.where(~task_mask, df["Task"], np.NaN)).shift()) \
    .replace("", np.NaN) \
    .dropna(how='all') \
    .ffill())
    # Task ID Supervisor Date
    # 1 Task 1 13588 Jack Monday, 13 January 2020
    # 2 Task 1 13588 Address 1 Monday, 13 January 2020
    # 3 Task 1 13588 City 1 Monday, 13 January 2020
    # 4 Task 2 13589 Ammie Monday, 13 January 2020
    # 5 Task 2 13589 Address 2 Monday, 13 January 2020
    # 6 Task 2 13589 City Monday, 13 January 2020
    # 7 Task 3 13589 Amanda Monday, 13 January 2020
    # 8 Task 3 13589 Address 3 Monday, 13 January 2020
    # 9 Task 3 13589 City 3 Monday, 13 January 2020
    # 11 Task 4 13587 Chelsea Tuesday, 14 January 2020
    # 12 Task 4 13587 Address 4 Tuesday, 14 January 2020
    # 13 Task 4 13587 City 4 Tuesday, 14 January 2020
    # 14 Task 5 13586 Ibrahim Tuesday, 14 January 2020
    # 15 Task 5 13586 Address 5 Tuesday, 14 January 2020
    # 16 Task 5 13586 City 5 Tuesday, 14 January 2020
    # 17 Task 6 13585 Kate Tuesday, 14 January 2020
    # 18 Task 6 13585 Address 6 Tuesday, 14 January 2020
    # 19 Task 6 13585 City 6 Tuesday, 14 January 2020

    # Step 7
    print(df.assign(Task = df.Task[task_mask],
    Date = pd.Series(np.where(~task_mask, df["Task"], np.NaN)).shift()) \
    .replace("", np.NaN) \
    .dropna(how='all') \
    .ffill() \
    .groupby(["Task", "ID", "Date"]).agg({"Supervisor": lambda x: " ".join(x)}))
    # Supervisor
    # Task ID Date
    # Task 1 13588 Monday, 13 January 2020 Jack Address 1 City 1
    # Task 2 13589 Monday, 13 January 2020 Ammie Address 2 City
    # Task 3 13589 Monday, 13 January 2020 Amanda Address 3 City 3
    # Task 4 13587 Tuesday, 14 January 2020 Chelsea Address 4 City 4
    # Task 5 13586 Tuesday, 14 January 2020 Ibrahim Address 5 City 5
    # Task 6 13585 Tuesday, 14 January 2020 Kate Address 6 City 6

    # Step 8
    df = df.assign(Task = df.Task[task_mask],
    Date = pd.Series(np.where(~task_mask, df["Task"], np.NaN)).shift()) \
    .replace("", np.NaN) \
    .dropna(how='all') \
    .ffill() \
    .groupby(["Task", "ID", "Date"]).agg({"Supervisor": lambda x: " ".join(x)}) \
    .reset_index()
    print(df)

    # Task ID Date Supervisor
    # 0 Task 1 13588 Monday, 13 January 2020 Jack Address 1 City 1
    # 1 Task 2 13589 Monday, 13 January 2020 Ammie Address 2 City
    # 2 Task 3 13589 Monday, 13 January 2020 Amanda Address 3 City 3
    # 3 Task 4 13587 Tuesday, 14 January 2020 Chelsea Address 4 City 4
    # 4 Task 5 13586 Tuesday, 14 January 2020 Ibrahim Address 5 City 5
    # 5 Task 6 13585 Tuesday, 14 January 2020 Kate Address 6 City 6

    关于python - 如何使用python重新排列数据框的行?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/61556939/

    25 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com