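I have a dataframe in which the dates sit inside another column (Task). I want to pull those dates out of that column and store them in a new date column. Here is my sample data.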
import numpy as np
import pandas as pd

df=[['Monday, 13 January 2020','',''],['Task 1',13588,'Jack'],['','','Address 1'],['','','City 1'],['Task 2',13589,'Ammie'],['','','Address 2'],['','','City'],['Task 3',13589,'Amanda'],['','','Address 3'],['','','City 3'],['Tuesday, 14 January 2020','',''],['Task 4',13587,'Chelsea'],['','','Address 4'],['','','City 4'],['Task 5','13586','Ibrahim'],['','','Address 5'],['','','City 5'],['Task 6',13585,'Kate'],['','','Address 6'],['','','City 6']]
df=pd.DataFrame(df)
df.columns = ['Task','ID','Supervisor']
df=df.replace(np.nan,'')
df
    Task                      ID     Supervisor
0   Monday, 13 January 2020
1   Task 1                    13588  Jack
2                                    Address 1
3                                    City 1
4   Task 2                    13589  Ammie
5                                    Address 2
6                                    City
7   Task 3                    13589  Amanda
8                                    Address 3
9                                    City 3
10  Tuesday, 14 January 2020
11  Task 4                    13587  Chelsea
12                                   Address 4
13                                   City 4
14  Task 5                    13586  Ibrahim
15                                   Address 5
16                                   City 5
17  Task 6                    13585  Kate
18                                   Address 6
19                                   City 6
I would like to get the following output.
   Date                      Task    ID     Supervisor
0  Monday, 13 January 2020   Task 1  13588  Jack Address 1 City 1
1  Monday, 13 January 2020   Task 2  13589  Ammie Address 2 City
2  Monday, 13 January 2020   Task 3  13589  Amanda Address 3 City 3
3  Tuesday, 14 January 2020  Task 4  13587  Chelsea Address 4 City 4
4  Tuesday, 14 January 2020  Task 5  13586  Ibrahim Address 5 City 5
5  Tuesday, 14 January 2020  Task 6  13585  Kate Address 6 City 6
Here is my attempt.
def rowMerger(a,b):
    try:
        # A row starts a new group only when all three columns are non-empty
        rule1 = lambda x: x not in ['']
        u = a.loc[a.iloc[:,0].apply(rule1) & a.iloc[:,1].apply(rule1) & a.iloc[:,2].apply(rule1)].index
        print(u)
        findMergerindexs = list(u)
        findMergerindexs.sort()
        a = pd.DataFrame(a)
        tabcolumns = pd.DataFrame(a.columns)
        totalcolumns = len(tabcolumns)
        b = pd.DataFrame(columns = list(tabcolumns))
        if (len(findMergerindexs) > 0):
            for m in range(len(findMergerindexs)):
                if not (m == (len(findMergerindexs)-1)):
                    startLoop = findMergerindexs[m]
                    endLoop = findMergerindexs[m+1]
                else:
                    startLoop = findMergerindexs[m]
                    endLoop = len(a)
                listValues = []
                for i in range(totalcolumns):
                    value = ' '
                    # Join the values of column i for every row in the group
                    for n in range(startLoop,endLoop):
                        value = value + ' ' + str(a.iloc[n,i])
                    listValues.insert(i,(value.strip()))
                # The original used b.append(...), which was removed in pandas 2.0
                b = pd.concat([b, pd.DataFrame([listValues])], ignore_index = True)
        else:
            print("File is not having a row for merging instances - Please check the file manually for instance - ")
        return b
    except:
        print("Error - While merging the rows")
        return b
This code gives the output below.
rowMerger(df,0)
   0                                1      2
0  Task 1                           13588  Jack Address 1 City 1
1  Task 2                           13589  Ammie Address 2 City
2  Task 3 Tuesday, 14 January 2020  13589  Amanda Address 3 City 3
3  Task 4                           13587  Chelsea Address 4 City 4
4  Task 5                           13586  Ibrahim Address 5 City 5
5  Task 6                           13585  Kate Address 6 City 6
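The problem is that this code only merges the rows. I am not sure how to copy the date onto each row, as shown in the desired output, and put it in a separate column. Can anyone help me achieve this?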
Best answer
You can try the following:
# Raw string for the regex; np.nan is used because the np.NaN alias was removed in NumPy 2.0
task_mask = df.Task.str.match(r"Task\s+\d")
df.assign(Task = df.Task[task_mask],
          Date = pd.Series(np.where(~task_mask, df["Task"], np.nan)).shift()) \
  .replace("", np.nan) \
  .dropna(how='all') \
  .ffill() \
  .groupby(["Task", "ID", "Date"]).agg({"Supervisor": lambda x: " ".join(x)}) \
  .reset_index()
Output
#    Task    ID     Date                      Supervisor
# 0  Task 1  13588  Monday, 13 January 2020   Jack Address 1 City 1
# 1  Task 2  13589  Monday, 13 January 2020   Ammie Address 2 City
# 2  Task 3  13589  Monday, 13 January 2020   Amanda Address 3 City 3
# 3  Task 4  13587  Tuesday, 14 January 2020  Chelsea Address 4 City 4
# 4  Task 5  13586  Tuesday, 14 January 2020  Ibrahim Address 5 City 5
# 5  Task 6  13585  Tuesday, 14 January 2020  Kate Address 6 City 6
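The requested layout lists Date first, and the dates above are still plain strings. As a minimal sketch, assuming a hand-built two-row frame named result that stands in for the output above (the name result and the extra ParsedDate column are illustrative, not part of the answer), the columns can be reordered and the strings parsed with pd.to_datetime. The %A and %B directives match English weekday and month names, so this assumes an English locale.

```python
import pandas as pd

# Hypothetical stand-in for two rows of the output shown above.
result = pd.DataFrame({
    "Task": ["Task 1", "Task 4"],
    "ID": [13588, 13587],
    "Date": ["Monday, 13 January 2020", "Tuesday, 14 January 2020"],
    "Supervisor": ["Jack Address 1 City 1", "Chelsea Address 4 City 4"],
})

# Put Date first, as in the layout requested in the question.
result = result[["Date", "Task", "ID", "Supervisor"]]

# Parse "Monday, 13 January 2020" style strings into timestamps.
# %A = weekday name, %d = day of month, %B = month name, %Y = four-digit year.
result["ParsedDate"] = pd.to_datetime(result["Date"], format="%A, %d %B %Y")

print(result)
```

Keeping the original string column next to the parsed one makes it easy to spot rows where parsing did not behave as expected.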
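Explanation:
1. Filter the Task column, which mixes dates and task labels. pandas.Series.str.match does the job. The regular expression is simple: r"Task\s+\d" means Task + one or more whitespace characters + a digit.
task_mask = df.Task.str.match(r"Task\s+\d")
2. From this mask we can extract the dates and the tasks. The tasks are easy to get with task_mask:
df.Task[task_mask]
The dates are a little harder to extract. np.where keeps the Task value on every row that is not a task row (the date rows and the blank rows) and puts NaN on the task rows. shift then moves everything down one row, so each date lands on the first task row that follows it:
pd.Series(np.where(~task_mask, df["Task"], np.nan)).shift()
3. Replace all empty strings with NaN using replace.
4. Drop the empty rows (for example, the old rows that held only a date) using dropna with how="all".
5. Fill every NaN with the previous non-NaN value using ffill.
6. Group by "Task", "ID" and "Date" and aggregate the rows with agg. The aggregation function is based on str.join: lambda x: " ".join(x)
7. Reset the index left by the groupby using reset_index.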
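The way the mask, np.where, shift and ffill interact can be seen on a toy column. This is only an illustrative sketch: the five-element Series and the names task, non_task, shifted and dates are assumptions made up for the example, not part of the answer.

```python
import numpy as np
import pandas as pd

# Toy version of the Task column: one date header, two tasks, blank filler rows.
task = pd.Series(["Monday, 13 January 2020", "Task 1", "", "Task 2", ""])

# True only on rows that hold a task label.
task_mask = task.str.match(r"Task\s+\d")

# Keep the non-task values (the date and the blanks); task rows become NaN.
non_task = pd.Series(np.where(~task_mask, task, np.nan))

# shift() moves every value down one row, so the date lands on the first task row.
shifted = non_task.shift()

# Blank strings become NaN, then ffill copies the date onto the rows below it.
dates = shifted.replace("", np.nan).ffill()

print(dates.tolist())
```

The first element stays NaN because nothing precedes the date header; in the full pipeline that row is removed by dropna(how='all').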
Hope this is clear!
Code + illustration
import numpy as np
import pandas as pd

# Create dataframe
data = [['Monday, 13 January 2020', '', ''], ['Task 1', 13588, 'Jack'], ['', '', 'Address 1'], ['', '', 'City 1'], ['Task 2', 13589, 'Ammie'], ['', '', 'Address 2'], ['', '', 'City'], ['Task 3', 13589, 'Amanda'], ['', '', 'Address 3'], ['', '', 'City 3'], [
'Tuesday, 14 January 2020', '', ''], ['Task 4', 13587, 'Chelsea'], ['', '', 'Address 4'], ['', '', 'City 4'], ['Task 5', '13586', 'Ibrahim'], ['', '', 'Address 5'], ['', '', 'City 5'], ['Task 6', 13585, 'Kate'], ['', '', 'Address 6'], ['', '', 'City 6']]
df = pd.DataFrame(data)
df.columns = ['Task', 'ID', 'Supervisor']
print(df)
# Step 1
task_mask = df.Task.str.match(r"Task\s+\d")
print(task_mask)
# 0 False
# 1 True
# 2 False
# 3 False
# 4 True
# 5 False
# 6 False
# 7 True
# 8 False
# 9 False
# 10 False
# 11 True
# 12 False
# 13 False
# 14 True
# 15 False
# 16 False
# 17 True
# 18 False
# 19 False
# Name: Task, dtype: bool
# Step 2
print(df.Task[task_mask])
# 1 Task 1
# 4 Task 2
# 7 Task 3
# 11 Task 4
# 14 Task 5
# 17 Task 6
# Name: Task, dtype: object
# Step 3
print(pd.Series(np.where(~task_mask, df["Task"], np.nan)).shift())
# 0 NaN
# 1 Monday, 13 January 2020
# 2 NaN
# 3
# 4
# 5 NaN
# 6
# 7
# 8 NaN
# 9
# 10
# 11 Tuesday, 14 January 2020
# 12 NaN
# 13
# 14
# 15 NaN
# 16
# 17
# 18 NaN
# 19
# dtype: object
# Step 4
print(df.assign(Task=df.Task[task_mask],
                Date=pd.Series(np.where(~task_mask, df["Task"], np.nan)).shift())
      .replace("", np.nan))
#     Task    ID     Supervisor  Date
# 0   NaN     NaN    NaN         NaN
# 1   Task 1  13588  Jack        Monday, 13 January 2020
# 2   NaN     NaN    Address 1   NaN
# 3   NaN     NaN    City 1      NaN
# 4   Task 2  13589  Ammie       NaN
# 5   NaN     NaN    Address 2   NaN
# 6   NaN     NaN    City        NaN
# 7   Task 3  13589  Amanda      NaN
# 8   NaN     NaN    Address 3   NaN
# 9   NaN     NaN    City 3      NaN
# 10  NaN     NaN    NaN         NaN
# 11  Task 4  13587  Chelsea     Tuesday, 14 January 2020
# 12  NaN     NaN    Address 4   NaN
# 13  NaN     NaN    City 4      NaN
# 14  Task 5  13586  Ibrahim     NaN
# 15  NaN     NaN    Address 5   NaN
# 16  NaN     NaN    City 5      NaN
# 17  Task 6  13585  Kate        NaN
# 18  NaN     NaN    Address 6   NaN
# 19  NaN     NaN    City 6      NaN
# Step 5:
print(df.assign(Task = df.Task[task_mask],
                Date = pd.Series(np.where(~task_mask, df["Task"], np.nan)).shift()) \
      .replace("", np.nan) \
      .dropna(how='all'))
#     Task    ID     Supervisor  Date
# 1   Task 1  13588  Jack        Monday, 13 January 2020
# 2   NaN     NaN    Address 1   NaN
# 3   NaN     NaN    City 1      NaN
# 4   Task 2  13589  Ammie       NaN
# 5   NaN     NaN    Address 2   NaN
# 6   NaN     NaN    City        NaN
# 7   Task 3  13589  Amanda      NaN
# 8   NaN     NaN    Address 3   NaN
# 9   NaN     NaN    City 3      NaN
# 11  Task 4  13587  Chelsea     Tuesday, 14 January 2020
# 12  NaN     NaN    Address 4   NaN
# 13  NaN     NaN    City 4      NaN
# 14  Task 5  13586  Ibrahim     NaN
# 15  NaN     NaN    Address 5   NaN
# 16  NaN     NaN    City 5      NaN
# 17  Task 6  13585  Kate        NaN
# 18  NaN     NaN    Address 6   NaN
# 19  NaN     NaN    City 6      NaN
# Step 6:
print(df.assign(Task = df.Task[task_mask],
                Date = pd.Series(np.where(~task_mask, df["Task"], np.nan)).shift()) \
      .replace("", np.nan) \
      .dropna(how='all') \
      .ffill())
#     Task    ID     Supervisor  Date
# 1   Task 1  13588  Jack        Monday, 13 January 2020
# 2   Task 1  13588  Address 1   Monday, 13 January 2020
# 3   Task 1  13588  City 1      Monday, 13 January 2020
# 4   Task 2  13589  Ammie       Monday, 13 January 2020
# 5   Task 2  13589  Address 2   Monday, 13 January 2020
# 6   Task 2  13589  City        Monday, 13 January 2020
# 7   Task 3  13589  Amanda      Monday, 13 January 2020
# 8   Task 3  13589  Address 3   Monday, 13 January 2020
# 9   Task 3  13589  City 3      Monday, 13 January 2020
# 11  Task 4  13587  Chelsea     Tuesday, 14 January 2020
# 12  Task 4  13587  Address 4   Tuesday, 14 January 2020
# 13  Task 4  13587  City 4      Tuesday, 14 January 2020
# 14  Task 5  13586  Ibrahim     Tuesday, 14 January 2020
# 15  Task 5  13586  Address 5   Tuesday, 14 January 2020
# 16  Task 5  13586  City 5      Tuesday, 14 January 2020
# 17  Task 6  13585  Kate        Tuesday, 14 January 2020
# 18  Task 6  13585  Address 6   Tuesday, 14 January 2020
# 19  Task 6  13585  City 6      Tuesday, 14 January 2020
# Step 7
print(df.assign(Task = df.Task[task_mask],
                Date = pd.Series(np.where(~task_mask, df["Task"], np.nan)).shift()) \
      .replace("", np.nan) \
      .dropna(how='all') \
      .ffill() \
      .groupby(["Task", "ID", "Date"]).agg({"Supervisor": lambda x: " ".join(x)}))
#                                          Supervisor
# Task    ID     Date
# Task 1  13588  Monday, 13 January 2020   Jack Address 1 City 1
# Task 2  13589  Monday, 13 January 2020   Ammie Address 2 City
# Task 3  13589  Monday, 13 January 2020   Amanda Address 3 City 3
# Task 4  13587  Tuesday, 14 January 2020  Chelsea Address 4 City 4
# Task 5  13586  Tuesday, 14 January 2020  Ibrahim Address 5 City 5
# Task 6  13585  Tuesday, 14 January 2020  Kate Address 6 City 6
# Step 8
df = df.assign(Task = df.Task[task_mask],
               Date = pd.Series(np.where(~task_mask, df["Task"], np.nan)).shift()) \
       .replace("", np.nan) \
       .dropna(how='all') \
       .ffill() \
       .groupby(["Task", "ID", "Date"]).agg({"Supervisor": lambda x: " ".join(x)}) \
       .reset_index()
print(df)
#    Task    ID     Date                      Supervisor
# 0  Task 1  13588  Monday, 13 January 2020   Jack Address 1 City 1
# 1  Task 2  13589  Monday, 13 January 2020   Ammie Address 2 City
# 2  Task 3  13589  Monday, 13 January 2020   Amanda Address 3 City 3
# 3  Task 4  13587  Tuesday, 14 January 2020  Chelsea Address 4 City 4
# 4  Task 5  13586  Tuesday, 14 January 2020  Ibrahim Address 5 City 5
# 5  Task 6  13585  Tuesday, 14 January 2020  Kate Address 6 City 6
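For comparison, here is a sketch of a different technique from the one in the answer: instead of grouping on Task, ID and Date, it numbers the blocks with a cumulative sum over the task mask and groups on that number. The helper name merge_task_rows, the temporary block column and the shortened demo data are all hypothetical, and the sketch assumes that every non-empty Task cell that is not a task label is a date header.

```python
import pandas as pd


def merge_task_rows(frame):
    """Collapse date, task and continuation rows into one row per task (hypothetical helper)."""
    task_mask = frame["Task"].str.match(r"Task\s+\d")
    # Assumption: a non-empty Task cell that is not a task label is a date header.
    date_mask = ~task_mask & frame["Task"].ne("")

    out = frame.copy()
    # Carry each date header down to every row below it.
    out["Date"] = out["Task"].where(date_mask).ffill()
    # Number the blocks: the counter increases at every task row.
    out["block"] = task_mask.cumsum()
    # Drop the date header rows themselves before grouping.
    out = out[~date_mask]

    merged = out.groupby("block").agg(
        Date=("Date", "first"),
        Task=("Task", "first"),
        ID=("ID", "first"),
        Supervisor=("Supervisor", " ".join),
    )
    return merged.reset_index(drop=True)


# Shortened copy of the sample data from the question.
data = [
    ["Monday, 13 January 2020", "", ""],
    ["Task 1", 13588, "Jack"],
    ["", "", "Address 1"],
    ["", "", "City 1"],
    ["Tuesday, 14 January 2020", "", ""],
    ["Task 4", 13587, "Chelsea"],
    ["", "", "Address 4"],
]
demo = pd.DataFrame(data, columns=["Task", "ID", "Supervisor"])
print(merge_task_rows(demo))
```

Grouping on a running block number keeps the rows in their original order, returns Date as the first column as requested in the question, and does not merge two separate tasks that happen to share the same Task, ID and Date values.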
Regarding "python - How do I rearrange the rows of a dataframe using Python?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61556939/