python - Pandas: create a new table from two tables

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:41:56

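I have to join two tables and create a table that includes dates, but my code is far too long, and I believe the way I did it is very roundabout. Apparently the solution is only 22 lines. Is there another, shorter way to solve this? Here is the problem: [image of the problem statement]

Here is my code. Again, I believe it is too long, and I think there is a shorter way to do this.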

import numpy as np
import pandas as pd
import datetime

#YOUR CODE GOES HERE#

def get_month(i):
    """this function returns the number of the month based on string input"""
    if i == "January":
        return 1
    elif i == "February":
        return 2
    elif i == "March":
        return 3
    elif i == "April":
        return 4
    elif i == "May":
        return 5
    elif i == "June":
        return 6
    elif i == "July":
        return 7
    elif i == "August":
        return 8
    elif i == "September":
        return 9
    elif i == "October":
        return 10
    elif i == "November":
        return 11
    elif i == "December":
        return 12

def get_reformatted_date(s):
    """this function reformats a datetime object to the output we're looking for"""
    return s.strftime("%d-%b-%y")


month_names = []
tab1 = pd.read_csv("data1.csv")
tab2 = pd.read_csv("data2.csv")
tab1_tweets = tab1['Tweet'].tolist()[::-1]
tab2_tweets = tab2['Tweet'].tolist()[::-1]
tab1_months = tab1['Month'].tolist()[::-1]
tab2_months = tab2['Month'].tolist()[::-1]
tab1_days = tab1['Day'].tolist()[::-1]
tab2_days = tab2['Day'].tolist()[::-1]
tab1_years = tab1['Year'].tolist()[::-1]
tab2_years = tab2['Year'].tolist()[::-1]
all_dates = []
all_tweets = []
tab1_count = 0
tab2_count = 0
for i in range(len(tab1_tweets) + len(tab2_tweets)):
    if tab1_count < len(tab1_years) and tab2_count < len(tab2_years):
        t1_date = datetime.date(tab1_years[tab1_count], tab1_months[tab1_count], tab1_days[tab1_count])
        t2_date = datetime.date(tab2_years[tab2_count], get_month(tab2_months[tab2_count]), tab2_days[tab2_count])
        if t1_date > t2_date:
            all_dates.append(t1_date)
            all_tweets.append(tab1_tweets[tab1_count])
            tab1_count += 1
        else:
            all_dates.append(t2_date)
            all_tweets.append(tab2_tweets[tab2_count])
            tab2_count += 1
    elif tab2_count < len(tab2_years):
        t2_date = datetime.date(tab2_years[tab2_count], get_month(tab2_months[tab2_count]), tab2_days[tab2_count])
        all_dates.append(t2_date)
        all_tweets.append(tab2_tweets[tab2_count])
        tab2_count += 1
    else:
        t1_date = datetime.date(tab1_years[tab1_count], tab1_months[tab1_count], tab1_days[tab1_count])
        all_dates.append(t1_date)
        all_tweets.append(tab1_tweets[tab1_count])
        tab1_count += 1

table_data = {'Date': all_dates, 'Tweet': all_tweets}
df = pd.DataFrame(table_data)
df['Date'] = df['Date'].apply(get_reformatted_date)
print(df)

data1.csv

Tweet                    Month  Day  Year
Hello World              6      2    2013
I want ice-cream!        7      23   2013
Friends will be friends  9      30   2017
Done with school         12     12   2017

data2.csv

Month    Day  Year  Hour  Tweet
January  2    2015  12    Happy New Year
March    21   2016  7     Today is my final
May      30   2017  23    Summer is about to begin
July     15   2018  11    Ocean is still cold

Best answer

I think in theory you could do everything in a single statement:

finaldf = (pd.concat([pd.read_csv('data1.csv',
                                  parse_dates={'Date': ['Year', 'Month', 'Day']}),
                      pd.read_csv('data2.csv',
                                  parse_dates={'Date': ['Year', 'Month', 'Day']})
                      [['Date', 'Tweet']]])
           .sort_values('Date', ascending=False))

But for readability, it is better to split it across a few lines:

df1 = pd.read_csv('data1.csv', parse_dates={'Date':['Year', 'Month','Day']})
df2 = pd.read_csv('data2.csv', parse_dates={'Date':['Year', 'Month','Day']})

finaldf = (pd.concat([df1, df2[['Date', 'Tweet']]])
           .sort_values('Date', ascending=False))

I think the main things to read up on for what you are trying to do are the parse_dates parameter of pandas' read_csv, and pd.concat for concatenating dataframes.
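As far as I know, newer pandas releases (2.2 and later) deprecate the dict form of parse_dates that combines several columns, so here is a minimal sketch of the same idea that builds the Date column with pd.to_datetime instead. The helper with_date_column, the MONTH_NUMBERS lookup, and the inline DATA1/DATA2 strings are my own illustrative names, not part of the original files, and the comma-separated layout is an assumption about how data1.csv and data2.csv are stored. The lookup built from calendar.month_name also stands in for the hand-written get_month function from the question.

```python
import calendar
import io

import pandas as pd

# Stand-ins for data1.csv and data2.csv (comma-separated layout is assumed).
DATA1 = """Tweet,Month,Day,Year
Hello World,6,2,2013
I want ice-cream!,7,23,2013
Friends will be friends,9,30,2017
Done with school,12,12,2017
"""

DATA2 = """Month,Day,Year,Hour,Tweet
January,2,2015,12,Happy New Year
March,21,2016,7,Today is my final
May,30,2017,23,Summer is about to begin
July,15,2018,11,Ocean is still cold
"""

# Month name -> month number, e.g. "January" -> 1 (English month names assumed).
MONTH_NUMBERS = {name: number
                 for number, name in enumerate(calendar.month_name) if name}


def with_date_column(df):
    """Return a frame with only Date and Tweet, building Date from Year/Month/Day."""
    month = df["Month"]
    # Translate month names to numbers when the column is not already numeric.
    if not pd.api.types.is_numeric_dtype(month):
        month = month.map(MONTH_NUMBERS)
    # pd.to_datetime can assemble dates from year/month/day columns.
    parts = pd.DataFrame({"year": df["Year"], "month": month, "day": df["Day"]})
    return pd.DataFrame({"Date": pd.to_datetime(parts), "Tweet": df["Tweet"]})


df1 = with_date_column(pd.read_csv(io.StringIO(DATA1)))
df2 = with_date_column(pd.read_csv(io.StringIO(DATA2)))

# Stack both tables and put the most recent tweet first.
finaldf = (pd.concat([df1, df2], ignore_index=True)
           .sort_values("Date", ascending=False))
print(finaldf)
```

With the real files, io.StringIO(DATA1) and io.StringIO(DATA2) would simply be replaced by the file names.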

Edit: to get the dates formatted as in the example output, you can call this after the code above, using Series.dt.strftime():

finaldf['Date'] = finaldf['Date'].dt.strftime('%d-%b-%y')
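A small sketch of what that call does, using a two-row frame as a hypothetical stand-in for finaldf. Note that strftime turns the column into plain strings, so the sorting has to happen before this step. The month abbreviation produced by %b depends on the locale; English is assumed here.

```python
import pandas as pd

# Hypothetical stand-in for finaldf, already sorted newest first.
finaldf = pd.DataFrame({
    "Date": pd.to_datetime(["2018-07-15", "2017-12-12"]),
    "Tweet": ["Ocean is still cold", "Done with school"],
})

# %d = zero-padded day, %b = abbreviated month name, %y = two-digit year.
finaldf["Date"] = finaldf["Date"].dt.strftime("%d-%b-%y")

# The column now holds strings such as "15-Jul-18", not datetimes.
print(finaldf.to_string(index=False))
```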

Regarding "python - Pandas: create a new table from two tables", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50440487/
