python - csvjoin on multiple columns

Reposted. Author: 行者123. Updated: 2023-12-01 04:09:23
I have the following CSV files, which I want to inner join:

CSV 1: Trip_Data.csv (250 MB)

head -2 rand_trip_data_1.csv 

medallion,hack_license,vendor_id,rate_code,store_and_fwd_flag,pickup_datetime,dropoff_datetime,passenger_count,trip_time_in_secs,trip_distance,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude
DFD2202EE08F7A8DC9A57B02ACB81FE2,51EE87E3205C985EF8431D850C786310,CMT,1,N,2013-01-07 23:54:15,2013-01-07 23:58:20,2,244,.70,-73.974602,40.759945,-73.984734,40.759388

CSV 2: Trip_Fare (1.70 GB)

head -2 trip_fare_1.csv

medallion, hack_license, vendor_id, pickup_datetime, payment_type, fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, total_amount
89D227B655E5C82AECF13C3F540D4CF4,BA96DE419E711691B9445D6A6307C170,CMT,2013-01-01 15:11:48,CSH,6.5,0,0.5,0,0,7

I want to merge the two CSV files on the following columns: medallion, hack_license, pickup_datetime

I am using csvjoin, but it only lets me join on one column from each CSV file. Is there a way to add more columns to the join condition?

csvjoin command that joins on medallion only:

csvjoin -c medallion rand_trip_data_1.csv trip_fare_1.csv > trip_data_1.csv

Command in bash (but it does not work):

join -t , -1 1,2,6 -2 1,2,4 rand_trip_data_1.csv trip_fare_1.csv > trip_data_1.csv
join: illegal field number -- 1,2,6

I am also open to alternative bash/python suggestions. Thanks!
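
A note on the failing command: join takes a single field number for each of -1 and -2, so a comma-separated list such as 1,2,6 is rejected. One possible workaround, sketched below in Python with only the standard library, is to treat the three columns as one tuple key, index the smaller file in a dict, and stream the larger file past it. The function name inner_join and the sample rows are made up for illustration; they are not from the question's data, and this has not been tried on the full-size files.

```python
import csv
import io

KEYS = ("medallion", "hack_license", "pickup_datetime")


def inner_join(left_file, right_file, out_file, keys=KEYS):
    """Inner-join two CSV streams on several columns (hypothetical helper)."""
    # skipinitialspace=True drops the blank after each comma, as seen in the
    # trip_fare header (" hack_license" becomes "hack_license").
    left = csv.DictReader(left_file, skipinitialspace=True)
    right = csv.DictReader(right_file, skipinitialspace=True)
    left_cols = list(left.fieldnames or [])
    extra = [c for c in (right.fieldnames or []) if c not in left_cols]

    # Index the left (assumed smaller) file by its composite key.
    index = {}
    for row in left:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    writer = csv.DictWriter(out_file, fieldnames=left_cols + extra)
    writer.writeheader()
    matched = 0
    # Stream the right file and emit one output row per matching pair.
    for row in right:
        for left_row in index.get(tuple(row[k] for k in keys), []):
            writer.writerow({**left_row, **{c: row[c] for c in extra}})
            matched += 1
    return matched


# Tiny made-up inputs standing in for the two real files.
trips = io.StringIO(
    "medallion,hack_license,pickup_datetime,trip_distance\n"
    "A1,H1,2013-01-01 15:11:48,1.2\n"
    "A1,H1,2013-01-02 09:00:00,3.4\n"
)
fares = io.StringIO(
    "medallion, hack_license, pickup_datetime, total_amount\n"
    "A1,H1,2013-01-01 15:11:48,7\n"
    "B2,H2,2013-01-01 15:11:48,9\n"
)
out = io.StringIO()
matched = inner_join(trips, fares, out)
print(out.getvalue())
```

With real files, the StringIO objects would be replaced by files opened with open(path, newline=""). Memory use grows with the size of the indexed file, so indexing the 250 MB file rather than the 1.70 GB one is probably the safer choice.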

Best answer

I used pandas to solve my problem.

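import pandas as pd

# Load both CSV files into DataFrames.
data = pd.read_csv("test_rand.csv")
fare = pd.read_csv("test_fare.csv")

# Merge on all three key columns. how='left' keeps every row of data;
# how='inner' would keep only rows whose keys appear in both files.
merged = pd.merge(data, fare, how='left', on=['medallion', 'hack_license', 'pickup_datetime'])
merged.to_csv("merged.csv", index=False)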
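
Two details may matter when running this against the files shown in the question. The trip_fare_1.csv header has a space after each comma, so with default settings pandas would read names such as " hack_license", and a merge on 'hack_license' would likely fail with a KeyError. Also, the question asks for an inner join rather than a left join. The sketch below, which uses small made-up rows in place of the real files, shows one way to handle both with skipinitialspace=True and how="inner".

```python
import io

import pandas as pd

KEYS = ["medallion", "hack_license", "pickup_datetime"]

# Made-up stand-ins for the two files; note the spaces in the fare header.
trips_csv = io.StringIO(
    "medallion,hack_license,pickup_datetime,trip_distance\n"
    "A1,H1,2013-01-01 15:11:48,1.2\n"
    "A1,H1,2013-01-02 09:00:00,3.4\n"
)
fares_csv = io.StringIO(
    "medallion, hack_license, pickup_datetime, total_amount\n"
    "A1,H1,2013-01-01 15:11:48,7\n"
    "B2,H2,2013-01-01 15:11:48,9\n"
)

# skipinitialspace=True strips the blank that follows each delimiter,
# so the fare columns get the same names as the trip columns.
data = pd.read_csv(trips_csv, skipinitialspace=True)
fare = pd.read_csv(fares_csv, skipinitialspace=True)

# how="inner" keeps only rows whose three key values occur in both frames.
merged = pd.merge(data, fare, how="inner", on=KEYS)
print(merged)
```

For the real data, the StringIO objects would be replaced by the file paths. Whether a 1.70 GB file fits comfortably in memory depends on the machine, so this is a starting point rather than a tuned solution.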

Regarding python - csvjoin on multiple columns, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/35160251/
