
Python Pandas - use Multiple Character Delimiter when writing to_csv




It appears that the pandas to_csv function only allows single character delimiters/separators.




Is there some way to allow for a string of characters to be used like, "::" or "%%" instead?




I tried:




df.to_csv(local_file,  sep = '::', header=None, index=False)


and got:




TypeError: "delimiter" must be a 1-character string

Comments

You could append to each element a single character of your desired separator and then pass a single character for the delimiter, but if you intend to read this back into pandas then you will encounter the same difficulty.


@EdChum Good idea. What would be the command to append a single character to each field in the DF (it has 100 columns and 10000 rows)? I am guessing the last column must not have a trailing character (because it is last). Thanks!


Do you have some other tool that needs this? Because most spreadsheet programs, Python scripts, R scripts, etc. aren’t going to recognize the format any more than Pandas is.


Appending the first : to each field won't work, because that just guarantees that every field will get quoted or escaped, so you're going to get something like :":" or, at best, \::. (And even if you want to force the latter with dialect params, it's still going to escape every colon it sees in the middle of a value, not just double colons.)


FWIW, pandas read_csv now supports multi-character delimiters (via the Python engine). However, if that delimiter shows up in quoted text, it's going to be split on and throw off the true number of fields detected in a line :(

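A minimal sketch of that read-side support, assuming a file written with a "::" separator (the file name here is just an example); to_csv, which the question is about, still only accepts a single-character sep:

import pandas as pd

# Multi-character separators need the Python engine; the separator is
# treated as a regular expression ('::' has no regex metacharacters).
df = pd.read_csv('file.csv', sep='::', engine='python', header=None)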

Answers

Use numpy.savetxt.



Examples:



import numpy as np

# chunk_data is a pandas DataFrame; its .values array is written out
# with the two-character delimiter '~|'.
np.savetxt(
    'file.csv',
    np.char.decode(chunk_data.values.astype(np.bytes_), 'UTF-8'),
    delimiter='~|',
    fmt='%s',
    encoding=None)

np.savetxt(
    'file.dat',
    chunk_data.values,
    delimiter='~|',
    fmt='%s',
    encoding='utf-8')


Think about what the line a::b::c means to a standard CSV tool: an a, an empty column, a b, an empty column, and a c. Even in a more complicated case with quoting or escaping: "abc::def"::2 means an abc::def, an empty column, and a 2.




So, all you have to do is add an empty column between every column, and then use : as a delimiter, and the output will be almost what you want.
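A minimal sketch of that empty-column trick (the sample frame and the placeholder column names are made up for illustration):

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y'], 'c': [3.0, 4.0]})

# Interleave an empty-string column between every pair of original columns,
# then write with a single ':' separator; adjacent separators print as '::'.
padded = pd.DataFrame()
for i, col in enumerate(df.columns):
    padded[col] = df[col]
    if i < len(df.columns) - 1:
        padded['blank_%d' % i] = ''   # hypothetical placeholder column

padded.to_csv('out.csv', sep=':', header=False, index=False)
# rows in out.csv look like: 1::x::3.0

Reading the file back with sep='::' and the Python engine then yields just the original columns.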




I say “almost” because Pandas is going to quote or escape single colons. Depending on the dialect options you’re using, and the tool you’re trying to interact with, this may or may not be a problem. Unnecessary quoting usually isn’t a problem (unless you ask for QUOTE_ALL, because then your columns will be separated by :"":, so hopefully you don’t need that dialect option), but unnecessary escapes might be (e.g., you might end up with every single : in a string turned into a \: or something). So you have to be careful with the options. But it’ll work for the basic “quote as needed, with mostly standard other options” settings.




For the moment I am stuck on an old version of pandas. My task was to read a CSV with "__" delimiters, clean it to remove personal identifying information, and write the results to a new file. I need the result to have the same two-character delimiter.



My preferred solution would have been to convert to numpy and save, like this:



import numpy as np
import pandas

df = pandas.read_csv("patient_patient-final.txt", sep="__", engine="python")

# remove personal identifying info from dataframe
massaged = df.drop(['paternal_last', 'maternal_last', 'first', 'middle', 'suffix', 'prefix', 'street1', 'street2', 'phone1', 'phone2', 'email', 'emergencyfullname', 'emergencyphone', 'emergencyemail', 'curp', 'oldid'], axis=1)

# write the cleaned data back out with the same two-character delimiter
np_data = massaged.to_numpy()
np.savetxt("patient_massaged.txt", np_data, fmt="%s", delimiter="__")

However, to_numpy() isn't supported in the version of Pandas I have.
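For what it's worth, the older .values attribute (which the numpy answer above already relies on) returns the same ndarray on versions that predate to_numpy(), so that route might have looked like:

np_data = massaged.values   # .values predates to_numpy() and returns the same ndarray
np.savetxt("patient_massaged.txt", np_data, fmt="%s", delimiter="__")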



So, my fix was to generate a CSV with "}" as a temporary delimiter, save that to a variable, do a string replace, and write the file myself:



import pandas

df = pandas.read_csv("patient_patient-final.txt", sep="__", engine="python")

# remove personal identifying info from dataframe
massaged = df.drop(['paternal_last', 'maternal_last', 'first', 'middle', 'suffix', 'prefix', 'street1', 'street2', 'phone1', 'phone2', 'email', 'emergencyfullname', 'emergencyphone', 'emergencyemail', 'curp', 'oldid'], axis=1)

# "}" serves as a temporary single-character delimiter; this assumes "}" never
# occurs in the data itself, otherwise those characters would also be replaced
x = massaged.to_csv(sep="}", header=False, index=False)
x = x.replace("}", "__")

with open("patient_massaged.txt", "w") as f:
    f.write(x)

Comments

If you're already using dataframes, you can simplify it and even include headers, assuming df is a pandas.DataFrame: numpy.savetxt(csv_filepath, df, delimiter=csv_file_delimeter, header=csv_file_delimeter.join(df.columns.values), fmt='%s', comments='', encoding=None) (note that comments='' is needed because otherwise savetxt will automatically prefix a comment symbol in front of the headers)
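Spelled out as code (csv_filepath and csv_file_delimeter are the commenter's placeholders, filled in with example values here):

import numpy
import pandas

df = pandas.DataFrame({'a': [1, 2], 'b': ['x', 'y']})   # any DataFrame
csv_filepath = 'file.csv'                                # example output path
csv_file_delimeter = '~|'                                # the multi-character delimiter

# comments='' keeps savetxt from prefixing '#' in front of the header row
numpy.savetxt(csv_filepath, df, delimiter=csv_file_delimeter,
              header=csv_file_delimeter.join(df.columns.values),
              fmt='%s', comments='', encoding=None)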


Thanks @KtMack for the details about the column headers... feels weird to use join here, but it works wonderfully.

