
Python Pandas - use Multiple Character Delimiter when writing to_csv




It appears that the pandas to_csv function only allows single character delimiters/separators.




Is there some way to allow for a string of characters to be used like, "::" or "%%" instead?




I tried:




df.to_csv(local_file,  sep = '::', header=None, index=False)


and got:




TypeError: "delimiter" must be a 1-character string

Comments

You could append to each element a single character of your desired separator and then pass a single character for the delimiter, but if you intend to read this back into pandas then you will encounter the same difficulty.


@EdChum Good idea. What would be the command to append a single character to each field in the DF (it has 100 columns and 10000 rows)? I am guessing the last column must not have a trailing character (because it is last). Thanks!


Do you have some other tool that needs this? Because most spreadsheet programs, Python scripts, R scripts, etc. aren’t going to recognize the format any more than Pandas is.


Appending the first : to each field won't work, because that just guarantees that every field will get quoted or escaped, so you're going to get something like :":" or, at best, \::. (And even if you want to force the latter with dialect params, it's still going to escape every colon it sees in the middle of a value, not just double colons.)


FWIW, pandas read_csv now supports multi-character delimiters (via the Python engine). However, if that delimiter shows up in quoted text, it's going to be split on and throw off the true number of fields detected in a line :(

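A minimal sketch of that read-side support, assuming a file written with a "::" separator (the file name here is just an example); to_csv, which the question is about, still only accepts a single-character sep:

import pandas as pd

# Multi-character separators need the Python engine; the separator is
# treated as a regular expression ('::' has no regex metacharacters).
df = pd.read_csv('file.csv', sep='::', engine='python', header=None)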

Answers

Use numpy.savetxt.



Examples:



import numpy as np

# chunk_data is a pandas DataFrame; its .values array is written out
# with the two-character delimiter '~|'.
np.savetxt(
    'file.csv',
    np.char.decode(chunk_data.values.astype(np.bytes_), 'UTF-8'),
    delimiter='~|',
    fmt='%s',
    encoding=None)

np.savetxt(
    'file.dat',
    chunk_data.values,
    delimiter='~|',
    fmt='%s',
    encoding='utf-8')


Think about what the line a::b::c means to a standard CSV tool: an a, an empty column, a b, an empty column, and a c. Even in a more complicated case with quoting or escaping: "abc::def"::2 means an abc::def, an empty column, and a 2.




So, all you have to do is add an empty column between every column, and then use : as a delimiter, and the output will be almost what you want.
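A minimal sketch of that empty-column trick (the sample frame and the placeholder column names are made up for illustration):

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y'], 'c': [3.0, 4.0]})

# Interleave an empty-string column between every pair of original columns,
# then write with a single ':' separator; adjacent separators print as '::'.
padded = pd.DataFrame()
for i, col in enumerate(df.columns):
    padded[col] = df[col]
    if i < len(df.columns) - 1:
        padded['blank_%d' % i] = ''   # hypothetical placeholder column

padded.to_csv('out.csv', sep=':', header=False, index=False)
# rows in out.csv look like: 1::x::3.0

Reading the file back with sep='::' and the Python engine then yields just the original columns.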




I say “almost” because Pandas is going to quote or escape single colons. Depending on the dialect options you’re using, and the tool you’re trying to interact with, this may or may not be a problem. Unnecessary quoting usually isn’t a problem (unless you ask for QUOTE_ALL, because then your columns will be separated by :"":, so hopefully you don’t need that dialect option), but unnecessary escapes might be (e.g., you might end up with every single : in a string turned into a \: or something). So you have to be careful with the options. But it’ll work for the basic “quote as needed, with mostly standard other options” settings.




For the moment I am stuck on an old version of pandas. My task was to read a CSV with "__" delimiters, clean it to remove personal identifying information, and write the results to a new file. I need the result to have the same two-character delimiter.



My preferred solution would have been to convert to numpy and save, like this:



import numpy as np
import pandas

df = pandas.read_csv("patient_patient-final.txt", sep="__", engine="python")

# remove personal identifying info from dataframe
massaged = df.drop(['paternal_last', 'maternal_last', 'first', 'middle', 'suffix', 'prefix', 'street1', 'street2', 'phone1', 'phone2', 'email', 'emergencyfullname', 'emergencyphone', 'emergencyemail', 'curp', 'oldid'], axis=1)

# write the cleaned data back out with the same two-character delimiter
np_data = massaged.to_numpy()
np.savetxt("patient_massaged.txt", np_data, fmt="%s", delimiter="__")

However, to_numpy() isn't supported in the version of Pandas I have.
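For what it's worth, the older .values attribute (which the numpy answer above already relies on) returns the same ndarray on versions that predate to_numpy(), so that route might have looked like:

np_data = massaged.values   # .values predates to_numpy() and returns the same ndarray
np.savetxt("patient_massaged.txt", np_data, fmt="%s", delimiter="__")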



So, my fix was to generate a CSV with "}" as a temporary delimiter, save that to a variable, do a string replace, and write the file myself:



import pandas

df = pandas.read_csv("patient_patient-final.txt", sep="__", engine="python")

# remove personal identifying info from dataframe
massaged = df.drop(['paternal_last', 'maternal_last', 'first', 'middle', 'suffix', 'prefix', 'street1', 'street2', 'phone1', 'phone2', 'email', 'emergencyfullname', 'emergencyphone', 'emergencyemail', 'curp', 'oldid'], axis=1)

# "}" serves as a temporary single-character delimiter; this assumes "}" never
# occurs in the data itself, otherwise those characters would also be replaced
x = massaged.to_csv(sep="}", header=False, index=False)
x = x.replace("}", "__")

with open("patient_massaged.txt", "w") as f:
    f.write(x)

Comments

If you're already using dataframes, you can simplify it and even include headers, assuming df is a pandas.DataFrame: numpy.savetxt(csv_filepath, df, delimiter=csv_file_delimeter, header=csv_file_delimeter.join(df.columns.values), fmt='%s', comments='', encoding=None) (note that comments='' is needed because otherwise savetxt will automatically prefix a comment symbol in front of the headers)
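Spelled out as code (csv_filepath and csv_file_delimeter are the commenter's placeholders, filled in with example values here):

import numpy
import pandas

df = pandas.DataFrame({'a': [1, 2], 'b': ['x', 'y']})   # any DataFrame
csv_filepath = 'file.csv'                                # example output path
csv_file_delimeter = '~|'                                # the multi-character delimiter

# comments='' keeps savetxt from prefixing '#' in front of the header row
numpy.savetxt(csv_filepath, df, delimiter=csv_file_delimeter,
              header=csv_file_delimeter.join(df.columns.values),
              fmt='%s', comments='', encoding=None)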


Thanks @KtMack for the details about the column headers... feels weird to use join here, but it works wonderfully.

