
python - pandas, to_csv() to a specific format

Reposted. Author: 行者123. Updated: 2023-11-30 23:23:03

I want to build a decision tree using DecisionTree 2.2.2. https://engineering.purdue.edu/kak/distDT/DecisionTree-2.2.2.html

However, it uses this odd csv format.

"","pgtime","pgstat","age","eet","g2","grade","gleason","ploidy"
"1",6.1,0,64,2,10.26,2,4,"diploid"
"2",9.4,0,62,1,NA,3,8,"aneuploid"
"3",5.2,1,59,2,9.99,3,7,"diploid"
"4",3.2,1,62,2,3.57,2,4,"diploid"
"5",1.9,1,64,2,22.56,4,8,"tetraploid"
"6",4.8,0,69,1,6.14,3,7,"diploid"
"7",5.8,0,75,2,13.69,2,NA,"tetraploid"
"8",7.3,0,71,2,NA,3,7,"aneuploid"
"9",3.7,1,73,2,11.77,3,6,"diploid"
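  • The first element of the first row should be ""
  • Header names should be quoted.
  • The index column should be quoted.
  • All symbolic features should be quoted.

My question is: how can I save a DataFrame in this format using the pandas to_csv function? If that is not possible, can you suggest the best alternative?

Thanks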


Here is what I tried. I converted the column to string type:

df.col1 = df.col1.apply(str) 

and used index_label when saving:

df.to_csv( 'filename.csv', header=True, index=True, index_label='"') 

But this gives me the following:

"""",url,class,length,volume,name,degree,pagerank
......

The first element is four quote characters.
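
For context, the four quotes are ordinary CSV escaping rather than a pandas quirk: a field whose content is a single `"` gets wrapped in quotes, and the embedded quote is doubled. A minimal sketch with the standard library `csv` module (the column names are taken from the output above; using `csv.writer` directly instead of `to_csv` is an assumption made for illustration):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)  # default quoting is csv.QUOTE_MINIMAL

# The first field is a single double-quote character, like index_label='"'.
writer.writerow(['"', "url", "class"])

header_line = buf.getvalue()
print(repr(header_line))
```

The label should therefore be the empty string, with quoting left to the writer.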

Best answer

First, just to show that this reads in fine:

In [11]: df = pd.read_clipboard(sep=',', index_col=0)

In [12]: df
Out[12]:
pgtime pgstat age eet g2 grade gleason ploidy
1 6.1 0 64 2 10.26 2 4 diploid
2 9.4 0 62 1 NaN 3 8 aneuploid
3 5.2 1 59 2 9.99 3 7 diploid
4 3.2 1 62 2 3.57 2 4 diploid
5 1.9 1 64 2 22.56 4 8 tetraploid
6 4.8 0 69 1 6.14 3 7 diploid
7 5.8 0 75 2 13.69 2 NaN tetraploid
8 7.3 0 71 2 NaN 3 7 aneuploid
9 3.7 1 73 2 11.77 3 6 diploid

You have to use quoting=csv.QUOTE_NONNUMERIC * when writing the csv:

In [21]: s = StringIO()

In [22]: df.to_csv(s, quoting=2) # or output to file instead

In [23]: s.getvalue()
Out[23]: '"","pgtime","pgstat","age","eet","g2","grade","gleason","ploidy"\n1,6.1,0,64,2,10.26,2,4.0,"diploid"\n2,9.4,0,62,1,"",3,8.0,"aneuploid"\n3,5.2,1,59,2,9.99,3,7.0,"diploid"\n4,3.2,1,62,2,3.57,2,4.0,"diploid"\n5,1.9,1,64,2,22.56,4,8.0,"tetraploid"\n6,4.8,0,69,1,6.14,3,7.0,"diploid"\n7,5.8,0,75,2,13.69,2,"","tetraploid"\n8,7.3,0,71,2,"",3,7.0,"aneuploid"\n9,3.7,1,73,2,11.77,3,6.0,"diploid"\n'

* QUOTE_NONNUMERIC is 2.
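
As a quick check of the footnote, the sketch below reads the constant from the standard library and shows what this quoting mode does to a couple of rows. It uses `csv.writer` on its own rather than pandas, and the two sample rows are a cut-down version of the data above:

```python
import csv
import io

# The named constant behind quoting=2.
print(csv.QUOTE_NONNUMERIC)

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)

# Strings (including the empty string) are quoted; numbers are not.
writer.writerow(["", "pgtime", "ploidy"])
writer.writerow(["1", 6.1, "diploid"])

quoted = buf.getvalue()
print(quoted)
```

Passing the named constant instead of the literal 2 makes the intent easier to read.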

Now, this isn't quite what you want, because the index column isn't quoted. I would simply convert the index to strings:

In [24]: df.index = df.index.astype(str)  # unicode in python 3?

In [25]: s = StringIO()

In [26]: df.to_csv(s, quoting=2)

In [27]: s.getvalue()
Out[27]: '"","pgtime","pgstat","age","eet","g2","grade","gleason","ploidy"\n"1",6.1,0,64,2,10.26,2,4.0,"diploid"\n"2",9.4,0,62,1,"",3,8.0,"aneuploid"\n"3",5.2,1,59,2,9.99,3,7.0,"diploid"\n"4",3.2,1,62,2,3.57,2,4.0,"diploid"\n"5",1.9,1,64,2,22.56,4,8.0,"tetraploid"\n"6",4.8,0,69,1,6.14,3,7.0,"diploid"\n"7",5.8,0,75,2,13.69,2,"","tetraploid"\n"8",7.3,0,71,2,"",3,7.0,"aneuploid"\n"9",3.7,1,73,2,11.77,3,6.0,"diploid"\n'

As desired.
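
Put together as a plain script rather than an IPython session, the two steps might look like the sketch below. The three-row frame is a cut-down sample of the data above (without the missing values), and writing to a `StringIO` buffer stands in for a real file path:

```python
import csv
import io

import pandas as pd

# Small sample of the data from the question, with an integer index.
df = pd.DataFrame(
    {
        "pgtime": [6.1, 9.4, 5.2],
        "pgstat": [0, 0, 1],
        "ploidy": ["diploid", "aneuploid", "diploid"],
    },
    index=[1, 2, 3],
)

# Step 1: make the index non-numeric so that it gets quoted.
df.index = df.index.astype(str)

# Step 2: quote every non-numeric field, including the header.
buf = io.StringIO()  # pass a filename here to write to disk instead
df.to_csv(buf, quoting=csv.QUOTE_NONNUMERIC)

lines = buf.getvalue().splitlines()
print("\n".join(lines))
```

Because the index has no name, the first header cell is written as an empty quoted string, which is the format the question asks for.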

Regarding python - pandas, to_csv() to a specific format, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/24172896/
