
python - Losing the original text when calling to_string()

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:41:58

I have a JSON object, and one of its keys is:

"transcript": "The universe is bustling with matter and energy. Even in the vast apparent emptiness of intergalactic space, there's one hydrogen atom per cubic meter. That's not the mention a barrage of particles and electromagnetic radiation passing every which way from stars, galaxies, and into black holes. There's even radiation left over from the Big Bang...


Loading the dataframe:

# Initialize the dataframe for the universe transcript
import pandas as pd

dfJson = pd.read_json('test1.json')



Here is the code I used to try to extract it.

dfJsonTranscript = dfJson.get('transcript').to_string()
pprint.pprint(dfJsonTranscript)

text_file = open("sample.txt", "wt")
n = text_file.write(dfJsonTranscript)
text_file.close()


My output:

0    The universe is bustling with matter and energ...
1    The universe is bustling with matter and energ...
2    The universe is bustling with matter and energ...
3    The universe is bustling with matter and energ...
4    The universe is bustling with matter and energ...
5    The universe is bustling with matter and energ...
6    The universe is bustling with matter and energ...
7    The universe is bustling with matter and energ...
8    The universe is bustling with matter and energ...


Original JSON:

"transcript": "The universe is bustling with matter and energy. Even in the vast apparent emptiness of intergalactic space, there's one hydrogen atom per cubic meter. That's not the mention a barrage of particles and electromagnetic radiation passing every which way from stars, galaxies, and into black holes. There's even radiation left over from the Big Bang... universe. ",
"words": [
    {
        "alignedWord": "the",
        "case": "success",
        "end": 6.31,
        "endOffset": 3,
        "phones": [
            {
                "duration": 0.09,
                "phone": "dh_B"
            },
            {
                "duration": 0.05,
                "phone": "iy_E"
            }
        ],
        "start": 6.17,
        "startOffset": 0,
        "word": "The"
    },
    {
        "alignedWord": "universe",
        "case": "success",
        "end": 6.83,
        "endOffset": 12,
        "phones": [
            {
                "duration": 0.08,
                "phone": "y_B"
            },


Why do I lose the original value of the key when I run the to_string() method on it? Am I losing it by turning it into a dataframe through pandas?

Best answer

Try this:

dfJsonTranscript = dfJson.get('transcript').to_string(index=False)


Setting index=False tells the to_string method not to print the index (row) labels.
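
As a minimal sketch of what index=False changes, the snippet below builds a small Series by hand (the values are hypothetical stand-ins for the transcript column, not data from the question) and compares the two renderings:

```python
import pandas as pd

# Hypothetical stand-in for the 'transcript' column
s = pd.Series(["alpha", "beta"], name="transcript")

# Default rendering: each line starts with the row label (0, 1, ...)
with_index = s.to_string()

# index=False drops the row labels and keeps only the values
without_index = s.to_string(index=False)

print(with_index)
print(without_index)
```

The exact padding of the output may differ between pandas versions, which is presumably why the answer also calls strip() on the result.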

Edit:
To keep the string from being truncated, set the pandas display.max_colwidth option. It has to be set before calling the to_string method.

pd.set_option("display.max_colwidth", None)
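
If changing the option globally is undesirable, pandas also offers pd.option_context, which applies a setting only inside a with-block. The sketch below uses a made-up long string (an assumption for illustration, not the real transcript) to check that the full text survives:

```python
import pandas as pd

# Hypothetical long value, well past the default column width
long_text = ("The universe is bustling with matter and energy. " * 3).strip()
s = pd.Series([long_text])

# The option is lifted only inside this block and restored afterwards
with pd.option_context("display.max_colwidth", None):
    full = s.to_string(index=False)

print(full)
```

Whether the default setting truncates Series.to_string output may depend on the pandas version, so this sketch only checks the untruncated case.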


Update:

# Initialize the dataframe for the universe transcript
import pandas as pd
import pprint

pd.set_option('display.max_colwidth', None)  # avoid truncating long strings

dfJson = pd.read_json('data.json')
# Drop the index labels and trim the surrounding whitespace
dfJsonTranscript = dfJson.get('transcript').to_string(index=False).strip()
pprint.pprint(dfJsonTranscript)

# The with-block closes the file automatically
with open("sample.txt", "wt") as text_file:
    text_file.write(dfJsonTranscript)
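
The output in the question suggests that the transcript is repeated once per row of the dataframe, so to_string may still return several copies of it. An alternative that is not part of the original answer is to skip pandas and read the key with the standard json module. The sketch below assumes the file is a single JSON object with a top-level "transcript" key, as the excerpt suggests; the miniature sample data and the temporary paths are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical miniature of the file from the question
sample = {
    "transcript": "The universe is bustling with matter and energy.",
    "words": [{"word": "The"}, {"word": "universe"}],
}

# Write the sample to a temporary directory so the sketch is self-contained
workdir = Path(tempfile.mkdtemp())
json_path = workdir / "test1.json"
json_path.write_text(json.dumps(sample), encoding="utf-8")

# Parse the JSON and take the value as a plain Python string
with json_path.open(encoding="utf-8") as f:
    data = json.load(f)
transcript = data["transcript"]

# Save the text exactly once, with no index labels or truncation
out_path = workdir / "sample.txt"
out_path.write_text(transcript, encoding="utf-8")
```

Staying in pandas, dfJson['transcript'].iloc[0] should likewise return the first row's value as a plain string, without any display formatting.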

Regarding "python - Losing the original text when calling to_string()", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60369835/
