
python - Strings are lost when creating a NumPy array from a Pandas DataFrame

Reposted. Author: 太空宇宙. Updated: 2023-11-04 01:07:55

Apologies if this is too basic... Essentially, I am loading a huge CSV file with pandas and then converting it to a NumPy array for post-processing. Thanks for your help!

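The problem is that some strings are cut off during the conversion (from pandas DataFrame to NumPy array). For example, the strings in the "abstract" column are complete; see print datafile["abstract"][0] below. However, once I convert them to a NumPy array, only the beginning of each string is left; see print df_all[0,3] below.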

import pandas as pd
import csv
import numpy as np

datafile = pd.read_csv(path, header=0)
df_all = pd.np.array(datafile, dtype='string')
header_t = list(datafile.columns.values)

The string in the pandas DataFrame is complete:

print datafile["abstract"][0]
In order to test the widely held assumption that homeopathic medicines contain negligible quantities of their major ingredients, six such medicines labeled in Latin as containing arsenic were purchased over the counter and by mail order and their arsenic contents measured. Values determined were similar to those expected from label information in only two of six and were markedly at variance in the remaining four. Arsenic was present in notable quantities in two preparations. Most sales personnel interviewed could not identify arsenic as being an ingredient in these preparations and were therefore incapable of warning the general public of possible dangers from ingestion. No such warnings appeared on the labels.

The string in the NumPy array is incomplete:

print df_all[0,3]
In order to test the widely held assumption that homeopathic me

Best answer

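I think that when you specify dtype='string', you are essentially specifying the default S64 type, which truncates your strings to 64 characters. Just skip the dtype='string' part and you should be fine (the dtype will become object).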
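To illustrate the difference between a fixed-width string dtype and object, here is a minimal sketch using NumPy only. It assumes Python 3, where 'U64' (64 Unicode characters) stands in for the S64 type guessed at above; the variable names are made up for the example.

```python
import numpy as np

# First sentence of the abstract from the question; it is longer than 64 characters.
text = ("In order to test the widely held assumption that homeopathic "
        "medicines contain negligible quantities of their major ingredients")

# Fixed-width dtype: each element has room for exactly 64 characters,
# so anything longer is cut off without a warning.
fixed = np.array([text], dtype="U64")

# Object dtype: each element is a reference to the original Python str,
# so nothing is truncated.
as_object = np.array([text], dtype=object)

print(fixed.dtype, len(fixed[0]))
print(as_object.dtype, len(as_object[0]))
```

The fixed-width array keeps only a prefix of the string, while the object array keeps all of it, which matches the behaviour described in the question.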

Better yet, don't convert the DataFrame with array() at all; use the built-in df.values.
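A short sketch of that suggestion, assuming a small in-memory DataFrame in place of the CSV from the question (the column names and values here are hypothetical). On recent pandas versions, to_numpy() is the documented counterpart of the values attribute.

```python
import pandas as pd

# Stand-in for the CSV data: one numeric column and one long text column.
long_text = "x" * 200
datafile = pd.DataFrame({"pmid": [1], "abstract": [long_text]})

# No dtype is forced, so the mixed int/str columns come back as an object array.
df_all = datafile.to_numpy()          # datafile.values gives the same array
header_t = list(datafile.columns)     # column names, as in the question

print(df_all.dtype)
print(len(df_all[0, 1]))
```

Because the array holds references to the original Python strings, the full 200-character value survives the conversion.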

Regarding python - Strings are lost when creating a NumPy array from a Pandas DataFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29396952/
