
python - How to get rid of NaN values in a csv file? Python

Reposted. Author: 行者123. Updated: 2023-12-04 10:31:15

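First of all, I know there are existing answers on this topic, but none of them has worked for me so far. I would still like to hear your answers, even though I have already tried those solutions.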

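I have a csv file called mbti_datasets.csv. The first column is labelled type and the second is called description. Each row represents a new personality type (with its respective type and description).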

TYPE        | DESCRIPTION
a | This personality likes to eat apples...\nThey look like monkeys...\nIn fact, are strong people...
b | b.description
c | c.description
d | d.description
...16 types | ...

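In the code below, I am trying to duplicate each personality type's row wherever its description contains a \n, so that every line of the description gets its own row.

Code: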

import pandas as pd

# Reading the file
path_root = 'gdrive/My Drive/Colab Notebooks/MBTI/'  # directory only, reused below for the output file
root_fn = path_root + 'mbti_datasets.csv'
df = pd.read_csv(root_fn, sep = ',', quotechar = '"', usecols = [0, 1])

# split the column where there are new lines and turn it into a series
serie = df['description'].str.split('\n').apply(pd.Series, 1).stack()

# remove the second index for the DataFrame and the series to share indexes
serie.index = serie.index.droplevel(1)

# give it a name to join it to the DataFrame
serie.name = 'description'

# remove original column
del df['description']

# join the series with the DataFrame, based on the shared index
df = df.join(serie)

# New file name and writing the new csv file
root_new_fn = path_root + 'mbti_new.csv'

df.to_csv(root_new_fn, sep = ',', quotechar = '"', encoding = 'utf-8', index = False)
new_df = pd.read_csv(root_new_fn)

print(new_df)

Expected output:
TYPE | DESCRIPTION
a | This personality likes to eat apples...
a | They look like monkeys...
a | In fact, are strong people...
b | b.description
b | b.description
c | c.description
... | ...

Current output:
TYPE | DESCRIPTION
a | This personality likes to eat apples...
a | They look like monkeys...NaN
a | NaN
a | In fact, are strong people...NaN
b | b.description...NaN
b | NaN
b | b.description
c | c.description
... | ...
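
One possible source of NaN values like these, assuming the descriptions contain blank lines: splitting on \n then yields empty strings, to_csv writes them as empty fields, and read_csv parses empty fields as NaN by default. The sketch below uses made-up sample rows (not the real file) and an in-memory buffer in place of the csv on disk.

```python
import io
import pandas as pd

# Hypothetical rows: the middle description is an empty string,
# as produced when a description contains a blank line.
df = pd.DataFrame({
    "type": ["a", "a", "a"],
    "description": ["likes apples", "", "strong people"],
})

# Write to an in-memory CSV instead of a file on disk
buf = io.StringIO()
df.to_csv(buf, index=False)

# Default behaviour: the empty field comes back as NaN (a float)
buf.seek(0)
default_read = pd.read_csv(buf)
print(default_read["description"].isnull().tolist())

# keep_default_na=False keeps the empty field as an empty string
buf.seek(0)
kept_read = pd.read_csv(buf, keep_default_na=False)
print(kept_read["description"].tolist())
```

If this is what happens with the real data, dropping the empty pieces before writing the file would avoid the NaN rows altogether.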

I am not 100% sure, but I think the NaN values are \r characters.

Files uploaded to GitHub as requested:
CSV FILES

Using @YOLO's solution:
CSV YOLO FILE
Examples of where it fails:

2 INTJ  Existe soledad en la cima y-- siendo # adds -- in blank random blank spaces
3 INTJ -- y las mujeres # adds -- in the beginning
3 INTJ (...) el 0--8-- de la poblaci # doesnt end the word 'población'
10 INTJ icos-- un conflicto que parecer--a imposible. # starts letters randomly
12 INTJ c #adds just 1 letter

Translation, for full understanding:

2 INTJ There is loneliness at the top and-- being # adds -- in blank spaces
3 INTJ -- and women # adds - in the beginning
3 INTJ (...) on 0--8-- of the popula-- # doesnt end the word 'population'
10 INTJ icos-- a conflict that seems--to impossible. # starts letters randomly
12 INTJ c #adds just 1 letter

When I print whether there are any NaN values, and what type they are:

print(new_df['descripcion'].isnull())

<class 'float'>
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 True
8 False
9 True
10 False
11 True
continue...
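
A quick way to test the \r hypothesis is to inspect the pieces right after the split, before anything is written to disk. This is a minimal sketch on made-up text with Windows line endings and one blank line (the sample data is an assumption, not taken from the real file).

```python
import pandas as pd

# Hypothetical sample: "\r\n" line endings with a blank line in between
df = pd.DataFrame({
    "type": ["a", "b"],
    "description": ["likes apples\r\n\r\nstrong people", "b.description"],
})

# Same split as in the question, flattened to one piece per row
pieces = df["description"].str.split("\n").explode()

# repr() makes invisible characters such as '\r' visible
raw_pieces = pieces.map(repr).tolist()
print(raw_pieces)

# True for pieces that are empty or whitespace-only once stripped
blank_mask = pieces.str.strip().eq("")
print(blank_mask.tolist())
```

Pieces flagged by blank_mask carry no text of their own, so they are candidates for the rows that later show up as NaN.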

Best answer

Here is one way to do it. I had to find a workaround to replace the \n character, because for some reason replacing it directly did not work:

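# Replace every character that is not alphanumeric, whitespace or '.' with '--'.
# A literal backslash becomes '--', so a literal '\n' turns into '--n', which is then the split marker.
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default.
df['DESCRIPTION'] = df['DESCRIPTION'].str.replace(r'[^a-zA-Z0-9\s.]', '--', regex=True).str.split('--n')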

df = df.explode('DESCRIPTION')

print(df)

TYPE DESCRIPTION
0 a This personality likes to eat apples...
0 a They look like monkeys...
0 a In fact-- are strong people...
1 b b.description
2 c c.description
3 d d.description
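
The replacement above also rewrites commas and accented letters, which matches the failures reported in the question. As an alternative sketch (not the answerer's method), the column can be split directly on line breaks, leaving every other character untouched. It assumes the separator is either a literal backslash followed by n or a real line break, uses made-up sample data, and passes regex=True to str.split, which needs pandas 1.4 or later.

```python
import pandas as pd

# Hypothetical sample: one literal backslash-n, one "\r\n\r\n" run, and a comma
df = pd.DataFrame({
    "TYPE": ["a", "b"],
    "DESCRIPTION": [
        "This personality likes to eat apples...\\nThey look like monkeys..."
        "\r\n\r\nIn fact, are strong people...",
        "b.description",
    ],
})

# Split on a literal backslash-n pair or on a real line break ("\n" or "\r\n")
parts = df["DESCRIPTION"].str.split(r"\\n|\r?\n", regex=True)

# One row per piece, repeating TYPE for each piece
out = df.assign(DESCRIPTION=parts).explode("DESCRIPTION")

# Trim stray whitespace (including '\r') and drop pieces that end up empty
out["DESCRIPTION"] = out["DESCRIPTION"].str.strip()
out = out[out["DESCRIPTION"].ne("")].reset_index(drop=True)

print(out)
```

Because empty pieces are removed before the frame is written with to_csv, no empty fields remain to be read back as NaN.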

Regarding "python - How to get rid of NaN values in a csv file? Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60419744/
