
python - OneHotEncoder on only one feature, which is a string

Reposted. Author: 行者123. Updated: 2023-11-28 22:20:24

I want to convert my only feature into separate binary features:

df["pattern_id"]
Out[202]:
0 3
1 3
...
7440 2
7441 2
7442 3
Name: pattern_id, Length: 7443, dtype: int64

What I want to get:

df["pattern_id"]
Out[202]:
0 0 0 1
1 0 0 1
...
7440 0 1 0
7441 0 1 0
7442 0 0 1
Name: pattern_id, Length: 7443, dtype: int64
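The target layout above (one 0/1 column per distinct pattern_id value) can be sketched with pandas.get_dummies. This is a swapped-in alternative to OneHotEncoder, not what the question uses. The small frame and its example.com URLs are hypothetical stand-ins for the real scraped data.

```python
import pandas as pd

# Hypothetical stand-in for the scraped data: one int feature plus a string column
df = pd.DataFrame({
    "pattern_id": [3, 3, 2, 2, 3],
    "link_id": ["https://a.example/", "https://a.example/",
                "https://b.example/", "https://b.example/", "https://a.example/"],
})

# One 0/1 column per distinct pattern_id value; the string column is left untouched
dummies = pd.get_dummies(df["pattern_id"], prefix="pattern_id", dtype=int)
df = pd.concat([df.drop(columns="pattern_id"), dummies], axis=1)
print(df)
```

Because get_dummies only sees the single Series, the URL column never has to be converted to a number, and the result stays a DataFrame.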

I want to use OneHotEncoder. The data is already int, so no label encoding should be needed:

onehotencoder = OneHotEncoder(categorical_features=["pattern_id"])
df = onehotencoder.fit_transform(df).toarray()

ValueError: could not convert string to float: 'http://www.zaragoza.es/sedeelectronica/'

Interestingly, the error shows that sklearn is trying to encode a different column, not the one I asked for.
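One reading of the error (an interpretation, not confirmed by the thread): the whole DataFrame is passed to fit_transform, so the encoder tries to turn every column, including the URL strings, into numbers before categorical_features is applied. A minimal sketch for scikit-learn 0.20 or later, assuming a hypothetical two-column frame, is to hand the encoder only the column to encode:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical frame mixing the int feature with a string column
df = pd.DataFrame({
    "pattern_id": [3, 3, 2, 2, 3],
    "link_id": ["https://a.example/"] * 2 + ["https://b.example/"] * 2 + ["https://a.example/"],
})

# Select only pattern_id (double brackets keep it 2-D), so the string column is never touched
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df[["pattern_id"]]).toarray()  # default output is sparse

print(encoder.categories_)
print(encoded)
```

Each row of encoded has a single 1.0 in the column for its pattern_id, in the order given by encoder.categories_.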

So we have to encode pattern_id as an integer value.

I used this link: Issue with OneHotEncoder for categorical features

#transform the pattern_id feature to int
encoding_feature = ["pattern_id"]
enc = LabelEncoder()
enc.fit(encoding_feature)
working_feature = enc.transform(encoding_feature)
working_feature = working_feature.reshape(-1, 1)
ohe = OneHotEncoder(sparse=False)


#convert the pattern_id feature to separate binary features
onehotencoder = OneHotEncoder(categorical_features=working_feature, sparse=False)
df = onehotencoder.fit_transform(df).toarray()

I get the same error. What am I doing wrong?
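A likely cause, offered as a reading of the snippet rather than a confirmed diagnosis: enc.fit(encoding_feature) fits the LabelEncoder on the list ["pattern_id"], that is, on the column name rather than the column values, so working_feature ends up as [[0]]. Meanwhile fit_transform(df) still receives the whole frame with its string columns. Since pattern_id is already an int, LabelEncoder is not needed here at all; the sketch below only contrasts the two fits on a hypothetical sample.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"pattern_id": [3, 3, 2, 2, 3]})  # hypothetical sample

# What the attempt does: fit on the list of column *names*
enc = LabelEncoder()
enc.fit(["pattern_id"])
print(enc.transform(["pattern_id"]))  # a single label for the name, not the column's values

# Fitting on the column *values* instead maps each distinct id to 0..k-1
enc_values = LabelEncoder()
codes = enc_values.fit_transform(df["pattern_id"])
print(codes)
```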

Edit

Source: https://github.com/martin-varbanov96/scraper/blob/master/logo_scrape/logo_scrape/analysis.py

df
Out[259]:
found_img is_http link_img \
0 True 0 img/aahoteles.svg
//www.zaragoza.es/cont/paginas/img/sede/logo_e...

pattern_id current_link site_id \
0 3 https://www.aa-hoteles.com/es/reservas 3
6 3 https://www.aa-hoteles.com/es/ofertas-hoteles 3
7 2 http://about.pressreader.com/contact-us/ 4
8 3 http://about.pressreader.com/contact-us/ 4

status link_id
0 200 https://www.aa-hoteles.com/
1 200 https://www.365travel.asia/
2 200 https://www.365travel.asia/
3 200 https://www.365travel.asia/
4 200 https://www.aa-hoteles.com/
5 200 https://www.aa-hoteles.com/
6 200 https://www.aa-hoteles.com/
7 200 http://about.pressreader.com
8 200 http://about.pressreader.com
9 200 https://www.365travel.asia/
10 200 https://www.365travel.asia/
11 200 https://www.365travel.asia/
12 200 https://www.365travel.asia/
13 200 https://www.365travel.asia/
14 200 https://www.365travel.asia/
15 200 https://www.365travel.asia/
16 200 https://www.365travel.asia/
17 200 https://www.365travel.asia/
18 200 http://about.pressreade

[7443 rows x 8 columns]

Best answer

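If you look at the documentation for OneHotEncoder, you can see that the categorical_features parameter expects "all", an array of indices, or a mask, not a string. You can make your code work by changing it to the following lines: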

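import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Create a dataframe of random ints
df = pd.DataFrame(np.random.randint(0, 4, size=(100, 4)),
                  columns=['pattern_id', 'B', 'C', 'D'])
# categorical_features takes column indices (or a boolean mask), not column names.
# Note: this parameter only exists in scikit-learn < 0.22 (deprecated in 0.20).
onehotencoder = OneHotEncoder(categorical_features=[df.columns.tolist().index('pattern_id')])
df = onehotencoder.fit_transform(df)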

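However, df will no longer be a DataFrame (fit_transform returns a SciPy sparse matrix by default), so I would suggest working with numpy arrays directly.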
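The categorical_features parameter was deprecated in scikit-learn 0.20 and removed in 0.22, and ColumnTransformer is the documented replacement. A minimal sketch of the same idea on a current version, reusing the answer's random frame (seeded here, which is an addition for reproducibility): one-hot encode only pattern_id and pass B, C and D through unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Random ints as in the answer; seeded so the result is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 4, size=(100, 4)),
                  columns=["pattern_id", "B", "C", "D"])

# One-hot encode pattern_id by name; the remaining columns are appended unchanged
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["pattern_id"])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense numpy array
)
result = ct.fit_transform(df)
print(result.shape)  # (100, number of distinct pattern_id values + 3)
```

The one-hot columns come first, followed by the passthrough columns. If a DataFrame is needed afterwards, ct.get_feature_names_out() (scikit-learn 1.0 and later) can supply the column names.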

For the question "python - OneHotEncoder on only one feature, which is a string", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48993412/
