
python - OneHotEncoding error when applied to an empty field

Reposted. Author: 行者123. Updated: 2023-12-01 08:14:28

The code applies one-hot encoding to two fields of a binetflow file: Proto and State. I have to do this for 5 files. The code below worked perfectly on the first two files. On the third, it throws this error:

TypeError: '<' not supported between instances of 'str' and 'float'.

I am fairly sure the error comes from the row 0.000000,icmp,,60,60.0,0 of the file, where the State field is empty.
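One way to check this suspicion is to list the rows whose State is missing. The following is a minimal sketch, assuming the file is read with pandas; only Dur, Proto and State are column names from the question, while c4 to c6 and the first data row are hypothetical placeholders.

```python
import io

import pandas as pd

# Hypothetical sample: only Dur, Proto and State are named in the question;
# c4..c6 are placeholder column names, and the first data row is made up.
csv_text = (
    "Dur,Proto,State,c4,c5,c6\n"
    "1.500000,tcp,CON,10,600.0,0\n"
    "0.000000,icmp,,60,60.0,0\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# Rows whose State field was empty in the file show up as missing values.
missing_state = df[df["State"].isna()]
print(missing_state)
```

If this selection is non-empty for the third file, the empty State fields are very likely what the encoder trips over.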

I would like to simply skip one-hot encoding for such rows, copy the State field as is (empty), and move on to the next row.

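import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# opendataset() is the asker's own loader; it returns the file as a DataFrame
df = opendataset()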

df['State2'] = df['State']
df['Proto2'] = df['Proto']
df['Dur'] = df.Dur.apply(lambda n: '%.6f' % n)

le = LabelEncoder()
dfle = df
dfle.State = le.fit_transform(dfle.State)
X = dfle[['State']].values
Y = dfle[['Proto']].values
ohe = OneHotEncoder()
OnehotX = ohe.fit_transform(X).toarray()
OnehotY = ohe.fit_transform(Y).toarray()

dx = pd.DataFrame(data=OnehotX)
dy = pd.DataFrame(data=OnehotY)

dfle['State'] = (dx[dx.columns[0:]].apply(lambda x:''.join(x.dropna().astype(int).astype(str)), axis=1))
dfle['Proto'] = (dy[dy.columns[0:]].apply(lambda y:''.join(y.dropna().astype(int).astype(str)), axis=1))


Edit 08/03

Below is the traceback I get when I run the code above. As you can see, the error is raised at dfle.State = le.fit_transform(dfle.State), and consequently OnehotX = ohe.fit_transform(X).toarray() is affected as well.

Traceback (most recent call last):
  File "C:/Users/V/PycharmProjects/PreProcess/testfile.py", line 39, in <module>
    dfle.State = le.fit_transform(dfle.State)
  File "C:\Users\V\PycharmProjects\PreProcess\venv\lib\site-packages\sklearn\preprocessing\label.py", line 236, in fit_transform
    self.classes_, y = _encode(y, encode=True)
  File "C:\Users\V\PycharmProjects\PreProcess\venv\lib\site-packages\sklearn\preprocessing\label.py", line 108, in _encode
    return _encode_python(values, uniques, encode)
  File "C:\Users\V\PycharmProjects\PreProcess\venv\lib\site-packages\sklearn\preprocessing\label.py", line 63, in _encode_python
    uniques = sorted(set(values))
TypeError: '<' not supported between instances of 'str' and 'float'
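The last frame of the traceback, sorted(set(values)), hints at the cause: a missing value is usually held in memory as NaN, which is a float, and Python cannot order a float against a string. The sketch below reproduces that in plain Python; the State values and the helper try_sort_unique are hypothetical and only mimic the call shown in the traceback.

```python
# Hypothetical State values: an empty field is typically loaded as NaN (a float).
states = ["CON", "URP", float("nan"), "CON"]


def try_sort_unique(values):
    """Mimic the sorted(set(values)) call from the traceback and report the outcome."""
    try:
        return sorted(set(values))
    except TypeError as exc:
        return f"TypeError: {exc}"


result = try_sort_unique(states)
print(result)

# With strings only, the same call succeeds.
print(try_sort_unique(["URP", "CON", "CON"]))
```

So the row with the empty State does not need to be special in any other way; a single NaN among string labels is enough to trigger the error.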

New code: following Hemerson Tacon's suggestion, I wrapped the parts where the traceback raised the error in try/except, but it warns me about a bug and raises another error.

le = LabelEncoder()
dfle = df

try:
    dfle.State = le.fit_transform(dfle.State)
except TypeError:
    pass
X = dfle[['State']].values
Y = dfle[['Proto']].values
ohe = OneHotEncoder()
try:
    OnehotX = ohe.fit_transform(X).toarray()
except ValueError:
    pass

OnehotY = ohe.fit_transform(Y).toarray()

dx = pd.DataFrame(data=OnehotX)
dy = pd.DataFrame(data=OnehotY)

dfle['State'] = (dx[dx.columns[0:]].apply(lambda x:''.join(x.dropna().astype(int).astype(str)), axis=1))
dfle['Proto'] = (dy[dy.columns[0:]].apply(lambda y:''.join(y.dropna().astype(int).astype(str)), axis=1))

New error:

Traceback (most recent call last):
  File "C:/Users/V/PycharmProjects/PreProcess/testfile.py", line 53, in <module>
    dx = pd.DataFrame(data=OnehotX)
NameError: name 'OnehotX' is not defined

Final edit 09/03

The solution was simply to add a replace() call to the code. NaN values are replaced with the word "empty" before encoding, which solves the problem.

dfle['State'].replace(np.nan,"empty", inplace=True)

df = opendataset()

df['State2'] = df['State']
df['Proto2'] = df['Proto']
df['Dur'] = df.Dur.apply(lambda n: '%.6f' % n)

le = LabelEncoder()
dfle = df

dfle['State'].replace(np.nan,"empty", inplace=True)

dfle.State = le.fit_transform(dfle.State)

X = dfle[['State']].values
Y = dfle[['Proto']].values
ohe = OneHotEncoder()

OnehotX = ohe.fit_transform(X).toarray()
OnehotY = ohe.fit_transform(Y).toarray()

dx = pd.DataFrame(data=OnehotX)
dy = pd.DataFrame(data=OnehotY)
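For comparison, here is a minimal sketch of the same idea (fill the missing State with a placeholder label, then one-hot encode) using only pandas. It swaps in pd.get_dummies for the LabelEncoder/OneHotEncoder pair, so it is not the asker's exact method. The sample data, the helper one_hot_codes and the StateCode/ProtoCode column names are hypothetical. The fill is assigned back to the column because, in recent pandas versions, inplace=True on a column selection may not update the original DataFrame.

```python
import pandas as pd

# Hypothetical data standing in for the binetflow columns.
df = pd.DataFrame({
    "Proto": ["tcp", "icmp", "udp", "tcp"],
    "State": ["CON", float("nan"), "URP", "CON"],
})

# Assign the result back instead of relying on inplace=True on a column selection.
df["State"] = df["State"].fillna("empty")


def one_hot_codes(column):
    """Return one string of 0/1 digits per row, built from the one-hot columns."""
    dummies = pd.get_dummies(column).astype(int)
    return dummies.astype(str).apply(lambda row: "".join(row), axis=1)


df["StateCode"] = one_hot_codes(df["State"])
df["ProtoCode"] = one_hot_codes(df["Proto"])
print(df)
```

Rows with the same State get the same code, and the formerly empty rows get their own "empty" category instead of raising an error.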

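Best answer

You can put the problematic code inside a try block and catch the TypeError exception, then check whether the State field is empty. If it is, ignore the error as you described; if it is not, re-raise the error.

If you post the actual code that applies one-hot encoding to the data, it will be easier to answer and to include some code in the answer.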

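Edit

The OnehotX variable is only defined inside the try block. You need to define it outside and before that block to fix the error; something like OnehotX = None will do. Also, to stress what I said earlier: it is good practice to test, in the except block, whether the exception was caused by the problem you have identified, that is, to test whether the State field is empty.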
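The pattern described in this answer could look roughly like the sketch below. It is written in plain Python so it runs without scikit-learn: encode_labels is a hypothetical stand-in for LabelEncoder.fit_transform, and is_missing and safe_encode are illustrative helper names, not part of any library.

```python
import math


def is_missing(value):
    """True for None or NaN, the usual in-memory forms of an empty field."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def encode_labels(values):
    """Hypothetical stand-in for LabelEncoder.fit_transform."""
    classes = sorted(set(values))  # raises TypeError on mixed str/float
    index = {label: i for i, label in enumerate(classes)}
    return [index[v] for v in values]


def safe_encode(values):
    encoded = None  # defined before the try block, so the name always exists
    try:
        encoded = encode_labels(values)
    except TypeError:
        # Swallow the error only when it is explained by missing values.
        if not any(is_missing(v) for v in values):
            raise
    return encoded


print(safe_encode(["CON", "URP", "CON"]))
print(safe_encode(["CON", float("nan"), "URP"]))
```

Code that consumes the result still has to handle the None case explicitly; otherwise the failure only moves to a later line.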

Regarding "python - OneHotEncoding error when applied to an empty field", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55054977/
