
python - IndexError: string index out of range when using apply

Reposted. Author: 太空宇宙. Updated: 2023-11-03 12:56:51

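I want to pick out the most frequently used nouns in a data frame by:

  1. Separating out the nouns in each row of my data.
  2. Storing them in a new column called train['token'].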

To do this, I pass my function to apply, but I get this error:

IndexError: string index out of range

Here is my code:

import pandas as pd
import numpy as np
import nltk

train= pd.read_csv(r'C:\Users\JKC\Downloads\classification_train.csv',names=['product_title','brand_id','category_id'])

train['product_title'] = train['product_title'].apply(lambda x: x.lower())

def preprocessing(x):
    tokens = nltk.pos_tag(x.split(" "))
    list=[]
    for y,x in tokens:
        if(x=="NN" or x=="NNS" or x=="NNP" or x=="NNPS"):
            list.append(y)
    return(' '.join(list))
# My function works fine if I use preprocessing(train['product_title'][1])



train['token'] = train['product_title'].apply(preprocessing,1)

Traceback:

IndexError                                Traceback (most recent call last)
<ipython-input-53-f9f247eec617> in <module>()
10
11
---> 12 train['token'] = train['product_title'].apply(preprocessing,1)
13

C:\Users\JKC\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
2235 values = lib.map_infer(values, boxer)
2236
-> 2237 mapped = lib.map_infer(values, f, convert=convert_dtype)
2238 if len(mapped) and isinstance(mapped[0], Series):
2239 from pandas.core.frame import DataFrame

pandas\src\inference.pyx in pandas.lib.map_infer (pandas\lib.c:63043)()

<ipython-input-53-f9f247eec617> in preprocessing(x)
1 def preprocessing(x):
----> 2 tokens = nltk.pos_tag(x.split(" "))
3 list=[]
4 for y,x in tokens:
5 if(x=="NN" or x=="NNS" or x=="NNP" or x=="NNPS"):

C:\Users\JKC\Anaconda3\lib\site-packages\nltk\tag\__init__.py in pos_tag(tokens, tagset)
109 """
110 tagger = PerceptronTagger()
--> 111 return _pos_tag(tokens, tagset, tagger)
112
113

C:\Users\JKC\Anaconda3\lib\site-packages\nltk\tag\__init__.py in _pos_tag(tokens, tagset, tagger)
80
81 def _pos_tag(tokens, tagset, tagger):
---> 82 tagged_tokens = tagger.tag(tokens)
83 if tagset:
84 tagged_tokens = [(token, map_tag('en-ptb', tagset, tag)) for (token, tag) in tagged_tokens]

C:\Users\JKC\Anaconda3\lib\site-packages\nltk\tag\perceptron.py in tag(self, tokens)
150 output = []
151
--> 152 context = self.START + [self.normalize(w) for w in tokens] + self.END
153 for i, word in enumerate(tokens):
154 tag = self.tagdict.get(word)

C:\Users\JKC\Anaconda3\lib\site-packages\nltk\tag\perceptron.py in <listcomp>(.0)
150 output = []
151
--> 152 context = self.START + [self.normalize(w) for w in tokens] + self.END
153 for i, word in enumerate(tokens):
154 tag = self.tagdict.get(word)

C:\Users\JKC\Anaconda3\lib\site-packages\nltk\tag\perceptron.py in normalize(self, word)
224 elif word.isdigit() and len(word) == 4:
225 return '!YEAR'
--> 226 elif word[0].isdigit():
227 return '!DIGITS'
228 else:

IndexError: string index out of range

Data:
                                        product_title  brand_id  category_id
0   120gb hard disk drive with 3 years warranty fo...      3950            8
1  toshiba satellite l305-s5919 laptop lcd screen...     35099          324
2   hobby-ace pixhawk px4 rgb external led indicat...     21822          510
3                                   pelicans mousepad     44629          260
4     p4648-60029 hewlett-packard tc2100 system board     42835           68

There are no null rows in my data:

train.isnull().sum()
Out[12]:
product_title 0
brand_id 0
category_id 0
dtype: int64

Best answer

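Your input contains two or more consecutive spaces in some places. When you split it with x.split(" "), you get zero-length "words" between the adjacent spaces, and the tagger's normalize step then fails on word[0] for each of them.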
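
A minimal sketch of the failure, using only the standard library. The title string is made up for illustration, and first_char_is_digit is a hypothetical stand-in for the word[0].isdigit() check shown in the traceback, not NLTK's own code.

```python
# Made-up title with two spaces between the words.
title = "pelicans  mousepad"

# Splitting on a single space keeps an empty string between adjacent spaces.
tokens = title.split(" ")
print(tokens)  # ['pelicans', '', 'mousepad']


def first_char_is_digit(word):
    # Hypothetical stand-in for the check at the bottom of the traceback.
    return word[0].isdigit()


error = None
try:
    for word in tokens:
        first_char_is_digit(word)
except IndexError as exc:
    error = str(exc)

print("IndexError:", error)  # IndexError: string index out of range
```

This also explains why train.isnull().sum() reports zeros: extra spaces inside a title are not null values, so a null check does not reveal them.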

Fix it by splitting with x.split() instead, which treats any run of consecutive whitespace characters as a single token separator.
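
One possible way to write the corrected function is sketched below. The names extract_nouns, NOUN_TAGS and toy_tagger are assumptions introduced here for illustration. The tagger is passed in as a parameter so that the sketch runs without NLTK installed; toy_tagger is a hypothetical stand-in, not a real part-of-speech tagger.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_nouns(text, pos_tag):
    """Return the nouns in `text`, joined by single spaces.

    `pos_tag` is any callable that maps a list of tokens to a list of
    (token, tag) pairs, for example nltk.pos_tag.
    """
    # split() with no argument collapses runs of whitespace,
    # so no empty tokens reach the tagger.
    tokens = text.split()
    return " ".join(word for word, tag in pos_tag(tokens) if tag in NOUN_TAGS)


def toy_tagger(tokens):
    # Hypothetical stand-in for nltk.pos_tag: it indexes word[0] the way the
    # traceback does, tags tokens starting with a digit as "CD" and
    # everything else as "NN".
    return [(t, "CD" if t[0].isdigit() else "NN") for t in tokens]


result = extract_nouns("pelicans  mousepad 44629", toy_tagger)
print(result)  # pelicans mousepad
```

With NLTK and its tagger data available, the question's column could then be built along the lines of train['product_title'].apply(lambda x: extract_nouns(x, nltk.pos_tag)).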

Regarding "python - IndexError: string index out of range when using apply", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38420922/
