
python - Extracting a new feature from a sentence column - Python

Reposted. Author: 行者123. Last updated: 2023-11-30 09:40:15

I have two dataframes.

The city_state dataframe:

    city        state
0   huntsville  alabama
1   montgomery  alabama
2   birmingham  alabama
3   mobile      alabama
4   dothan      alabama
5   chicago     illinois
6   boise       idaho
7   des moines  iowa

and the sentence dataframe:

    sentence
0 marthy was born in dothan
1 michelle reads some books at her home
2 hasan is highschool student in chicago
3 hartford of the west is the nickname of des moines

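I want to add a new feature column named city to the sentence dataframe. The city column is derived from sentence: if a sentence contains one of the names in city_state['city'], city holds that name; if it contains none of them, the value is Null.

The expected new dataframe looks like this: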

    sentence                                            city
0   marthy was born in dothan                           dothan
1   michelle reads some books at her home               Null
2   hasan is highschool student in chicago              chicago
3   hartford of the west is the nickname of des moines  des moines

I ran this code:

sentence['city'] ={}

for city in city_state.city:
    for text in sentence.sentence:
        words = text.split()
        for word in words:
            if word == city:
                sentence['city'].append(city)
                break
        else:
            sentence['city'].append(None)

but it fails with:

ValueError: Length of values does not match length of index

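If you have feature-engineering experience with a similar case, could you advise me on how to write correct code that produces the expected result?

Thanks

Note: this is the full error log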

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-205-8a9038a015ee> in <module>
----> 1 sentence['city'] ={}
2
3 for city in city_state.city:
4 for text in sentence.sentence:
5 words = text.split()

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
3117 else:
3118 # set column
-> 3119 self._set_item(key, value)
3120
3121 def _setitem_slice(self, key, value):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
3192
3193 self._ensure_valid_index(value)
-> 3194 value = self._sanitize_column(key, value)
3195 NDFrame._set_item(self, key, value)
3196

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
3389
3390 # turn me into an ndarray
-> 3391 value = _sanitize_index(value, self.index, copy=False)
3392 if not isinstance(value, (np.ndarray, Index)):
3393 if isinstance(value, list) and len(value) > 0:

~\Anaconda3\lib\site-packages\pandas\core\series.py in _sanitize_index(data, index, copy)
3999
4000 if len(data) != len(index):
-> 4001 raise ValueError('Length of values does not match length of ' 'index')
4002
4003 if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index
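The traceback points at line 1, `sentence['city'] ={}`, so the error is raised before the loop runs: in the pandas version shown, the empty dict has length 0 while the frame has 4 rows. A minimal sketch of a loop-based alternative follows, assuming the values are collected in a plain Python list (the name `found` is hypothetical) and assigned to the column once at the end. It also assumes a substring test instead of the word-by-word comparison, because `text.split()` would break "des moines" into two words that can never equal the full city name. The sample frames are trimmed copies of the ones in the question.

```python
import pandas as pd

# Trimmed sample data from the question (only the cities that matter here).
city_state = pd.DataFrame({"city": ["dothan", "chicago", "des moines"]})
sentence = pd.DataFrame({
    "sentence": [
        "marthy was born in dothan",
        "michelle reads some books at her home",
        "hasan is highschool student in chicago",
        "hartford of the west is the nickname of des moines",
    ]
})

found = []  # hypothetical name: one entry per sentence, in row order
for text in sentence["sentence"]:
    for city in city_state["city"]:
        if city in text:       # substring test, so "des moines" can match
            found.append(city)
            break
    else:
        # The inner loop ended without a break: no city matched this sentence.
        found.append(None)

# The list has exactly one value per row, so the lengths match.
sentence["city"] = found
print(sentence)
```

Iterating over sentences in the outer loop keeps exactly one result per row, which is what the column assignment needs.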

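Best answer

A quick-and-dirty approach using apply. It has not been tested on large dataframes, so use it with caution. First define a function that extracts city names: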

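def ex_city(col, cities):
    # col: one sentence (str); cities: list of city names to look for
    output = []
    for w in cities:
        # substring test, so multi-word names such as "des moines" match
        if w in col:
            output.append(w)
    # join all matches with commas; None when no city is found
    return ','.join(output) if output else None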

Then apply it to your sentence dataframe:

city_list = city_state.city.unique().tolist()
sentence['city'] = sentence['sentence'].apply(lambda x: ex_city(x, city_list))
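One caveat with the substring test `w in col` is that a city name can match inside a longer word, for example "mobile" inside "automobile". The sketch below is one possible variant, not part of the original answer: it assumes a regular expression with `\b` word boundaries passed to `Series.str.extract`, and it keeps only the first match per sentence rather than joining all matches. The sample frames are rebuilt from the question so the block runs on its own; the name `pattern` is hypothetical.

```python
import re
import pandas as pd

# Sample data rebuilt from the question.
city_state = pd.DataFrame({
    "city": ["huntsville", "montgomery", "birmingham", "mobile",
             "dothan", "chicago", "boise", "des moines"],
    "state": ["alabama"] * 5 + ["illinois", "idaho", "iowa"],
})
sentence = pd.DataFrame({
    "sentence": [
        "marthy was born in dothan",
        "michelle reads some books at her home",
        "hasan is highschool student in chicago",
        "hartford of the west is the nickname of des moines",
    ]
})

# Longest names first, so a longer name is tried before a shorter one it contains.
cities = sorted(city_state["city"].unique(), key=len, reverse=True)

# \b keeps "mobile" from matching inside "automobile";
# re.escape guards against special characters in city names.
pattern = r"\b(" + "|".join(re.escape(c) for c in cities) + r")\b"

# With one capture group and expand=False the result is a Series;
# rows without a match become NaN.
sentence["city"] = sentence["sentence"].str.extract(pattern, expand=False)
print(sentence)
```

Unlike `ex_city`, this variant returns NaN rather than None for sentences without a city, and it reports a single city per sentence.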

Regarding python - Extracting a new feature from a sentence column - Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59259919/
