gpt4 book ai didi

python - 使用字符串搜索 Pandas 系列会产生 KeyError

转载 作者:太空狗 更新时间:2023-10-30 02:03:58 25 4
gpt4 key购买 nike

我正在尝试使用 df[df['col'].str.contains("string")](在这两个 SO 问题中进行了描述:12)根据部分字符串匹配选择行。这是我的代码:

import requests
import json
import pandas as pd
import datetime

url = "http://api.turfgame.com/v4/zones/all" # get request returns .json
r = requests.get(url)
df = pd.read_json(r.content) # create a df containing all zone info

print df[df['region'].str.contains("Uppsala")].head()

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-23-55bbf5679808> in <module>()
----> 1 print df[df['region'].str.contains("Uppsala")].head()

C:\Users\User\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1670 if isinstance(key, (Series, np.ndarray, list)):
1671 # either boolean or fancy integer index
-> 1672 return self._getitem_array(key)
1673 elif isinstance(key, DataFrame):
1674 return self._getitem_frame(key)

C:\Users\User\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
1714 return self.take(indexer, axis=0, convert=False)
1715 else:
-> 1716 indexer = self.ix._convert_to_indexer(key, axis=1)
1717 return self.take(indexer, axis=1, convert=True)
1718

C:\Users\User\AppData\Local\Enthought\Canopy32\User\lib\site-packages\pandas\core\indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
1083 if isinstance(obj, tuple) and is_setter:
1084 return {'key': obj}
-> 1085 raise KeyError('%s not in index' % objarr[mask])
1086
1087 return indexer

KeyError: '[ nan nan nan ..., nan nan nan] not in index'

我不明白我得到一个 KeyError 因为 df.columns 返回:

Index([u'dateCreated', u'id', u'latitude', u'longitude', u'name', u'pointsPerHour', u'region', u'takeoverPoints', u'totalTakeovers'], dtype='object')

所以 Key 位于列列表中,在 Internet 浏览器中打开页面我可以找到 739 个“Uppsala”实例。

我搜索的列是一个嵌套的 .json 表,看起来像这样 {"id":200,"name":"Scotland","country": “国标”。我是否需要做一些特别的事情来在“{}”字符之间进行搜索?有人可以解释我在哪里犯了错误吗?

最佳答案

在我看来,您的 region 列包含字典,这些字典并不真正支持作为元素,因此 .str 无法正常工作。解决该问题的一种方法是将 region 字典提升到它们自己的列中,也许是这样的:

>>> region = pd.DataFrame(df.pop("region").tolist())
>>> df = df.join(region, rsuffix="_region")

之后你有

>>> df.head()
dateCreated id latitude longitude name pointsPerHour takeoverPoints totalTakeovers country id_region name_region
0 2013-06-15T08:00:00+0000 14639 55.947079 -3.206477 GrandSquare 1 185 32 gb 200 Scotland
1 2014-06-15T20:02:37+0000 31571 55.649181 12.609056 Stenringen 1 185 6 dk 172 Hovedstaden
2 2013-06-15T08:00:00+0000 18958 54.593570 -5.955772 Hospitality 0 250 1 gb 206 Northern Ireland
3 2013-06-15T08:00:00+0000 18661 53.754283 -1.526638 LanshawZone 0 250 0 gb 202 Yorkshire & The Humber
4 2013-06-15T08:00:00+0000 17424 55.949285 -3.144777 NoDogsZone 0 250 5 gb 200 Scotland

>>> df[df["name_region"].str.contains("Uppsala")].head()
dateCreated id latitude longitude name pointsPerHour takeoverPoints totalTakeovers country id_region name_region
28 2013-07-16T18:53:48+0000 20828 59.793476 17.775389 MoraStenRast 5 125 536 se 142 Uppsala
59 2013-02-08T21:42:53+0000 14797 59.570418 17.482116 BålWoods 3 155 555 se 142 Uppsala
102 2014-06-19T12:00:00+0000 31843 59.617637 17.077094 EnaAlle 5 125 168 se 142 Uppsala
328 2012-09-24T20:08:22+0000 11461 59.634438 17.066398 BluePark 6 110 1968 se 142 Uppsala
330 2014-08-28T20:00:00+0000 33695 59.867027 17.710792 EnbackensBro 4 140 59 se 142 Uppsala

(破解方法是 df["region"].apply(str).str.contains("Uppsala"),但我认为最好在一开始就清理数据.)

关于python - 使用字符串搜索 Pandas 系列会产生 KeyError,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/26005424/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com