python - Lengths must match to compare (Pandas selecting on two criteria)


Reposted by 行者123, updated 2023-12-01 08:03:18

I am generating values for each user like this:

loDf = locDfs[user] # locDfs is a copy of locationDf elsewhere in the code... sorry for all the variable names.
loDf.reset_index(inplace=True)
loDf = pd.crosstab([loDf.date, loDf.uid], loDf.location_id)
loDf.reset_index(inplace=True)

loDf.set_index('date', inplace=True)
loDf.drop('uid', axis=1, inplace=True)

# join the location crosstab columns with the app crosstab columns per user
userLocAppDfs[user] = userLocAppDfs[user].join(loDf, how='outer')
# convert from just "1" at each location change event followed by zeros, to "1" continuing until next location change
userLocAppDfs[user] = userLocAppDfs[user].resample('15T').agg('max')
userLocAppDfs[user]['uid'].fillna(user, inplace=True)

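This takes the location data, turns each location_id into its own column, and combines those columns with the rest of the data in the time series.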
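For readers unfamiliar with this reshape, here is a minimal sketch of the same idea on made-up data (the user `u1`, the location names and the timestamps are all hypothetical). It shows `pd.crosstab` turning each `location_id` into a column and `resample(...).max()` placing the result on a 15-minute grid. It uses the `"15min"` frequency alias, which newer pandas versions prefer over `"15T"`.

```python
import pandas as pd

# Hypothetical location-change events for one user
loc = pd.DataFrame({
    "date": pd.to_datetime(["2019-04-01 08:00", "2019-04-01 08:20", "2019-04-01 08:50"]),
    "uid": ["u1", "u1", "u1"],
    "location_id": ["home", "work", "home"],
})

# One column per location_id, holding the event count for each (date, uid) row
wide = pd.crosstab([loc["date"], loc["uid"]], loc["location_id"])
wide = wide.reset_index().set_index("date").drop(columns="uid")
wide.columns.name = None

# Bucket into 15-minute intervals; buckets without any event become NaN
wide_15 = wide.resample("15min").max()
print(wide_15)
```

The empty 08:30 bucket comes out as NaN rather than carrying the previous location forward, which is why the original code needs a separate fill step afterwards.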

That covers the reshaping. Next I need to normalize the data, and for that I need to go through the values of each location column:

for loc in locationDf[(locationDf['location_id'] in loDf.columns.values) & (locationDf['uid'] == user)].location_id.unique():

The full function:

import numpy as np
import pandas as pd
from scipy.stats import normaltest

def normalize(inputMetricDf, inputLocationDf):
    '''
    normalize, resample, and combine data into a single data source
    '''
    metricDf = inputMetricDf.copy()
    locationDf = inputLocationDf.copy()

    appDf = metricDf[['date', 'uid', 'app_id', 'metric']].copy()

    locDf = locationDf[['date', 'uid', 'location_id']]
    locDf.set_index('date', inplace=True)

    # convert location data to "15 minute interval" rows
    locDfs = {}
    for user, user_loc_dc in locDf.groupby('uid'):
        locDfs[user] = user_loc_dc.resample('15T').agg('max').bfill()

    aDf = appDf.copy()
    aDf.set_index('date', inplace=True)

    userLocAppDfs = {}
    user = ''
    for uid, a2_df in aDf.groupby('uid'):
        user = uid
        # per user, convert app data to 15m interval
        userDf = a2_df.resample('15T').agg('max')

        # assign metric for each app to an app column for each app, per user
        userDf.reset_index(inplace=True)
        userDf = pd.crosstab(index=userDf['date'], columns=userDf['app_id'],
                             values=userDf['metric'], aggfunc=np.mean).fillna(np.nan, downcast='infer')

        userDf['uid'] = user

        userDf.reset_index(inplace=True)
        userDf.set_index('date', inplace=True)

        # reapply 15m intervals now that we have new data per app
        userLocAppDfs[user] = userDf.resample('15T').agg('max')

        # assign location data to location columns per location, creates a "1" at the 15m interval of the location change event in the location column created
        loDf = locDfs[user]
        loDf.reset_index(inplace=True)
        loDf = pd.crosstab([loDf.date, loDf.uid], loDf.location_id)
        loDf.reset_index(inplace=True)

        loDf.set_index('date', inplace=True)
        loDf.drop('uid', axis=1, inplace=True)

        # join the location crosstab columns with the app crosstab columns per user
        userLocAppDfs[user] = userLocAppDfs[user].join(loDf, how='outer')
        # convert from just "1" at each location change event followed by zeros, to "1" continuing until next location change
        userLocAppDfs[user] = userLocAppDfs[user].resample('15T').agg('max')
        userLocAppDfs[user]['uid'].fillna(user, inplace=True)

        for loc in locationDf[(locationDf['location_id'] in loDf.columns.values) & (locationDf['uid'] == user)].location_id.unique():
            # fill location NaNs
            userLocAppDfs[user][loc] = userLocAppDfs[user][loc].replace(
                np.nan, 0)

        # fill app NaNs
        for app in a2_df['app_id'].unique():
            userLocAppDfs[user][app].interpolate(
                method='linear', limit_area='inside', inplace=True)
            userLocAppDfs[user][app].fillna(value=0, inplace=True)

    df = userLocAppDfs[user].copy()

    # ensure actual normality
    alpha = 0.05
    for app in aDf['app_id'].unique():
        _, p = normaltest(userLocAppDfs[user][app])
        if p < alpha:  # small p-value: reject the null hypothesis of normality
            raise DataNotNormal(args=(user, app))

    # for loc in userLocAppDfs[user]:
        # could also test location data

    return df
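A side note on the normality check in `normalize`: `scipy.stats.normaltest` tests the null hypothesis that the sample comes from a normal distribution, so a p-value below `alpha` is the signal that the data are not normal. Here is a sketch on a deliberately skewed, randomly generated sample (seeded for repeatability; the data are not from the question):

```python
import numpy as np
from scipy.stats import normaltest

# Exponential data is strongly right-skewed, so it should fail a normality test
rng = np.random.default_rng(0)
skewed = rng.exponential(scale=1.0, size=1000)

# Null hypothesis: the sample is drawn from a normal distribution.
# A small p-value is evidence *against* normality.
_, p = normaltest(skewed)
alpha = 0.05
looks_non_normal = p < alpha
print(p, looks_non_normal)
```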

But this raises an error:

  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 346, in run_http_function
    result = _function_handler.invoke_user_function(flask.request)
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 223, in invoke_user_function
    loop.run_until_complete(future)
  File "/opt/python3.7/lib/python3.7/asyncio/base_events.py", line 573, in run_until_complete
    return future.result()
  File "/user_code/main.py", line 31, in default_model
    train, endog, exog, _, _, rawDf = preprocess(ledger, apps)
  File "/user_code/Wrangling.py", line 67, in preprocess
    rawDf = normalize(appDf, locDf)
  File "/user_code/Wrangling.py", line 185, in normalize
    for loc in locationDf[(locationDf['location_id'] in loDf.columns.values) & (locationDf['uid'] == user)].location_id.unique():
  File "/env/local/lib/python3.7/site-packages/pandas/core/ops.py", line 1745, in wrapper
    raise ValueError('Lengths must match to compare')
ValueError: Lengths must match to compare

Before I noticed that the reshape might drop some of the locations that are in locationDf, I was simply doing:

    for loc in locationDf[locationDf['uid'] == user].location_id.unique():

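and that worked in every other case. However, if two locations fall into the same 15T time bucket and one of them appears only in that bucket, it gets dropped by the 15T window, and then I get an error. So I need a second condition.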
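The loss described above can be reproduced on its own. In this sketch (hypothetical data), two locations fall into the same 15-minute bucket. Taking the `max` of a string column per bucket, as `resample('15T').agg('max')` does when `normalize` builds `locDfs`, keeps only the lexicographically largest value, so the other location never becomes a crosstab column.

```python
import pandas as pd

# Hypothetical events: "cafe" (08:16) and "work" (08:20) share the 08:15 bucket
loc = pd.DataFrame({
    "date": pd.to_datetime(["2019-04-01 08:00", "2019-04-01 08:16", "2019-04-01 08:20"]),
    "uid": ["u1", "u1", "u1"],
    "location_id": ["home", "cafe", "work"],
}).set_index("date")

# Max of strings per 15-minute bucket: "work" > "cafe", so "cafe" disappears
per_bucket = loc.resample("15min").agg("max")

kept = set(per_bucket["location_id"].dropna())
lost = set(loc["location_id"]) - kept
print(lost)
```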

The values in locationDf['location_id'] are just strings, the same as the crosstab column names.

Why does this raise an error?
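A small reproduction of the failure, using made-up values in place of `locationDf` and `loDf`. `x in arr` on a NumPy array is evaluated roughly as `(arr == x).any()`, so `Series in ndarray` asks for an element-wise comparison between a Series and an array of a different length. pandas refuses, which produces the `Lengths must match to compare` error. `Series.isin` gives the intended per-row test.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: a 3-row location column and 2 crosstab column names
location_ids = pd.Series(["home", "cafe", "work"])
crosstab_cols = np.array(["home", "work"], dtype=object)

# `in` compares the whole Series against the array element-wise, and the
# lengths (3 vs 2) differ, so pandas raises ValueError
try:
    location_ids in crosstab_cols
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# The intended test: one boolean per row, "is this row's id a crosstab column?"
mask = location_ids.isin(crosstab_cols)
print(mask.tolist())  # [True, False, True]
```

If the two lengths happened to match, `in` would not raise. Instead, it would return a single True/False for the whole column rather than a per-row mask, which is arguably worse because it fails silently.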

Error when trying the answer:

    for loc in locationDf[(locationDf['location_id'].isin(loDf.columns.values)) & (locationDf['uid'].isin([user])), 'location_id'].unique():
  File "/env/local/lib/python3.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/env/local/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 110, in pandas._libs.index.IndexEngine.get_loc
TypeError: '(0       True
1       True
2       True
3      False
4       True
5       True
6      False
7       True
8       True
9      False
10     False
11     False
12     False
13     False
14     False
15     False
16     False
17     False
18     False
19      True
20      True
21      True
22      True
23      True
24      True
25      True
26     False
27      True
28      True
29     False
       ...  
210    False
211    False
212    False
213    False
214    False
215    False
216    False
217    False
218     True
219     True
220    False
221    False
222    False
223    False
224    False
225    False
226     True
227    False
228     True
229    False
230    False
231     True
232    False
233     True
234    False
235    False
236    False
237     True
238    False
239    False
Length: 240, dtype: bool, 'location_id')' is an invalid key
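This second error is a different problem. Without `.loc`, `locationDf[mask, 'location_id']` passes the whole tuple `(mask, 'location_id')` to `DataFrame.__getitem__`, which tries to look it up as a single column label. Here is a sketch on hypothetical data. The exact exception type may differ between pandas versions (the traceback above shows a `TypeError`, and newer releases may raise `InvalidIndexError` instead), so the example catches both, along with `KeyError`.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"uid": ["u1", "u2", "u1"], "location_id": ["home", "work", "cafe"]})
mask = df["uid"] == "u1"

# df[mask, "location_id"] treats the tuple as one (unhashable) column key and fails
try:
    df[mask, "location_id"]
    failed = False
except (TypeError, KeyError, pd.errors.InvalidIndexError):
    failed = True

# .loc interprets the pair as (row selector, column selector)
ids = df.loc[mask, "location_id"].unique()
print(ids)
```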

Best answer

Change the condition to use isin:

locationDf.loc[(locationDf['location_id'].isin(loDf.columns.values)) 
           & (locationDf['uid'].isin(user)),'location_id'].unique()

Update

con1 = locationDf['location_id'].isin(loDf.columns.values)
con2 = locationDf['uid'].isin(pd.Series(user))

locationDf.loc[con1&con2,'location_id'].unique()
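Putting the update together, here is a self-contained sketch with made-up `locationDf` and `loDf` stand-ins. It also suggests why the first version of the answer needed updating: `Series.isin` only accepts list-likes, so `isin(user)` with a plain string raises `TypeError`. By contrast, `isin(pd.Series(user))`, `isin([user])` or simply `== user` all work.

```python
import pandas as pd

# Made-up stand-ins for the question's locationDf and per-user crosstab loDf
locationDf = pd.DataFrame({
    "uid": ["u1", "u1", "u1", "u2"],
    "location_id": ["home", "cafe", "work", "gym"],
})
loDf = pd.DataFrame(columns=["home", "work"])  # "cafe" was lost to the 15-minute bucketing
user = "u1"

# isin() requires a list-like; a bare string is rejected
try:
    locationDf["uid"].isin(user)
    scalar_ok = True
except TypeError:
    scalar_ok = False

con1 = locationDf["location_id"].isin(loDf.columns)
con2 = locationDf["uid"] == user  # same result as .isin([user]) for a single value

locs = locationDf.loc[con1 & con2, "location_id"].unique()
print(locs)
```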

Regarding python - Lengths must match to compare (Pandas selecting on two criteria), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55643042/
