
python Pandas : A value is trying to be set on a copy of a slice from a DataFrame

Reposted. Author: 太空宇宙. Last updated: 2023-11-03 14:32:07

Could you please advise how the following lines should be rewritten to follow http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

  1. df.drop('PACKETS', axis=1, inplace=True)

produces

    /home/app/ip-spotlight/code/app/ipacc/plugin/ix.py:74: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame

    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
      df.drop('PACKETS', axis=1, inplace=True)
  2. df.replace(numpy.nan, "", inplace=True)

produces

    /home/app/ip-spotlight/code/app/ipacc/plugin/ix.py:68: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame

    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
      df.replace(numpy.nan, "", inplace=True)

On the other hand, here is an example of a line that I did manage to rewrite following that guidance:

    df.loc[:, ('SRC_PREFIX')]   = df[ ['SRC_NET', 'SRC_MASK'] ].apply(lambda x: "/".join(x), axis=1)

But I don't know how to rewrite cases 1 and 2.

Edit: the code so far looks like this (df is the DataFrame of interest). It starts with a type conversion:

    df = pandas.DataFrame(data['payload'], columns=sorted(data['header'], key=data['header'].get))
    df = df.astype({
        'SRC_AS' : "object",
        'DST_AS' : "object",
        'COMMS' : "object",
        'SRC_COMMS' : "object",
        'AS_PATH' : "object",
        'SRC_AS_PATH' : "object",
        'PREF' : "object",
        'SRC_PREF' : "object",
        'MED' : "object",
        'SRC_MED' : "object",
        'PEER_SRC_AS' : "object",
        'PEER_DST_AS' : "object",
        'PEER_SRC_IP' : "object",
        'PEER_DST_IP' : "object",
        'IN_IFACE' : "object",
        'OUT_IFACE' : "object",
        'SRC_NET' : "object",
        'DST_NET' : "object",
        'SRC_MASK' : "object",
        'DST_MASK' : "object",
        'PROTOCOL' : "object",
        'TOS' : "object",
        'SAMPLING_RATE' : "uint64",
        'EXPORT_PROTO_VERSION' : "object",
        'PACKETS' : "object",
        'BYTES' : "uint64",
    })

The module's calculate function is then called:

    mod.calculate(data['identifier'], data['timestamp'], df)

The calculate function is defined as follows:

    def calculate(identifier, timestamp, df):
        try:
            # Filter based on AORTA IX.
            lut_ipaddr = lookup_ipaddr()
            df = df[ (df.PEER_SRC_IP.isin( lut_ipaddr )) ]
            if df.shape[0] > 0:
                logger.info('analyzing message `{}`'.format(identifier))
                # Preparing for input.
                df.replace("", numpy.nan, inplace=True)
                # Data wrangling. Calculate traffic rate. Reduce.
                df.loc[:, ('BPS')] = 8*df['BYTES']*df['SAMPLING_RATE']/300
                df.drop(columns=['SAMPLING_RATE', 'EXPORT_PROTO_VERSION', 'PACKETS', 'BYTES'], inplace=True)
                # Data wrangling. Formulate prefixes using CIDR notation. Reduce.
                df.loc[:, ('SRC_PREFIX')] = df[ ['SRC_NET', 'SRC_MASK'] ].apply(lambda x: "/".join(x), axis=1)
                df.loc[:, ('DST_PREFIX')] = df[ ['DST_NET', 'DST_MASK'] ].apply(lambda x: "/".join(x), axis=1)
                df.drop(columns=['SRC_NET', 'SRC_MASK', 'DST_NET' ,'DST_MASK'], inplace=True)
                # Populate using lookup tables.
                df.loc[:, ('NETELEMENT')] = df['PEER_SRC_IP'].apply(lookup_netelement)
                df.loc[:, ('IN_IFNAME')] = df.apply(lambda x: lookup_iface(x['NETELEMENT'], x['IN_IFACE']), axis=1)
                df.loc[:, ('OUT_IFNAME')] = df.apply(lambda x: lookup_iface(x['NETELEMENT'], x['OUT_IFACE']), axis=1)
                # df.loc[:, ('SRC_ASNAME')] = df.apply(lambda x: lookup_asn(x['SRC_AS']), axis=1)
                # Add a timestamp.
                df.loc[:, ('METERED_ON')] = arrow.get(timestamp, "YYYYMMDDHHmm").format("YYYY-MM-DD HH:mm:ss")
                # Preparing for input.
                df.replace(numpy.nan, "", inplace=True)
                # Finalize !
                return identifier, timestamp, df.to_dict(orient="records")
            else:
                logger.info('going through message `{}` no IX bgp/netflow data were found'.format(identifier))
        except Exception as e:
            logger.error('processing message `{}` at `{}` caused `{}`'.format(identifier,timestamp,repr(e)), exc_info=True)
        return identifier, timestamp, None

Best answer

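OK. I don't really know what exactly happens behind the scenes in pandas, but I will try to give some minimal examples that show where the problem may lie and what you can do about it. First, create a DataFrame: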

    import numpy as np
    import pandas as pd
    df = pd.DataFrame(dict(x=[0, 1, 2],
    y=[0, 0, 5]))

Then, since you pass the DataFrame into a function, I will do the same, but with two nearly identical functions:

    def func(dfx):
        # Analog of your df = df[df.PEER_SRC_IP.isin(lut_ipaddr)]
        dfx = dfx[dfx['x'] > 1.5]
        # Analog of your df.replace("", numpy.nan, inplace=True)
        dfx.replace(5, np.nan, inplace=True)

    def func_with_copy(dfx):
        dfx = dfx[dfx['x'] > 1.5].copy() # explicitly making a copy
        dfx.replace(5, np.nan, inplace=True)

Now let's call them on the initial df:

    func_with_copy(df)
    print(df)

gives

       x  y
    0  0  0
    1  1  0
    2  2  5

with no warning. And calling:

    func(df)
    print(df)

gives the same output:

       x  y
    0  0  0
    1  1  0
    2  2  5

but with a warning:

    /usr/local/lib/python3.6/site-packages/ipykernel_launcher.py:6: SettingWithCopyWarning: 
    A value is trying to be set on a copy of a slice from a DataFrame

    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

So this looks like a "false positive". There is a good comment about false positives here: link

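The strange thing here is that if you perform exactly the same operations on the DataFrame without passing it into a function, you will not see this warning. ¯\_(ツ)_/¯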
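To see which of the two functions triggers the warning in a given environment, one option is to record warnings while calling each of them. The sketch below is not part of the original answer: `count_copy_warnings` is a hypothetical helper, and it matches the warning by class name because pandas releases that use Copy-on-Write are not expected to emit it at all, so the count for `func` may well be 0 there.

```python
import warnings

import numpy as np
import pandas as pd


def count_copy_warnings(fn, frame):
    """Run fn(frame) and count recorded SettingWithCopyWarning entries."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown earlier in the session.
        warnings.simplefilter("always")
        fn(frame)
    # Match by class name so the helper does not depend on where
    # (or whether) the installed pandas defines the warning class.
    return sum(w.category.__name__ == "SettingWithCopyWarning" for w in caught)


def func(dfx):
    dfx = dfx[dfx["x"] > 1.5]
    dfx.replace(5, np.nan, inplace=True)


def func_with_copy(dfx):
    dfx = dfx[dfx["x"] > 1.5].copy()  # explicit, independent copy
    dfx.replace(5, np.nan, inplace=True)


df = pd.DataFrame({"x": [0, 1, 2], "y": [0, 0, 5]})
n_plain = count_copy_warnings(func, df)
n_copy = count_copy_warnings(func_with_copy, df)
print("func:", n_plain, "func_with_copy:", n_copy)
```

Whatever `func` reports on a particular pandas version, the variant with `.copy()` should report no such warning, and the original `df` stays unchanged in both cases.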

My suggestion is to use .copy().
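Applied to the two cases from the question, that suggestion could look like the sketch below. The function names, the `allowed_ips` parameter, and the sample rows are illustrative assumptions, not the asker's real code or data. The second function shows an alternative that avoids `inplace=True` altogether and relies on `drop` and `replace` returning new DataFrames.

```python
import numpy as np
import pandas as pd


def filter_and_clean(df, allowed_ips):
    # Boolean indexing followed by .copy() yields an independent DataFrame,
    # so the in-place edits below are unambiguous.
    out = df[df["PEER_SRC_IP"].isin(allowed_ips)].copy()
    out.drop("PACKETS", axis=1, inplace=True)  # case 1 from the question
    out.replace(np.nan, "", inplace=True)      # case 2 from the question
    return out


def filter_and_clean_chained(df, allowed_ips):
    # Alternative without inplace=True: each call returns a new DataFrame,
    # so nothing is written into the filtered selection.
    return (
        df[df["PEER_SRC_IP"].isin(allowed_ips)]
        .drop(columns=["PACKETS"])
        .replace(np.nan, "")
    )


# Made-up sample data; dtype=object mirrors the astype(...) call in the question.
raw = pd.DataFrame(
    {
        "PEER_SRC_IP": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
        "PACKETS": ["1", "2", "3"],
        "AS_PATH": ["65000 65001", np.nan, np.nan],
    },
    dtype=object,
)

cleaned = filter_and_clean(raw, ["10.0.0.1", "10.0.0.2"])
chained = filter_and_clean_chained(raw, ["10.0.0.1", "10.0.0.2"])
print(cleaned)
```

Both variants leave `raw` untouched. Which one reads better is largely a matter of taste; the chained form makes it harder to modify a filtered selection by accident.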

Regarding "python Pandas : A value is trying to be set on a copy of a slice from a DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47207713/
