python - Survey score differential between the first call in a call category and later calls in the same category


Reposted by 太空宇宙, updated 2023-11-03 20:46:01

I have a pandas DataFrame containing call center data. It looks like this:

    member_id  survey_score  call_reason  call_direction      time_stamp
0     bob13         0          returns       inbound      2019-03-18 10:12:00
1     ub40          5         complaint      inbound      2019-03-19 11:12:00
2     bob13         7          returns       outbound     2019-03-19 09:15:00
3     todd100       3         order_error    inbound      2019-03-20 10:15:00
4     ub40          2         complaint      inbound      2019-03-21 12:11:00
5     todd100       7         order_error    outbound     2019-03-22 08:10:00
6     ub40          1         complaint      outbound     2019-03-22 11:09:00
7     ron34         6         exchange       inbound      2019-03-22 13:09:00
8     ron34         7          returns       inbound      2019-03-24 15:03:00

The output I'm looking for is:

    member_id    call_reason     score_differential          
0     bob13       returns               7
1     ub40       complaint             -1
2     todd100    order_error            4

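Basically, I want the difference between a member's first inbound call survey score and that same member's next outbound call survey score, but only when the call reason is also the same.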

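As a small business owner, I'm trying to do my company's data-science work myself to save some money. Unfortunately, I'm a complete beginner and am really struggling with this. Any help would be greatly appreciated!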

Note: I'm using Jupyter Notebook and pandas on my local machine, installed through Anaconda.

Please help me do this in a faster, simpler, more logical way.

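I've tried many approaches to get the right output, but I'm still struggling. I also feel like I'm overcomplicating things. First I compute the call order. Then I create columns for the first inbound call score and for the score differential. Next I get a list of all unique member IDs to iterate over, and finally I wrote a huge loop full of logic, but I got lost in it.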

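Also, the first version of this code didn't take call direction into account. It also averaged all of the member's later calls with the same call reason and then took the difference between that average and the first call. I no longer want to do that.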

import numpy as np
import pandas as pd

# Rank each member's calls chronologically (1 = earliest call)
df['time_stamp'] = pd.to_datetime(df['time_stamp'])
df['call_order'] = df.groupby('member_id')['time_stamp'].rank(ascending=True, method='dense')

# Placeholder columns for the first call score and the differential
df["first_call_survey_score"] = np.nan
df["score_differential"] = np.nan

member_list = df['member_id'].unique()

unscorable = 0
for member in member_list:
    try:
        count = 2
        # All calls for this member, one row per call_order position
        temp = df.loc[df['member_id'] == member]
        temp = temp.drop_duplicates(subset='call_order', keep="first")
        num_calls = temp['member_id'].count()
        first_call = temp.query("call_order == 1")
        first_survey_score = first_call['survey_score'].values[0]
        reason = first_call['call_reason'].values[0]
        sumscore = 0
        legit_call_count = 0
        # Walk through the later calls, summing scores of calls with the same reason
        while count <= num_calls:
            next_call = temp.query("call_order == @count")
            if reason == next_call['call_reason'].values[0]:
                sumscore = sumscore + next_call['survey_score'].values[0]
                count = count + 1
                legit_call_count = legit_call_count + 1
            elif reason != next_call['call_reason'].values[0] and count == num_calls:
                count = 20  # sentinel: the last call has a different reason
            elif reason != next_call['call_reason'].values[0]:
                # Reason changed: restart from the following call
                count = count + 1
                next_call = temp.query("call_order == @count")
                reason = next_call['call_reason'].values[0]
                first_survey_score = next_call['survey_score'].values[0]
            else:
                count = count + 1

        if legit_call_count == 1:
            df.loc[df['member_id'] == member, 'score_differential'] = sumscore / legit_call_count - first_survey_score
        elif count == 20:
            unscorable = unscorable + 1
        else:
            df.loc[df['member_id'] == member, 'score_differential'] = sumscore / legit_call_count - first_survey_score
    except Exception as exception:
        unscorable = unscorable + 1

print(unscorable, "Callers could not be scored")
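The first step in the code above, numbering each member's calls chronologically, can also be done by sorting and counting instead of ranking. A minimal sketch, assuming `time_stamp` is already a datetime column; the five rows are a subset of the question's table:

```python
import pandas as pd

# Subset of the question's table (bob13 and ub40 only)
df = pd.DataFrame({
    'member_id': ['bob13', 'ub40', 'bob13', 'ub40', 'ub40'],
    'survey_score': [0, 5, 7, 2, 1],
    'time_stamp': pd.to_datetime([
        '2019-03-18 10:12', '2019-03-19 11:12', '2019-03-19 09:15',
        '2019-03-21 12:11', '2019-03-22 11:09',
    ]),
})

# Sort chronologically, then number each member's calls 1, 2, 3, ...
df = df.sort_values('time_stamp')
df['call_order'] = df.groupby('member_id').cumcount() + 1
print(df.sort_index())
```

`cumcount` starts at 0, hence the `+ 1`. Unlike `rank(method='dense')`, it gives two calls with identical timestamps different numbers.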

Best answer

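Here is one approach. Each outbound call is given a unique Id per member/reason, and that Id is back-filled onto the inbound calls. The last inbound call for a given (member, reason, Id) is then paired with the outbound call for the same (member, reason, Id), and the difference is computed. Note: I added a second call sequence for user bob13 to show that this handles multiple call sequences for the same user.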
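The back-fill idea is easiest to see on a single (member, reason) group. This sketch assumes the group's calls are already in time order and numbers the outbound calls with `cumsum`, as a stand-in for the `rank` call used in the answer's code:

```python
import pandas as pd

# One (member_id, call_reason) group in time order: in, out, in, out
direction = pd.Series(['inbound', 'outbound', 'inbound', 'outbound'])
is_out = direction == 'outbound'

# Outbound calls are numbered 1, 2, ...; inbound calls start as NaN
out_id = is_out.cumsum().where(is_out)

# Back-filling copies each outbound number onto the inbound calls before it
call_id = out_id.bfill()
print(call_id.tolist())  # [1.0, 1.0, 2.0, 2.0]
```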

import io
import pandas as pd

txt = """\
   member_id  survey_score  call_reason  call_direction      time_stamp
     bob13         0          returns       inbound      2019-03-18T10:12:00
     ub40          5         complaint      inbound      2019-03-19T11:12:00
     bob13         7          returns       outbound     2019-03-19T09:15:00
     todd100       3         order_error    inbound      2019-03-20T10:15:00
     ub40          2         complaint      inbound      2019-03-21T12:11:00
     todd100       7         order_error    outbound     2019-03-22T08:10:00
     ub40          1         complaint      outbound     2019-03-22T11:09:00
     ron34         6         exchange       inbound      2019-03-22T13:09:00
     ron34         7          returns       inbound      2019-03-24T15:03:00
     bob13         2          returns       inbound      2019-03-25T10:12:00
     bob13         3          returns       outbound     2019-03-27T09:15:00
"""
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(io.StringIO(txt), sep=r'\s+', index_col=False)

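# Number the outbound calls 1, 2, ... in time order within each (member, reason);
# inbound rows get NaN when the result is assigned back to df
grp = (df.query('call_direction == "outbound"')
         .groupby(['member_id', 'call_reason']))
df['OutId'] = grp.time_stamp.transform(lambda x: x.rank())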
print()
print(df)

# Back-fill each outbound Id onto the inbound calls that precede it
grp = df.groupby(['member_id', 'call_reason'])
df['Id'] = grp.OutId.transform(lambda x: x.bfill())
print()
print(df)

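# Score of the last inbound call and of the outbound call in each (member, reason, Id)
# sequence; rows whose Id is NaN (no later outbound call) are dropped by groupby
inbnd_score = (df.query('call_direction == "inbound"')
                 .groupby(['member_id', 'call_reason', 'Id']).survey_score.last())
outbnd_score = (df.query('call_direction == "outbound"')
                  .groupby(['member_id', 'call_reason', 'Id']).survey_score.last())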

ddf = pd.concat([inbnd_score, outbnd_score], axis=1,
                keys=['inbnd', 'outbnd'])
ddf['score_differential'] = ddf.outbnd - ddf.inbnd
print()
print(ddf)

Output:

   member_id  survey_score  call_reason call_direction           time_stamp  OutId
0      bob13             0      returns        inbound  2019-03-18T10:12:00    NaN
1       ub40             5    complaint        inbound  2019-03-19T11:12:00    NaN
2      bob13             7      returns       outbound  2019-03-19T09:15:00    1.0
3    todd100             3  order_error        inbound  2019-03-20T10:15:00    NaN
4       ub40             2    complaint        inbound  2019-03-21T12:11:00    NaN
5    todd100             7  order_error       outbound  2019-03-22T08:10:00    1.0
6       ub40             1    complaint       outbound  2019-03-22T11:09:00    1.0
7      ron34             6     exchange        inbound  2019-03-22T13:09:00    NaN
8      ron34             7      returns        inbound  2019-03-24T15:03:00    NaN
9      bob13             2      returns        inbound  2019-03-25T10:12:00    NaN
10     bob13             3      returns       outbound  2019-03-27T09:15:00    2.0

   member_id  survey_score  call_reason call_direction           time_stamp  OutId   Id
0      bob13             0      returns        inbound  2019-03-18T10:12:00    NaN  1.0
1       ub40             5    complaint        inbound  2019-03-19T11:12:00    NaN  1.0
2      bob13             7      returns       outbound  2019-03-19T09:15:00    1.0  1.0
3    todd100             3  order_error        inbound  2019-03-20T10:15:00    NaN  1.0
4       ub40             2    complaint        inbound  2019-03-21T12:11:00    NaN  1.0
5    todd100             7  order_error       outbound  2019-03-22T08:10:00    1.0  1.0
6       ub40             1    complaint       outbound  2019-03-22T11:09:00    1.0  1.0
7      ron34             6     exchange        inbound  2019-03-22T13:09:00    NaN  NaN
8      ron34             7      returns        inbound  2019-03-24T15:03:00    NaN  NaN
9      bob13             2      returns        inbound  2019-03-25T10:12:00    NaN  2.0
10     bob13             3      returns       outbound  2019-03-27T09:15:00    2.0  2.0

                           inbnd  outbnd  score_differential
member_id call_reason Id
bob13     returns     1.0      0       7                   7
                      2.0      2       3                   1
todd100   order_error 1.0      3       7                   4
ub40      complaint   1.0      2       1                  -1
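A possible alternative, not part of the original answer, is `pd.merge_asof`, which pairs each outbound call with the most recent earlier inbound call for the same member and reason. A sketch on the same data, assuming `time_stamp` is parsed as a datetime:

```python
import io
import pandas as pd

txt = """\
member_id survey_score call_reason call_direction time_stamp
bob13 0 returns inbound 2019-03-18T10:12:00
ub40 5 complaint inbound 2019-03-19T11:12:00
bob13 7 returns outbound 2019-03-19T09:15:00
todd100 3 order_error inbound 2019-03-20T10:15:00
ub40 2 complaint inbound 2019-03-21T12:11:00
todd100 7 order_error outbound 2019-03-22T08:10:00
ub40 1 complaint outbound 2019-03-22T11:09:00
ron34 6 exchange inbound 2019-03-22T13:09:00
ron34 7 returns inbound 2019-03-24T15:03:00
bob13 2 returns inbound 2019-03-25T10:12:00
bob13 3 returns outbound 2019-03-27T09:15:00
"""
df = pd.read_csv(io.StringIO(txt), sep=r'\s+', parse_dates=['time_stamp'])

# merge_asof requires both sides to be sorted on the "on" key
inbound = df[df.call_direction == 'inbound'].sort_values('time_stamp')
outbound = df[df.call_direction == 'outbound'].sort_values('time_stamp')

# For each outbound call, take the latest inbound call at or before it
# with the same member_id and call_reason
pairs = pd.merge_asof(
    outbound, inbound,
    on='time_stamp',
    by=['member_id', 'call_reason'],
    direction='backward',
    suffixes=('_out', '_in'),
)
pairs['score_differential'] = pairs.survey_score_out - pairs.survey_score_in
result = pairs[['member_id', 'call_reason', 'survey_score_in',
                'survey_score_out', 'score_differential']]
print(result)
```

Unlike the Id-based approach, if two outbound calls follow a single inbound call, `merge_asof` pairs both with that same inbound call; an outbound call with no earlier inbound call gets NaN.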

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/56605081/
