python - Finding subsets of keywords in a Pandas Series (Python) - 6ren

Reposted by: 太空宇宙 · Updated: 2023-11-03 14:05:14

I am working with a Series that looks very much like this:

import pandas as pd

l0 = ['smartphone', 'battery', 'case', 'grey', '10071852']
l1 = ['phone', 'new', 'charging', 'case', 'white']
l2 = ['tablet', 'phone', 'pin', 'adapter', 'ex766']
l3 = ['phone', 'silicon', 'case', 'brown']

mySeries = pd.Series([l0,l1,l2,l3])

print(mySeries)

0    [smartphone, battery, case, grey, 10071852]
1            [phone, new, charging, case, white]
2           [tablet, phone, pin, adapter, ex766]
3                  [phone, silicon, case, brown]

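I am trying to search each row (each list) of this Series for keywords and sets of keywords it may contain. More specifically, suppose I want to find whether a row in the Series contains any of the following keywords: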

simple_keywords = {'case', 'adapter'}

and also whether it contains any of the following keyword pairs:

double_keywords = {'battery case', 'charging case'}

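Looking for simple_keywords seems easy enough. However, I also want to look for the pairs, and if a row contains a pair such as 'battery case', I want the keyword pair returned rather than just 'case'.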

In addition, I have a data frame that looks like this:

d = {'Date': ['03/08/2014', '04/08/2014', '05/08/2014', '06/08/2014'], 'Product': ['none', 'none','none','none'],'Frequency': [5, 10, 1, 2]}
myDF = pd.DataFrame(data=d)

print(myDF)

         Date  Frequency Product
0  03/08/2014          5    none
1  04/08/2014         10    none
2  05/08/2014          1    none
3  06/08/2014          2    none

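My final goal is to write into this data frame (in the Product column) the corresponding keyword (or keyword pair) that I identify in the Series. Each row of the Series corresponds to exactly the same row of the data frame, so order matters. For example, I would like to see that on 03/08/2014 the product 'battery case' has a frequency of 5.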

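I tried to come up with a solution by splitting the keyword pairs apart, but it is very slow and inefficient, because the Series I am working with has more than 350,000 rows (I left it running overnight and it still had not finished):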

first_keywords = {'case', 'adapter'}
second_keywords = {'battery', 'charging'}

for i in range(len(mySeries)):
    for x in first_keywords:
        for y in second_keywords:
            if x in mySeries[i] and y in mySeries[i]:
                # .loc writes to myDF directly and avoids chained assignment
                myDF.loc[i, 'Product'] = y + ' ' + x
            elif x in mySeries[i] and y not in mySeries[i]:
                myDF.loc[i, 'Product'] = x

The final result I am hoping for is:

         Date  Frequency        Product
0  03/08/2014          5   battery case
1  04/08/2014         10  charging case
2  05/08/2014          1        adapter
3  06/08/2014          2           case

It would be great if anyone could help. Apologies if my code isn't very pretty... trying to get better!

Best answer

You can generate combinations of any number of words from the lists in mySeries like this:

import itertools
df_comb = pd.concat([mySeries.apply(lambda x: [" ".join(l) 
                     for l in list(itertools.combinations(x,max_len))
                     ]).rename(max_len) 
                     for max_len in [1,2]],axis=1).astype(str)

Here is the result:

>>> df_comb
                                             1  \
0  [smartphone, battery, case, grey, 10071852]   
1          [phone, new, charging, case, white]   
2         [tablet, phone, pin, adapter, ex766]   
3                [phone, silicon, case, brown]   

                                                   2  
0  [smartphone battery, smartphone case, smartpho...  
1  [phone new, phone charging, phone case, phone ...  
2  [tablet phone, tablet pin, tablet adapter, tab...  
3  [phone silicon, phone case, phone brown, silic...
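
To make the second column easier to read, here is a small standalone illustration (not part of the original answer) of what itertools.combinations produces for a single row. The variable names row and pairs are only for this sketch.

```python
import itertools

row = ['smartphone', 'battery', 'case', 'grey', '10071852']

# combinations keeps the original word order and never reuses a word,
# so 'battery case' is produced but 'case battery' is not.
pairs = [" ".join(c) for c in itertools.combinations(row, 2)]

print(len(pairs))   # 5 words give 10 two-word combinations
print(pairs[:4])
```

Because the words do not have to be adjacent in the list, a pair such as 'battery case' would also be produced if other words sat between 'battery' and 'case'.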

Now let's turn the sets of keywords into lists so that they are easier to iterate over:

simple_keywords = ['case', 'adapter']
double_keywords = ['battery case', 'charging case']

Then you can count the elements like this:

>>> pd.concat([df_comb.apply(lambda x: pd.Series(x).str.count(w),axis=0)[len(w.split(' '))].rename(w) 
for w in simple_keywords],axis=1)
   case  adapter
0     1        0
1     1        0
2     0        1
3     1        0

>>> pd.concat([df_comb.apply(lambda x: pd.Series(x).str.count(w),axis=0)[len(w.split(' '))].rename(w) for w in double_keywords],axis=1)

   battery case  charging case
0             1              0
1             0              1
2             0              0
3             0              0
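
One thing worth keeping in mind: since df_comb was cast to str, str.count counts substring occurrences in the printed list rather than whole list items. The sketch below shows the difference, assuming a hypothetical keyword 'phone' that is not in the keyword lists above.

```python
import pandas as pd

s = pd.Series([['smartphone', 'battery', 'case'], ['phone', 'case']])

# Substring counting on the stringified lists, as done with df_comb above:
# 'phone' is also found inside 'smartphone'.
substring_counts = s.astype(str).str.count('phone')

# Exact item counting on the original lists.
exact_counts = s.apply(lambda words: words.count('phone'))

print(substring_counts.tolist())  # [1, 1]
print(exact_counts.tolist())      # [0, 1]
```

For the keywords used in this question ('case', 'adapter', 'battery case', 'charging case') the two approaches give the same counts on the sample data.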

Or we can iterate like this:

df_count = pd.DataFrame()
for list_of_keywords in [simple_keywords, double_keywords]:
    df_count_temp = pd.concat([df_comb.apply(lambda x: pd.Series(x).str.count(w),
                               axis=0)[len(w.split(' '))].rename(w) 
                               for w in list_of_keywords],axis=1)
    df_count = pd.concat([df_count, df_count_temp],axis=1)

The counts will be:

>>> df_count

   case  adapter  battery case  charging case
0     1        0             1              0
1     1        0             0              1
2     0        1             0              0
3     1        0             0              0

You can get the final counts with:

>>> df_count.sum(axis=0).to_frame()

               0
case           3
adapter        1
battery case   1
charging case  1

You can create a function to apply this to each day's entries.

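import itertools

def my_func(mySeries, keywords=[['case', 'adapter'], ['battery case', 'charging case']]):
    # Each inner list is assumed to hold keywords with the same number of words;
    # the first keyword of each list determines the combination length.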
    keyword_lengths = [len(k[0].split(' ')) for k in keywords]
    df_comb = pd.concat([mySeries.apply(lambda x: [" ".join(l) 
                         for l in list(itertools.combinations(x,max_len))
                         ]).rename(max_len) 
                         for max_len in keyword_lengths],axis=1).astype(str)

    df_count = pd.DataFrame()
    for list_of_keywords in keywords:
        df_count_temp = pd.concat([df_comb.apply(lambda x:pd.Series(x).str.count(w),
                                   axis=0)[len(w.split(' '))].rename(w) 
                                   for w in list_of_keywords],axis=1)
        df_count = pd.concat([df_count, df_count_temp],axis=1)

    return df_count

Imagine this is your pd.Series:

>>> newSeries 
2014-03-08    [smartphone, battery, case, grey, 10071852]
2014-03-08            [phone, new, charging, case, white]
2014-03-08           [tablet, phone, pin, adapter, ex766]
2014-03-08                  [phone, silicon, case, brown]
2014-04-08            [phone, new, charging, case, white]
2014-04-08                           [tablet, phone, pin]
2014-04-08                               [phone, adapter]
dtype: object



>>> my_func(newSeries)

            case  adapter  battery case  charging case
2014-03-08     1        0             1              0
2014-03-08     1        0             0              1
2014-03-08     0        1             0              0
2014-03-08     1        0             0              0
2014-04-08     1        0             0              1
2014-04-08     0        0             0              0
2014-04-08     0        1             0              0

Then you can group the returned data frame by date and sum the counts. That way you get the number of appearances per date:

>>> df_appearances= my_func(newSeries).reset_index().groupby('index'
                     ).sum().T.unstack().reset_index()

>>> df_appearances.columns = ['Date', 'Product', 'Frequency']

>>> df_appearances

        Date        Product  Frequency
0 2014-03-08           case          3
1 2014-03-08        adapter          1
2 2014-03-08   battery case          1
3 2014-03-08  charging case          1
4 2014-04-08           case          1
5 2014-04-08        adapter          1
6 2014-04-08   battery case          0
7 2014-04-08  charging case          1
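
The counts above are aggregated per date, whereas the question asks for one Product value per row. As a minimal sketch (not part of the original answer), the row-aligned result could be produced with plain set lookups. The helper pick_product is hypothetical, and it assumes a pair matches whenever both of its words appear anywhere in the row, as in the question's own attempt.

```python
import pandas as pd

mySeries = pd.Series([
    ['smartphone', 'battery', 'case', 'grey', '10071852'],
    ['phone', 'new', 'charging', 'case', 'white'],
    ['tablet', 'phone', 'pin', 'adapter', 'ex766'],
    ['phone', 'silicon', 'case', 'brown'],
])
simple_keywords = ['case', 'adapter']
double_keywords = ['battery case', 'charging case']

def pick_product(words, default='none'):
    """Return the first matching keyword pair, else the first single keyword."""
    present = set(words)
    # Pairs are checked first so that 'battery case' wins over plain 'case'.
    for pair in double_keywords:
        if set(pair.split(' ')) <= present:
            return pair
    for word in simple_keywords:
        if word in present:
            return word
    return default

myDF = pd.DataFrame({
    'Date': ['03/08/2014', '04/08/2014', '05/08/2014', '06/08/2014'],
    'Frequency': [5, 10, 1, 2],
})
# mySeries and myDF share the same index, so the assignment is row-aligned.
myDF['Product'] = mySeries.apply(pick_product)
print(myDF)
```

Each row is handled with a few set membership tests and no combinations are generated, so this should scale more comfortably to a few hundred thousand rows than the nested loop in the question, although that has not been measured here.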

Regarding python - Finding subsets of keywords in a Pandas Series (Python), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48947557/
