python - Multiple patterns with regular expressions in Pandas

Reposted. Author: 行者123. Updated: 2023-12-02 20:36:24

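I'm a beginner in Python programming and I'm exploring regular expressions. I'm trying to extract one word (the database name) from the Description column, but I can't get more than one regex pattern to work.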

Please see the description and code below.

Description

Summary: AD1: Low free DATA space in database AD1ADS: 10.00% Date: 06/28/2017 Severity: Warning Res
Summary: Database SV1V1CH has used log space: 90.00% Date: 02/06/2017 Severity: Warning ResourceId: s
Summary: SAP SolMan Sys=SM1Tempdb,MO=AGEEPM49,Alert=Database Host Status,Desc=A database hos
*** Clearing Event Received *** SNG01AMMSOL04_age SAP SolMan Sys=SM1_SNG01AMMSOL04,MO=AGEEQM46,Alert

Expected output for the extracted database names

AD1ADS
SV1V1CH
SM1Tempdb
SNG01AMMSOL04

Code I tried

sentence = df['Description']
frame = pd.DataFrame({'logs': sentence})

import re
pattern = re.compile(r'[dD]atabase (\w+)|Sys=(\w+)')

for _, line in frame.iterrows():
    name = pattern.findall(line['logs'])
    if name:
        line['names'] = name[0]
    else:
        line['names'] = 'Miscellaneous'

Can anyone tell me what I'm doing wrong here?

The output I'm getting now

(u'AD1ADS', u'')
(u'SV1V1CH', u'')
(u'', u'CM1_CHE01AMMSOL04')
Miscellaneous

Best answer

You can use str.extract with fillna:

p = r'[dD]atabase (\w+)|Sys=(\w+)'
s = df['logs'].str.extract(p, expand=True)
print (s)
         0                  1
0   AD1ADS                NaN
1  SV1V1CH                NaN
2      NaN          SM1Tempdb
3      NaN  SM1_SNG01AMMSOL04

df['db'] = s[0].fillna(s[1]).fillna('Miscellaneous')
#alternatively
#df['db'] = s[0].combine_first(s[1]).fillna('Miscellaneous')
print (df)
                                                logs                 db
0  Summary: AD1: Low free DATA space in database ...             AD1ADS
1  Summary: Database SV1V1CH has used log space: ...            SV1V1CH
2  Summary: SAP SolMan Sys=SM1Tempdb,MO=AGEEPM49,...          SM1Tempdb
3  *** Clearing Event Received *** SNG01AMMSOL04_...  SM1_SNG01AMMSOL04
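A possible variation, not part of the original answer: if the two prefixes are wrapped in a non-capturing group, the pattern has only one capture group, so str.extract yields a single column and no merging of columns is needed. The sketch below assumes the sample rows from the question (truncated exactly as shown there) plus one hypothetical row with no match, added only to exercise the 'Miscellaneous' fallback.

```python
import pandas as pd

# Sample rows copied from the question (truncated as shown there),
# plus one hypothetical non-matching row to exercise the fallback.
df = pd.DataFrame({'logs': [
    'Summary: AD1: Low free DATA space in database AD1ADS: 10.00% Date: 06/28/2017 Severity: Warning Res',
    'Summary: Database SV1V1CH has used log space: 90.00% Date: 02/06/2017 Severity: Warning ResourceId: s',
    'Summary: SAP SolMan Sys=SM1Tempdb,MO=AGEEPM49,Alert=Database Host Status,Desc=A database hos',
    '*** Clearing Event Received *** SNG01AMMSOL04_age SAP SolMan Sys=SM1_SNG01AMMSOL04,MO=AGEEQM46,Alert',
    'Summary: disk usage high',  # hypothetical row, not from the question
]})

# (?:...) groups the two prefixes without capturing them,
# so (\w+) is the only capture group in the pattern.
p = r'(?:[dD]atabase |Sys=)(\w+)'

# With one capture group and expand=False, str.extract returns a Series
# holding the first match per row (NaN where nothing matched).
df['db'] = df['logs'].str.extract(p, expand=False).fillna('Miscellaneous')
print(df['db'].tolist())
```

Whether this is preferable depends on taste: the two-group pattern keeps track of which prefix matched, while the single-group form drops that information in exchange for a shorter expression.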

If you want to extract all possible values, use extractall and then join them if necessary:

p = r'[dD]atabase (\w+)|Sys=(\w+)'
s = df['logs'].str.extractall(p)
print (s)
               0                  1
  match
0 0       AD1ADS                NaN
1 0      SV1V1CH                NaN
2 0          NaN          SM1Tempdb
  1         Host                NaN
  2          hos                NaN
3 0          NaN  SM1_SNG01AMMSOL04

df['db'] = s[0].fillna(s[1]).groupby(level=0).apply(', '.join)
df['db'] = df['db'].fillna('Miscellaneous')
print (df)
                                                logs                    db
0  Summary: AD1: Low free DATA space in database ...                AD1ADS
1  Summary: Database SV1V1CH has used log space: ...               SV1V1CH
2  Summary: SAP SolMan Sys=SM1Tempdb,MO=AGEEPM49,...  SM1Tempdb, Host, hos
3  *** Clearing Event Received *** SNG01AMMSOL04_...     SM1_SNG01AMMSOL04
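As for why the loop in the question produced tuples such as (u'AD1ADS', u''): when a pattern has more than one capture group, re.findall returns one tuple per match, with an empty string for each group that did not take part. A minimal plain-re sketch of one way around this follows; first_db_name is a hypothetical helper introduced here for illustration, not something from the question or the answer.

```python
import re

PATTERN = re.compile(r'[dD]atabase (\w+)|Sys=(\w+)')

def first_db_name(text, default='Miscellaneous'):
    """Hypothetical helper: return the first captured name, or `default`."""
    match = PATTERN.search(text)
    if match is None:
        return default
    # Only one alternative matched, so the other group is None.
    return match.group(1) or match.group(2)

row = ('Summary: SAP SolMan Sys=SM1Tempdb,MO=AGEEPM49,'
       'Alert=Database Host Status,Desc=A database hos')

# findall yields one tuple per match, padded with '' for the unused group.
print(PATTERN.findall(row))   # [('', 'SM1Tempdb'), ('Host', ''), ('hos', '')]
print(first_db_name(row))     # SM1Tempdb
```

Such a helper could presumably be applied row by row with df['logs'].apply(first_db_name), though the vectorized str.extract approach above is usually the more idiomatic choice in pandas.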

Regarding python - multiple patterns with regular expressions in Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47011170/
