
python - How to find string matches in a DataFrame using Python

Reposted. Author: 太空宇宙. Updated: 2023-11-03 20:54:17

I am trying to find close matches between a string of text and two columns of my DataFrame ("tickers" and/or "company").

Here is a sample of the DataFrame:

cik     | tickers | company
--------|---------|----------------------------
1090872 | A       | Agilent Technologies Inc
4281    | AA      | Alcoa Inc
6201    | AAL     | American Airlines Group Inc
8177    | AAME    | Atlantic American Corp
706688  | AAN     | Aarons Inc
320193  | AAPL    | Apple Inc

This is what some of the text looks like:

text = 'consectetur elementum Apple Inc Agilent Inc. Aenean porttitor porta magna AA American Airlines AAMC Aarons Inc AAPL e plumbs ernum. AA'

I want to find all close matches in this text and produce output like:

The following companies were found in 'text':
- AAPL: Apple Inc
- A: Agilent Technologies Inc
- AA: American Airlines Group Inc
- AAN: Aarons Inc

Here is my code so far. It is incomplete, and I realize it needs a different approach:

import pandas as pd
import re

data = {'cik': ['1090872', '4281', '6201', '8177', '706688', '320193'], 'ticker': ['A', 'AA', 'AAL', 'AAME', 'AAN', 'AAPL'], 'company': ['Agilent Technologies Inc', 'Alcoa Inc', 'American Airlines Group Inc', 'Atlantic American Corp', 'Aarons Inc', 'Apple Inc']}
df = pd.DataFrame(data, columns=['cik', 'ticker', 'company'])

text = 'consectetur elementum Apple Inc Agilent Inc. Aenean porttitor porta magna AA American Airlines AAMC Aarons Inc AAPL e plumbs ernum. AA'

ticker = df['ticker']
regex = re.compile(r"\b(?:" + "|".join(map(re.escape, ticker)) + r")\b")

# Collect every whole-word ticker that appears in the text
matches = regex.findall(text)
for match in matches:
    print(match)
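
One way to take the ticker regex above a step further is to look each matched ticker up in the DataFrame and print it with its company name. The following is a minimal sketch, not part of the original question; the names `ticker_to_company` and `found_tickers` are illustrative assumptions, and the DataFrame is trimmed to the two columns the lookup needs.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'ticker': ['A', 'AA', 'AAL', 'AAME', 'AAN', 'AAPL'],
    'company': ['Agilent Technologies Inc', 'Alcoa Inc',
                'American Airlines Group Inc', 'Atlantic American Corp',
                'Aarons Inc', 'Apple Inc'],
})

text = ('consectetur elementum Apple Inc Agilent Inc. Aenean porttitor porta '
        'magna AA American Airlines AAMC Aarons Inc AAPL e plumbs ernum. AA')

# Assumed helper mapping: ticker -> company name
ticker_to_company = dict(zip(df['ticker'], df['company']))

# Whole-word, case-sensitive match on any known ticker
pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, ticker_to_company)) + r")\b"
)

# dict.fromkeys drops repeated hits while keeping first-seen order
found_tickers = list(dict.fromkeys(pattern.findall(text)))

print("The following tickers were found in 'text':")
for t in found_tickers:
    print(f"- {t}: {ticker_to_company[t]}")
```

On the sample text this reports only AA and AAPL: the `\b` word boundaries keep AA from matching inside AAMC, and no standalone "A" occurs. Company names such as "Apple Inc" still need a separate pass over the `company` column.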

Best answer

Here is how I would approach this problem. First, set things up based on your code:

import pandas as pd
data = [['1090872', 'A', 'Agilent Technologies Inc'], ['4281', 'AA', 'Alcoa Inc'],
['6201', 'AAL', 'American Airlines Group Inc'], ['8177', 'AAME', 'Atlantic American Corp'],
['706688', 'AAN', 'Aarons Inc'], ['320193', 'AAPL', 'Apple Inc']]
df = pd.DataFrame(data, columns=['cik', 'tickers', 'company'])
text = "consectetur elementum Apple Inc Agilent Inc. Aenean porttitor porta magna AA American \
Airlines AAMC Aarons Inc AAPL e plumbs ernum. AA"
df['text'] = text
df['found'] = None

company_values = df['company'].values
for val in company_values:
    row = df.loc[df['company'] == val]
    # regex=False treats the company name as a literal substring, not a pattern
    if row['text'].str.contains(val, regex=False).any():
        df.loc[df['company'] == val, 'found'] = 'Yes'
# filter the results
df.loc[df['found'] == 'Yes']

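I think it makes sense to store the text as part of the DataFrame, search it for the companies that actually appear, and record the result in the df['found'] column, which you can then filter to get the list of companies. Here I assume the DataFrame contains only unique company names and their tickers. Note that this finds exact occurrences of the full company name only.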
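
The same exact-name check can be written without copying the text into every row, and it can be relaxed to catch shortened names such as "Agilent Inc." or "American Airlines". The sketch below is an assumption-laden illustration rather than the answerer's method: `longest_prefix_in_text` and the `matched_on` column are hypothetical names, and the relaxation is a simple heuristic (accept the longest leading run of words from the company name that appears in the text as whole words), not true fuzzy matching.

```python
import re

import pandas as pd

data = [['1090872', 'A', 'Agilent Technologies Inc'], ['4281', 'AA', 'Alcoa Inc'],
        ['6201', 'AAL', 'American Airlines Group Inc'],
        ['8177', 'AAME', 'Atlantic American Corp'],
        ['706688', 'AAN', 'Aarons Inc'], ['320193', 'AAPL', 'Apple Inc']]
df = pd.DataFrame(data, columns=['cik', 'tickers', 'company'])

text = ("consectetur elementum Apple Inc Agilent Inc. Aenean porttitor porta "
        "magna AA American Airlines AAMC Aarons Inc AAPL e plumbs ernum. AA")

# Exact matching, equivalent to the loop above: True where the full
# company name occurs verbatim in the text.
exact = df['company'].apply(lambda name: name in text)


def longest_prefix_in_text(name, text):
    """Hypothetical helper: return the longest leading run of words from
    `name` that occurs in `text` as whole words, or None if none does."""
    words = name.split()
    for n in range(len(words), 0, -1):
        phrase = " ".join(words[:n])
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            return phrase
    return None


# Record which part of the name matched; None means no match at all.
df['matched_on'] = df['company'].apply(
    lambda name: longest_prefix_in_text(name, text)
)
close = df[df['matched_on'].notna()]

print("The following companies were found in 'text':")
for ticker, company in zip(close['tickers'], close['company']):
    print(f"- {ticker}: {company}")
```

On the sample data the exact check flags only Aarons Inc and Apple Inc, while the prefix heuristic also picks up Agilent Technologies Inc and American Airlines Group Inc. A single leading word is enough to match, so a text mentioning some other "American ..." company would be a false positive; requiring at least two words, or a dedicated fuzzy-matching library, would be stricter.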

Regarding "python - How to find string matches in a DataFrame using Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56115390/
