
python - Abbreviation replaced multiple times: iaw --> in accordance with --> input accordance with

Reposted. Author: 行者123. Updated: 2023-12-01 08:09:37

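I'm working with a DataFrame whose text column contains many abbreviations. Using a predefined dictionary, I replace the abbreviations with the full words, and that part works.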

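However, some abbreviations seem to be replaced more than once: if the full phrase that replaces an abbreviation contains another abbreviation, that abbreviation is replaced as well: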

import pandas as pd

d = {' h ' : ' height ', ' mm ' : ' milimeter ', ' w ' : 'width', ' iaw ' : ' in accordance with ', ' in ' : ' input '}

dt = {"Number":[1, 2], "text": ["measure depth 22 mm h 24 mm w 75 mm", "wheel 4 iaw amm"]}

dataframe = pd.DataFrame(dt)

def process_data(data):
    # Treat each dictionary key as a regex and replace it with its value;
    # assigning the result back avoids relying on inplace=True on a column
    data["text"] = data["text"].replace(d, regex=True)
    return data

df = process_data(dataframe)
print(df)

The result is:

   Number                                                    text
0       1  measure depth 22 milimeter height 24 milimeter w 75 mm
1       2                       wheel 4 input accordance with amm

whereas it should be:

   Number                                                    text
0       1  measure depth 22 milimeter height 24 milimeter w 75 mm
1       2                          wheel 4 in accordance with amm
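The symptom can be reproduced with plain re.sub if the dictionary entries are applied one after another, so that each later pattern also scans the text produced by the earlier ones. The sketch below assumes that sequential model (it is not a claim about how pandas implements Series.replace internally), uses only the iaw and in entries from the question's dictionary, and introduces a hypothetical helper named replace_sequentially:

```python
import re

# Two entries from the question's dictionary, in the same order
d = {' iaw ': ' in accordance with ', ' in ': ' input '}

def replace_sequentially(text, mapping):
    # Hypothetical helper: apply each pattern in turn, so every later
    # pattern also sees the replacements made by earlier ones
    for pattern, repl in mapping.items():
        text = re.sub(pattern, repl, text)
    return text

print(replace_sequentially("wheel 4 iaw amm", d))
# wheel 4 input accordance with amm
```

The first pass turns " iaw " into " in accordance with ", and that new text contains " in ", which the second pass then rewrites to " input ".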

Does anyone know how to fix this?

Best answer

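You can use Series.str.replace with a single regular expression that matches whole words, passing a function as the replacement so that each match is looked up in the dictionary: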

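import re

# Keys and values without the surrounding whitespace
d = {'h' : 'height',
     'mm' : 'milimeter',
     'w' : 'width',
     'iaw' : 'in accordance with',
     'in' : 'input'}

# One alternation of whole-word patterns: \bh\b|\bmm\b|\bw\b|...
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in d.keys())
# Single pass: each match is replaced once and the inserted text is not rescanned
dataframe['keyword'] = dataframe['text'].str.replace(pat, lambda x: d[x.group()], regex=True)
print (dataframe)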

   Number                                 text  \
0       1  measure depth 22 mm h 24 mm w 75 mm
1       2                      wheel 4 iaw amm

                                             keyword
0  measure depth 22 milimeter height 24 milimeter...
1                     wheel 4 in accordance with amm
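The same single-pass idea works on plain strings with the standard re module, which makes it easy to check without pandas. This is a minimal sketch, assuming the answer's space-free dictionary; expand is a hypothetical helper name:

```python
import re

# Same mapping as the answer: whole words, no surrounding spaces
d = {'h': 'height', 'mm': 'milimeter', 'w': 'width',
     'iaw': 'in accordance with', 'in': 'input'}

# Longest keys first; with \b on both sides this only matters for keys
# containing non-word characters, so it is a defensive choice here
pat = re.compile('|'.join(r'\b{}\b'.format(re.escape(k))
                          for k in sorted(d, key=len, reverse=True)))

def expand(text):
    # re.sub scans the input once; text returned by the callback is not rescanned
    return pat.sub(lambda m: d[m.group()], text)

print(expand("wheel 4 iaw amm"))
# wheel 4 in accordance with amm
print(expand("measure depth 22 mm h 24 mm w 75 mm"))
# measure depth 22 milimeter height 24 milimeter width 75 milimeter
```

Because the combined pattern is matched once from left to right, the "in" inside the expansion of "iaw" is never treated as a new match, and \bmm\b does not fire inside "amm" because there is no word boundary between "a" and "m".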

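Another solution is to split each value on whitespace, map every token through the dictionary with dict.get (keeping unknown tokens unchanged), and join the tokens back with spaces: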

# Look up each whitespace-separated token once; unknown tokens stay as they are
f = lambda x: ' '.join(d.get(y, y) for y in x.split())
dataframe['keyword'] = dataframe['text'].apply(f)
print (dataframe)

   Number                                 text  \
0       1  measure depth 22 mm h 24 mm w 75 mm
1       2                      wheel 4 iaw amm

                                             keyword
0  measure depth 22 milimeter height 24 milimeter...
1                     wheel 4 in accordance with amm
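The split-and-join approach has a side effect worth knowing: it rebuilds each string from str.split() tokens, so runs of whitespace collapse to single spaces, and a token with attached punctuation (for example "mm,") is not found in the dictionary. A small sketch illustrating this, assuming the answer's dictionary; expand_tokens is a hypothetical helper name:

```python
# Same mapping as the answer
d = {'h': 'height', 'mm': 'milimeter', 'w': 'width',
     'iaw': 'in accordance with', 'in': 'input'}

def expand_tokens(text):
    # Each original token is looked up exactly once; the expansion of a
    # token is never split and looked up again
    return ' '.join(d.get(tok, tok) for tok in text.split())

print(expand_tokens("wheel 4 iaw amm"))  # wheel 4 in accordance with amm
print(expand_tokens("22  mm, h"))        # 22 mm, height
```

If punctuation or original spacing matters, the regex version with word boundaries is the safer of the two.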

Regarding "python - Abbreviation replaced multiple times: iaw --> in accordance with --> input accordance with", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55339439/
