My data sample is:
comment sarc_majority
0 [?, ?] sarc
1 [0] non-sarc
2 [!, !, !] sarc
3 [0] non-sarc
4 [?] sarc
I want to replace punctuation marks with new names, for example ? = punct1, ! = punct2, ' = punct3. I tried reading the mapping from a CSV file:
import csv
import pandas as pd

replace_df = pd.read_csv('./final/eng-mly-punct.csv', sep=',', quoting=csv.QUOTE_NONE,
                         names=["punct", "replacer"])
replace_df.head()
punct replacer
0 ? punct1
1 ! punct2
2 ' punct3
Then I got stuck on the replacement step:
for punct, replacer in replace_df.itertuples(index=False, name=None):
    df.comment = df.comment.str.replace(r'\b{0}\b'.format(punct), replacer)
The error is: error: nothing to repeat
What went wrong? Or is there another way to do this? The desired output should look like:
comment sarc_majority
0 [punct1, punct1] sarc
1 [0] non-sarc
2 [punct2, punct2, punct2] sarc
3 [0] non-sarc
4 [punct1] sarc
Thanks in advance. Cheers.
Best answer
You can use replace with a dict d, but ? is a regex metacharacter, so it has to be escaped as \?:
d = {r'\?': 'punct1', '!': 'punct2', "'": 'punct3'}
df.comment = df.comment.replace(d, regex=True)
print (df)
comment sarc_majority
0 [punct1, punct1] sarc
1 [0] non-sarc
2 [punct2, punct2, punct2] sarc
3 [0] non-sarc
4 [punct1] sarc
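To show where the original error comes from, here is a minimal sketch. It assumes the comment column holds plain strings, as in the sample above, and the names pairs and comment are made up for illustration. An unescaped ? is a quantifier with nothing in front of it, so the regex engine rejects it. Note also that the \b anchors in the question's pattern would not match here even after escaping, because neither [ nor ? is a word character. If the punctuation should simply be treated as literal text, passing regex=False to str.replace avoids both problems.

```python
import re
import pandas as pd

# An unescaped "?" is a quantifier with nothing to quantify,
# so compiling it as a pattern fails.
try:
    re.compile('?')
    compiled = True
except re.error as exc:
    compiled = False
    message = str(exc)

# Hypothetical sample data mirroring the question (string-valued column).
df = pd.DataFrame({'comment': ['[?, ?]', '[0]', '[!, !, !]', '[0]', '[?]']})
pairs = [('?', 'punct1'), ('!', 'punct2'), ("'", 'punct3')]

# regex=False treats each punctuation mark as a literal substring,
# so no escaping is needed and no word-boundary anchors are involved.
comment = df['comment']
for punct, replacer in pairs:
    comment = comment.str.replace(punct, replacer, regex=False)

print(compiled, message)
print(comment.tolist())
```

This keeps the loop from the question and only changes how each pattern is interpreted.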
You can also build d from replace_df:
df = pd.DataFrame({'comment': {0: '[?, ?]', 1: '[0]', 2: '[!, !, !]', 3: '[0]', 4: '[?]'}, 'sarc_majority': {0: 'sarc', 1: 'non-sarc', 2: 'sarc', 3: 'non-sarc', 4: 'sarc'}})
print (df)
comment sarc_majority
0 [?, ?] sarc
1 [0] non-sarc
2 [!, !, !] sarc
3 [0] non-sarc
4 [?] sarc
replace_df = pd.DataFrame({'replacer': {0: 'punct1', 1: 'punct2', 2: 'punct3'}, 'punct': {0: '?', 1: '!', 2: "'"}})
print (replace_df)
punct replacer
0 ? punct1
1 ! punct2
2 ' punct3
replace_df.punct = '\\' + replace_df.punct
d = replace_df.set_index('punct')['replacer'].to_dict()
print (d)
{'\\!': 'punct2', "\\'": 'punct3', '\\?': 'punct1'}
df.comment = df.comment.replace(d, regex=True)
print (df)
comment sarc_majority
0 [punct1, punct1] sarc
1 [0] non-sarc
2 [punct2, punct2, punct2] sarc
3 [0] non-sarc
4 [punct1] sarc
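Prefixing every entry with a backslash works for these three characters, but it could go wrong if the CSV ever contained letters or digits (for instance, \d would become a digit class). A possible variant, sketched here under the assumption that the column layout matches the example above, lets re.escape decide what needs escaping. The variable name result is illustrative only.

```python
import re
import pandas as pd

df = pd.DataFrame({'comment': ['[?, ?]', '[0]', '[!, !, !]', '[0]', '[?]']})
replace_df = pd.DataFrame({'punct': ['?', '!', "'"],
                           'replacer': ['punct1', 'punct2', 'punct3']})

# re.escape adds a backslash only where the regex syntax requires one.
d = {re.escape(p): r
     for p, r in zip(replace_df['punct'], replace_df['replacer'])}

# Same Series.replace call as in the answer, with the escaped keys.
result = df['comment'].replace(d, regex=True)
print(result.tolist())
```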
Edit based on a comment (the comment column holds lists rather than strings):
df = pd.DataFrame({'comment':[['?', '?'],[0], ['!', '!', '!'], [0], ['?']], 'sarc_majority': [ 'sarc','non-sarc', 'sarc', 'non-sarc','sarc']})
print (df)
comment sarc_majority
0 [?, ?] sarc
1 [0] non-sarc
2 [!, !, !] sarc
3 [0] non-sarc
4 [?] sarc
print (type(df.loc[0, 'comment']))
<class 'list'>
replace_df = pd.DataFrame({'replacer': {0: 'punct1', 1: 'punct2', 2: 'punct3'}, 'punct': {0: '?', 1: '!', 2: "'"}})
#print (replace_df)
replace_df.punct = '\\' + replace_df.punct
d = replace_df.set_index('punct')['replacer'].to_dict()
print (d)
{'\\!': 'punct2', "\\'": 'punct3', '\\?': 'punct1'}
df.comment = df.comment.apply(lambda x: pd.Series(x).astype(str).replace(d, regex=True).tolist())
print (df)
comment sarc_majority
0 [punct1, punct1] sarc
1 [0] non-sarc
2 [punct2, punct2, punct2] sarc
3 [0] non-sarc
4 [punct1] sarc
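When each list element is exactly one punctuation mark, as in this sample, a plain dictionary lookup may be enough and avoids regular expressions altogether. The following is a sketch of that alternative, not the method used in the answer. The names mapping and rename_tokens are hypothetical. Unlike the astype(str) approach, it leaves non-matching items such as the integer 0 unchanged.

```python
import pandas as pd

df = pd.DataFrame({'comment': [['?', '?'], [0], ['!', '!', '!'], [0], ['?']],
                   'sarc_majority': ['sarc', 'non-sarc', 'sarc', 'non-sarc', 'sarc']})

# Literal keys: no escaping is needed because no regex is involved.
mapping = {'?': 'punct1', '!': 'punct2', "'": 'punct3'}

def rename_tokens(items):
    # Replace items found in the mapping; keep everything else as it is.
    return [mapping.get(item, item) for item in items]

result = df['comment'].apply(rename_tokens)
print(result.tolist())
```

This only covers whole-element matches. If punctuation can appear inside longer strings, the regex-based replacement above is the better fit.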
Regarding "python - How to replace specific punctuation with new names?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40627467/