gpt4 book ai didi

python 正则表达式从文本中省略复杂的引用样式

转载 作者:行者123 更新时间:2023-12-01 08:28:03 25 4
gpt4 key购买 nike

我已将文件的内容读入 python,并且我想删除遵循相同通用格式的所有引用:

(Author et al., .............. \nGoogle Scholar) # there could be many '\nGoogle Scholar's within the brackets

Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses (Gutierrez etal., 2017Gutierrez G.D. Gromada J. Sussel L. Heterogeneity of the pancreatic beta cell.Front. Genet. 2017; 8: 22Crossref\nPubMed\nScopus (11)\nGoogle Scholar, Roscioni etal., 2016Roscioni S.S. Migliorini A. Gegg M. Lickert H. Impact of islet architecture on -cell heterogeneity, plasticity and function.Nat. Rev. Endocrinol. 2016; 12: 695-709Crossref\nPubMed\nScopus (36)\nGoogle Scholar). Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes (Cui etal., 2018Cui Y. Hu D. Markillie L.M. Chrisler W.B. Gaffrey M.J. Ansong C. Sussel L. Orr G. Fluctuation localization imaging-based fluorescence insitu hybridization (fliFISH) for accurate detection and counting of RNA copies in single cells.Nucleic Acids Res. 2018; 46: e7Crossref\nPubMed\nScopus (2)\nGoogle Scholar). We have optimized the standard tissue smFISH protocol (Lyubimova etal., 2013Lyubimova A. Itzkovitz S. Junker J.P. Fan Z.P. Wu X. van Oudenaarden A. Single-molecule mRNA detection and counting in mammalian tissue.Nat. Protoc. 2013; 8: 1743-1758Crossref\nPubMed\nScopus (62)\nGoogle Scholar) by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr.

期望的输出

Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses . Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes . We have optimized the standard tissue smFISH protocol by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr.

我找不到一个正则表达式可以一次性省略所有引用,因此我不得不分两部分进行:

  1. 查找每个“\nGoogle Scholar)”出现的所有位置。
  2. 从每个位置向后延伸,直到出现相应的左括号,然后省略这些索引之间的字符。

我尝试这样做:

def remove(test_str):
regex=re.compile('\\nGoogle Scholar\)')
starts=[]
ends=[]
ret=''
for end in regex.finditer(test_str): #find all 'Google Scholar)'
ends.append(m.end())
for e in ends: #find all starting brackets
i=e
while True:
if bool(re.match('\(\D+',test_str[i-2:i])):
starts.append(i-2)
break
else:
i-=1
start=test_str[:starts[0]] #omit all characters in between
starts=starts[1:]
end=test_str[ends[-1]:]
ends=ends[:-1]
for i,j in zip(starts,ends):
ret=ret+test_str[j:i]
return start+ret+end

然而,这个策略失败了,因为我用来查找每个起始括号的正则表达式 (\(\D+) 不够精确 - 引用中经常有闭括号,例如

(Cui etal., 2018Cui Y. Hu D. Markillie L.M. Chrisler W.B. Gaffrey M.J. Ansong C. Sussel L. Orr G. Fluctuation localization imaging-based fluorescence insitu hybridization (fliFISH) for accurate detection and counting of RNA copies in single cells.Nucleic Acids Res. 2018; 46: e7Crossref\nPubMed\nScopus (2)\nGoogle Scholar)

因此,在这种情况下,对正确左括号的搜索会过早停止......

有人可以推荐一种持续删除所有引用的好方法吗?

最佳答案

根据您描述的模式,您可以使用此正则表达式,

(?s)\(.*?Google Scholar\) ?

并将其替换为空字符串。这里 (?s) 用于启用 . 匹配新行。

<强> Check here

这是一个Python代码演示,

import re

s = 'Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses (Gutierrez etal., 2017Gutierrez G.D. Gromada J. Sussel L. Heterogeneity of the pancreatic beta cell.Front. Genet. 2017; 8: 22Crossref\nPubMed\nScopus (11)\nGoogle Scholar, Roscioni etal., 2016Roscioni S.S. Migliorini A. Gegg M. Lickert H. Impact of islet architecture on -cell heterogeneity, plasticity and function.Nat. Rev. Endocrinol. 2016; 12: 695-709Crossref\nPubMed\nScopus (36)\nGoogle Scholar). Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes (Cui etal., 2018Cui Y. Hu D. Markillie L.M. Chrisler W.B. Gaffrey M.J. Ansong C. Sussel L. Orr G. Fluctuation localization imaging-based fluorescence insitu hybridization (fliFISH) for accurate detection and counting of RNA copies in single cells.Nucleic Acids Res. 2018; 46: e7Crossref\nPubMed\nScopus (2)\nGoogle Scholar). We have optimized the standard tissue smFISH protocol (Lyubimova etal., 2013Lyubimova A. Itzkovitz S. Junker J.P. Fan Z.P. Wu X. van Oudenaarden A. Single-molecule mRNA detection and counting in mammalian tissue.Nat. Protoc. 2013; 8: 1743-1758Crossref\nPubMed\nScopus (62)\nGoogle Scholar) by substantially increasing the period of mRNA denaturation, which precedes the probe hybridization steps, from 5min to at least 3hr.'

replacedStr = re.sub(r'(?s)\(.*?Google Scholar\) ?','',s)
print(replacedStr)

打印以下内容,就像您在帖子中提到的那样。

Introduction The endocrine cells in the pancreatic islets of Langerhans secrete insulin and glucagon in response to glucose perturbations to maintain glucose homeostasis. The insulin-secreting beta cells exhibit morphological, functional, and molecular variations, suggesting that they may consist of sub-populations with specialized tasks and physiological responses . Features of beta cell heterogeneity include glucose responsiveness and secretory activity ..... Visualizing transcripts in the pancreas, however, has been infeasible without the use of specialized techniques such as photoswitchable dyes . We have optimized the standard tissue smFISH protocol by substantially increasing the period of mRNA denaturation, which precedes the probehybridization steps, from 5min to at least 3hr.

关于python 正则表达式从文本中省略复杂的引用样式,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54095734/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com