- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
我用 python 读取 PDF,并想从中提取特定段落。为此,我使用 python 并尝试通过正则表达式进行选择。为了说明这种情况,这里有一个例子。
INTERNATIONAL MONETARY FUND 7\n\x0cBELGIUM\n\n\n\nPOLICY DISCUSSIONS—MAINTAINING THE REFORM\nMOMENTUM\n7. The current recovery is an opportunity to strengthen the resilience and growth\npotential of the Belgian economy. The government's ability to deal with future shocks will depend\non whether it implements the right policies now while the economy continues to recover.\n\n\uf0b7 First, with public debt above 100 percent of GDP and only starting to come down, Belgium still\n has a long way to go to rebuild buffers and achieve a more sustainable fiscal position. This will\n require following through on plans to gradually move toward structural balance.\n\n\uf0b7 Second, with real GDP growth projected at only around 1½ percent for the foreseeable future,\n further labor and product market reforms are needed to increase productivity growth, raise\n potential output, and integrate vulnerable groups into the labor market.\n\n\uf0b7 Third, although the financial sector has recovered since the crisis and is generally sound, cyclical\n vulnerabilities are rising and new challenges are emerging, suggesting the need for vigilance\n and proactive policies.3\n\n8. The government agreed last summer on a new package of measures related to\ntaxation, the labor market, and social benefits (Table 2 and Box 1). The most notable reform was\na reduction in Belgium's corporate income tax (CIT) rate from 34 percent to 25 percent, to be\nphased in over the next three years (SMEs will benefit from a reduced rate of 20 percent starting in\n2018). To compensate for the resulting revenue loss, the notional interest rate deduction (NID) was\nmodified to apply only to incremental corporate equity rather than to the total stock, and new anti-\ntax avoidance measures were introduced consistent with Belgium's EU obligations.4 Together, the\nmeasures are designed to enhance Belgium's competitiveness while preserving revenue neutrality.\n\n9. Policy discussions focused on the importance of maintaining the reform momentum\nand not yielding to complacency. Achieving the balanced budget goal will require efforts at all\nlevels of government to make spending more efficient and safeguard revenues (Section A).\nA combination of policies and reforms could help raise productivity growth, including increasing\ninvestment in infrastructure and enhancing competition in services (Section B). To fully realize\nBelgium's employment potential, it will be critical to address the severe fragmentation of the labor\nmarket (Section C). To preserve financial stability, the authorities should address vulnerabilities in the\nmortgage market and carefully navigate the transition toward a European Banking Union (Section D).\n\n\n\n\n3\n A comprehensive assessment of Belgium's financial sector took place in 2017 under the Financial Sector\nAssessment Program (FSAP).\n4\n The NID aims to neutralize the CIT treatment of debt and equity by supplementing the deductibility of interest with\na deduction that is the product of corporate equity and a notional interest rate.\n\n\n8
每个段落都以一个数字(一位或两位数字)开头,后跟一个点和三到七个空格。末尾由下一个双换行符 \n\n
组成,后跟一个数字(一位或两位数字),最后跟一个点。请注意,这也应该作为下一个起点。在上面的例子中,我应该找到三个段落:
第一段:
- The current recovery is an opportunity to strengthen the resilience and growth\npotential of the Belgian economy. The government's ability to deal with future shocks will depend\non whether it implements the right policies now while the economy continues to recover.\n\n\uf0b7 First, with public debt above 100 percent of GDP and only starting to come down, Belgium still\n has a long way to go to rebuild buffers and achieve a more sustainable fiscal position. This will\n require following through on plans to gradually move toward structural balance.\n\n\uf0b7 Second, with real GDP growth projected at only around 1½ percent for the foreseeable future,\n further labor and product market reforms are needed to increase productivity growth, raise\n potential output, and integrate vulnerable groups into the labor market.\n\n\uf0b7 Third, although the financial sector has recovered since the crisis and is generally sound, cyclical\n vulnerabilities are rising and new challenges are emerging, suggesting the need for vigilance\n and proactive policies.3\n\n
第二段:
- The government agreed last summer on a new package of measures related to\ntaxation, the labor market, and social benefits (Table 2 and Box 1). The most notable reform was\na reduction in Belgium's corporate income tax (CIT) rate from 34 percent to 25 percent, to be\nphased in over the next three years (SMEs will benefit from a reduced rate of 20 percent starting in\n2018). To compensate for the resulting revenue loss, the notional interest rate deduction (NID) was\nmodified to apply only to incremental corporate equity rather than to the total stock, and new anti-\ntax avoidance measures were introduced consistent with Belgium's EU obligations.4 Together, the\nmeasures are designed to enhance Belgium's competitiveness while preserving revenue neutrality.\n\n
最后是第三个:
- Policy discussions focused on the importance of maintaining the reform momentum\nand not yielding to complacency. Achieving the balanced budget goal will require efforts at all\nlevels of government to make spending more efficient and safeguard revenues (Section A).\nA combination of policies and reforms could help raise productivity growth, including increasing\ninvestment in infrastructure and enhancing competition in services (Section B). To fully realize\nBelgium's employment potential, it will be critical to address the severe fragmentation of the labor\nmarket (Section C). To preserve financial stability, the authorities should address vulnerabilities in the\nmortgage market and carefully navigate the transition toward a European Banking Union (Section D).\n\n\n\n\n3\n A comprehensive assessment of Belgium's financial sector took place in 2017 under the Financial Sector\nAssessment Program (FSAP).\n4\n The NID aims to neutralize the CIT treatment of debt and equity by supplementing the deductibility of interest with\na deduction that is the product of corporate equity and a notional interest rate.\n\n
我尝试使用以下正则表达式:r'(?m)[0-99].*[.] {3,7} (.*?)\n\n
从头到尾都有选择一切的推理
(?m)[0-99].*[.] {3,7}
:分别标识每一行的开头。\n\n
指定结束。 但是,它没有找到任何东西。
最佳答案
[0-99]
模式是错误的,因为它匹配 0
中的任何 1 位数字至9
。请参阅Why doesn't [01-12] range work as expected? 。 re.M
( (?m)
) 修改 ^
和$
anchor ,但模式中两者都没有。
您可以使用
r'(?sm)^\d\d?\. {3,7}(.*?)(?=\n\n\d\d?\. |\Z)'
请参阅regex demo .
详细信息
(?sm)
-re.DOTALL
和re.MULTILINE
已启用选项^
- 一行的开头\d\d?
- 1 或 2 位数字( 0
到 99
)\.
- 一个点<code> {3,7}</code> - 3 to 7 spaces (replace with
[^\S\r\n]{3,7}` 匹配任何水平空白)(.*?)
- 第 1 组:任意 0 个以上字符,尽可能少(?=\n\n\d\d?\. |\Z)
- 一个位置,紧跟两个换行符 ( \n\n
),然后是 1 或 2 位数字 ( \d\d?
) 和一个点,后跟空格或 ( |
) 整个字符串的末尾 ( \Z
)。
import re
s="INTERNATIONAL MONETARY FUND 7\n\x0cBELGIUM\n\n\n\nPOLICY DISCUSSIONS—MAINTAINING THE REFORM\nMOMENTUM\n7. The current recovery is an opportunity to strengthen the resilience and growth\npotential of the Belgian economy. The government's ability to deal with future shocks will depend\non whether it implements the right policies now while the economy continues to recover.\n\n\uf0b7 First, with public debt above 100 percent of GDP and only starting to come down, Belgium still\n has a long way to go to rebuild buffers and achieve a more sustainable fiscal position. This will\n require following through on plans to gradually move toward structural balance.\n\n\uf0b7 Second, with real GDP growth projected at only around 1½ percent for the foreseeable future,\n further labor and product market reforms are needed to increase productivity growth, raise\n potential output, and integrate vulnerable groups into the labor market.\n\n\uf0b7 Third, although the financial sector has recovered since the crisis and is generally sound, cyclical\n vulnerabilities are rising and new challenges are emerging, suggesting the need for vigilance\n and proactive policies.3\n\n8. The government agreed last summer on a new package of measures related to\ntaxation, the labor market, and social benefits (Table 2 and Box 1). The most notable reform was\na reduction in Belgium's corporate income tax (CIT) rate from 34 percent to 25 percent, to be\nphased in over the next three years (SMEs will benefit from a reduced rate of 20 percent starting in\n2018). To compensate for the resulting revenue loss, the notional interest rate deduction (NID) was\nmodified to apply only to incremental corporate equity rather than to the total stock, and new anti-\ntax avoidance measures were introduced consistent with Belgium's EU obligations.4 Together, the\nmeasures are designed to enhance Belgium's competitiveness while preserving revenue neutrality.\n\n9. Policy discussions focused on the importance of maintaining the reform momentum\nand not yielding to complacency. Achieving the balanced budget goal will require efforts at all\nlevels of government to make spending more efficient and safeguard revenues (Section A).\nA combination of policies and reforms could help raise productivity growth, including increasing\ninvestment in infrastructure and enhancing competition in services (Section B). To fully realize\nBelgium's employment potential, it will be critical to address the severe fragmentation of the labor\nmarket (Section C). To preserve financial stability, the authorities should address vulnerabilities in the\nmortgage market and carefully navigate the transition toward a European Banking Union (Section D).\n\n\n\n\n3\n A comprehensive assessment of Belgium's financial sector took place in 2017 under the Financial Sector\nAssessment Program (FSAP).\n4\n The NID aims to neutralize the CIT treatment of debt and equity by supplementing the deductibility of interest with\na deduction that is the product of corporate equity and a notional interest rate.\n\n\n8"
for r in re.findall(r'(?sm)^\d\d?\. {3,7}(.*?)(?=\n\n\d\d?\. |\Z)', s):
print(r, "\n---------")
输出:
The current recovery is an opportunity to strengthen the resilience and growth
potential of the Belgian economy. The government's ability to deal with future shocks will depend
on whether it implements the right policies now while the economy continues to recover.
First, with public debt above 100 percent of GDP and only starting to come down, Belgium still
has a long way to go to rebuild buffers and achieve a more sustainable fiscal position. This will
require following through on plans to gradually move toward structural balance.
Second, with real GDP growth projected at only around 1½ percent for the foreseeable future,
further labor and product market reforms are needed to increase productivity growth, raise
potential output, and integrate vulnerable groups into the labor market.
Third, although the financial sector has recovered since the crisis and is generally sound, cyclical
vulnerabilities are rising and new challenges are emerging, suggesting the need for vigilance
and proactive policies.3
---------
The government agreed last summer on a new package of measures related to
taxation, the labor market, and social benefits (Table 2 and Box 1). The most notable reform was
a reduction in Belgium's corporate income tax (CIT) rate from 34 percent to 25 percent, to be
phased in over the next three years (SMEs will benefit from a reduced rate of 20 percent starting in
2018). To compensate for the resulting revenue loss, the notional interest rate deduction (NID) was
modified to apply only to incremental corporate equity rather than to the total stock, and new anti-
tax avoidance measures were introduced consistent with Belgium's EU obligations.4 Together, the
measures are designed to enhance Belgium's competitiveness while preserving revenue neutrality.
---------
Policy discussions focused on the importance of maintaining the reform momentum
and not yielding to complacency. Achieving the balanced budget goal will require efforts at all
levels of government to make spending more efficient and safeguard revenues (Section A).
A combination of policies and reforms could help raise productivity growth, including increasing
investment in infrastructure and enhancing competition in services (Section B). To fully realize
Belgium's employment potential, it will be critical to address the severe fragmentation of the labor
market (Section C). To preserve financial stability, the authorities should address vulnerabilities in the
mortgage market and carefully navigate the transition toward a European Banking Union (Section D).
3
A comprehensive assessment of Belgium's financial sector took place in 2017 under the Financial Sector
Assessment Program (FSAP).
4
The NID aims to neutralize the CIT treatment of debt and equity by supplementing the deductibility of interest with
a deduction that is the product of corporate equity and a notional interest rate.
8
---------
关于python - 使用python通过正则表达式选择文本中的每个段落,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/53394296/
我的网页上显示了一份简历。其中包含部分(段落),例如教育、经验、项目等,这里是客户 想要通过在网页的段落(节)上拖动鼠标来移动页面上的这些节。我怎样才能实现这个功能。我正在使用 ruby on R
我有一个特定大小的 div,它是图像和两个段落。 都设置了向左浮动 div { width: 400px; height: 400px; } img { float: left; wi
我想完美对齐一段,使整个段落位于页面中央,但左右两边完美对齐。这是一个完美对齐的段落的图片示例: 该段落看起来像是在某种盒子中,左右两边完全笔直。我如何在 css 或 html 中执行此操作? 最佳答
我的 div 中有多个带有段落的项目,我想将它们 chop 为 2 行。我尝试使用高度进行 chop ,但结果会导致单词被 chop 。我无法使用字符,因为在某些情况下单词很长并且会被推到新行。 我正
有没有办法通过 .Net 框架(或有人写过类似的东西)在传递字符串和字典对象时获取匹配数组? 首先是一些背景 我需要 我有运动队的 csv 文件,我将其加载到字典对象中,例如... Team, Var
我需要创建一个程序来计算文本文件中字符的频率以及段落、单词和句子的数量。 我有一个问题,当我的程序输出字母的频率时,程序会为字母表中的每个字母输出多个输出。 输出应该是这样的: 如果输入是“hello
我的 Swing 应用程序中有一个 JTextPane,其上方有一个 JSlider。当我拖动 slider 时,我希望当前具有插入符号的 JTextPane 段落减少/增加其宽度(并相应地调整高度)
有没有办法通过 .Net 框架(或有人写过类似的东西)在传递字符串和字典对象时获取匹配数组? 首先是一些背景 我需要 我有运动队的 csv 文件,我将其加载到字典对象中,例如... Team, Var
假设我有一个文本句子: $body = 'the quick brown fox jumps over the lazy dog'; 我想将该句子放入“关键字”的散列中,但我想允许多单词关键字;我有以
我尝试编写一个服务器-客户端程序。我可以发送协议(protocol)文本并正确获取文本。但是当我尝试解析文本时,我遇到了 Matcher 类的问题。因为它只匹配第一行。那么我怎样才能找到正确的字符串并
由于 WordPress 在所有内容上都添加了段落标签,因此我需要在某些条件下删除段落标签。在这种情况下,我希望它们从图像中消失。我让那部分工作了: $(".scroller img").un
我需要匹配包含三个大括号之间的文本的完整 HTML 段落。 这是我输入的 HTML: {{{Lorem ipsum dolor sit amet. Ut enim ad minim veniam. D
我正在尝试查找大段落(超过一定数量的字符)并将其包装到一个范围内。目前我正在这样做: output.replace(/(\n{2}|^)([^\n{2}]{500,})(\n{2}|$)/mg, '$
所以我有这个模式,它应该提供不同的描述性段落,具体取决于用户从下拉列表中做出的选择。目前它只始终显示所有段落。我希望它在选择“公共(public)”时显示“隐藏”,在选择“内部”时显示“隐藏2”。等等
段落?
JSFiddle Link 我正在使用的 JSFiddle 似乎正是我的元素所需要的。但是,我将如何更改此当前代码以确保每个分段的段落包含相同数量的字符并且所有段落的宽度相同? 任何帮助将不胜感激,尤
我希望我所有的 p 标签继承正文的字体大小——如果我没有在它们上声明字体大小或将它们嵌套在带有字体的父项中,它们会自动执行——尺寸声明。 但是我应该在 CSS 中的 p 中添加 font-size:
警告框作为回显?
Achtung! This alert box indicates a dangerous or potentially negative action.× 所以我创建了自己的警告框,但问
有什么方法可以使用 python-docx 访问和操作文本框中现有 docx 文档中的文本? 我试图通过迭代在文档的所有段落中找到关键字: doc = Document('test.docx') fo
这是在亚马逊电话采访中被问到的——“你能写一个程序(用你喜欢的语言 C/C++/等)在一个大的字符串缓冲区中找到一个给定的词吗?即数字出现次数“ 我仍在寻找我应该给面试官的完美答案。我试着写一个线性搜
当我使用这段代码时,我可以用文本制作图像,但在一行中, function writetext($image_path,$imgdestpath,$x,$y,$angle,$text,$font,$fo
我是一名优秀的程序员,十分优秀!