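I'm building a scrapy application and need to extract the complete URL whenever a substring of that URL matches.
For example:
Suppose a page contains the following URLs that I'm interested in: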
/public/flag?cat=Computers/Programming/Languages/Python/Books&url=http://www.pearsonhighered.com/educator/academic/product/0,,0130260363,00%2Ben-USS_01DBC.html
/public/flag?cat=Computers/Programming/Languages/Python/Books&url=http://www.brpreiss.com/books/opus7/html/book.html
/public/flag?cat=Computers/Programming/Languages/Python/Books&url=http://www.diveintopython.net/
/public/flag?cat=Computers/Programming/Languages/Python/Books&url=http://rhodesmill.org/brandon/2011/foundations-of-python-network-programming/
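My search string is flag?cat=Computers/Programming/Languages/Python/Books,
but searching for it returns only the matched part of each URL, not the full URL. How can I get the full URLs listed above?
Here is a simple scrapy test case based on the example: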
import scrapy
from scrapy.spiders import Spider


class DmozSpider(Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
    ]

    def parse(self, response):
        # Uncomment to drop into an interactive shell for this response:
        # scrapy.shell.inspect_response(response, self)

        # .re() returns only the text matched by the pattern, not the whole attribute.
        # The raw string keeps the \? escape intact for the regex engine.
        results = response.xpath('//body').re(r'(flag\?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks)')
        print(results)
Output:
[
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks',
u'flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks'
]
Expected output:
[
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.pearsonhighered.com%2Feducator%2Facademic%2Fproduct%2F0%2C%2C0130260363%2C00%252Ben-USS_01DBC.html"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.brpreiss.com%2Fbooks%2Fopus7%2Fhtml%2Fbook.html"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.diveintopython.net%2F"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Frhodesmill.org%2Fbrandon%2F2011%2Ffoundations-of-python-network-programming%2F"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.techbooksforfree.com%2Fperlpython.shtml"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.freetechbooks.com%2Fpython-f6.html"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fgreenteapress.com%2Fthinkpython%2F"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.network-theory.co.uk%2Fpython%2Fintro%2F"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.freenetpages.co.uk%2Fhp%2Falan.gauld%2F"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.wiley.com%2FWileyCDA%2FWileyTitle%2FproductCd-0471219754.html"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fhetland.org%2Fwriting%2Fpractical-python%2F"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fsysadminpy.com%2F"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.qtrac.eu%2Fpy3book.html"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.wiley.com%2FWileyCDA%2FWileyTitle%2FproductCd-0764548077.html"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=https%3A%2F%2Fwww.packtpub.com%2Fpython-3-object-oriented-programming%2Fbook"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.network-theory.co.uk%2Fpython%2Flanguage%2F"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.pearsonhighered.com%2Feducator%2Facademic%2Fproduct%2F0%2C%2C0130409561%2C00%252Ben-USS_01DBC.html"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.informit.com%2Fstore%2Fproduct.aspx%3Fisbn%3D0201616165%26redir%3D1"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.pearsonhighered.com%2Feducator%2Facademic%2Fproduct%2F0%2C%2C0201748843%2C00%252Ben-USS_01DBC.html"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.informit.com%2Fstore%2Fproduct.aspx%3Fisbn%3D0672317354"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fgnosis.cx%2FTPiP%2F"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.informit.com%2Fstore%2Fproduct.aspx%3Fisbn%3D0130211192"><img src="/img/flag.png" alt="[!]" title="report an issue with this listing'
]
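Best answer
The problem is that .re() returns only the part that the expression matched. If you want to keep using a regular-expression check, use the re:test() function inside the XPath expression instead, so that the regex filters the href attributes rather than extracting from them: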
response.xpath(r'//body//a/@href[re:test(., "flag\?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks")]').extract()
On my side this produces the following:
[
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.pearsonhighered.com%2Feducator%2Facademic%2Fproduct%2F0%2C%2C0130260363%2C00%252Ben-USS_01DBC.html',
u'/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&url=http%3A%2F%2Fwww.brpreiss.com%2Fbooks%2Fopus7%2Fhtml%2Fbook.html',
...
]
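The difference between extracting a match and filtering by a match can be reproduced without scrapy. The following is a minimal sketch using only the Python standard library; the sample markup, the HrefCollector class, and the helper names matched_parts and full_urls are hypothetical and serve only as illustration. It is not how scrapy implements .re() or re:test(), it merely shows the same distinction.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup, modelled on the links in the question.
SAMPLE_HTML = """
<body>
  <a href="/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&amp;url=http%3A%2F%2Fwww.diveintopython.net%2F">flag</a>
  <a href="/public/flag?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks&amp;url=http%3A%2F%2Fwww.brpreiss.com%2Fbooks%2Fopus7%2Fhtml%2Fbook.html">flag</a>
  <a href="/Computers/Programming/Languages/Python/">Python</a>
</body>
"""

PATTERN = re.compile(r"flag\?cat=Computers%2FProgramming%2FLanguages%2FPython%2FBooks")


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def matched_parts(html):
    """Extraction: return only the text the pattern matched (like .re())."""
    return PATTERN.findall(html)


def full_urls(html):
    """Filtering: return every href that contains a match (like re:test())."""
    collector = HrefCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if PATTERN.search(href)]


if __name__ == "__main__":
    print(matched_parts(SAMPLE_HTML))
    print(full_urls(SAMPLE_HTML))
```

Here matched_parts yields the bare search string once per matching link, while full_urls yields the complete href values, which is the behaviour the question asks for.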
Regarding "python - extracting the full URL that contains a substring", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36604507/