
python - Twisted simple HTTP proxy (continued)

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:18:02

I found this script on this site:

from twisted.web import proxy, http
from twisted.internet import reactor
import sys


class MyProxy(proxy.Proxy):

    def dataReceived(self, data):
        # Print the raw request data, then let Proxy handle it as usual.
        print(data)
        return proxy.Proxy.dataReceived(self, data)


class ProxyFactory(http.HTTPFactory):
    protocol = MyProxy


factory = ProxyFactory()
reactor.listenTCP(8080, factory)  # the proxy listens on port 8080
reactor.run()

As you can see, I overrode the dataReceived method to print the data. When run, it prints the headers of every request to standard output:

GET http://careers.stackoverflow.com/ad/i/nNxudq0-kvjnJ84-n6osrC0-12-vYY HTTP/1.1
Host: careers.stackoverflow.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:12.0) Gecko/20100101 Firefox/12.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Proxy-Connection: keep-alive
Referer: http://stackoverflow.com/questions/7052849/simple-http-proxy
Cookie: __utma=140029553.285085787.1331510700.1337692646.1337711538.33; __utmz=140029553.1337711538.33.19.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); __qca=P0-608923218-1331510699748; usr=t=5TLQ0kWmkGJo&s=RgkodeSUGq8k; __utmc=140029553; __utmb=140029553.3.10.1337711538
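
  1. Is it possible to override it in such a way (or with any other implementation) that I can access the received data (the headers) as a dictionary, e.g. data['Host'] = 'xxxx' ...?
  2. I would also like to get all the URLs of that page.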

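Best answer

Since you are getting the raw data, test each line to see whether it is a header (/^[-a-zA-Z]+:/ sounds like a good start; also watch for the double CRLF that marks the end of the headers), then store it in a dictionary yourself.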
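
The approach in the answer could be sketched roughly as follows. This is only an illustration under some assumptions: parse_headers is a hypothetical helper (not part of Twisted), the data is assumed to arrive as bytes (Python 3), and the whole header block is assumed to arrive in a single chunk. In practice dataReceived may deliver a request in several pieces, so a real proxy would need to buffer until the blank line is seen. Repeated header names simply overwrite each other here.

```python
import re

# Pattern from the answer: letters and hyphens followed by a colon.
HEADER_RE = re.compile(rb"^([-a-zA-Z]+):[ \t]*(.*)$")


def parse_headers(raw):
    """Hypothetical helper: split raw request bytes into (request_line, headers)."""
    # The header block ends at the first blank line (double CRLF).
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    request_line = lines[0].decode("latin-1")
    headers = {}
    for line in lines[1:]:
        match = HEADER_RE.match(line)
        if match:
            name, value = match.groups()
            # A later header with the same name overwrites the earlier one.
            headers[name.decode("latin-1")] = value.decode("latin-1").strip()
    return request_line, headers


if __name__ == "__main__":
    sample = (
        b"GET http://careers.stackoverflow.com/ HTTP/1.1\r\n"
        b"Host: careers.stackoverflow.com\r\n"
        b"Accept-Encoding: gzip, deflate\r\n"
        b"\r\n"
    )
    request_line, headers = parse_headers(sample)
    print(request_line)
    print(headers["Host"])
```

Inside the overridden dataReceived, one could call parse_headers(data) before passing the data on to proxy.Proxy.dataReceived. The request line returned alongside the dictionary carries the requested URL.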

Regarding python - Twisted simple HTTP proxy (continued), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10708148/
