python - Why does uTorrent fetch a torrent file from a magnet link faster than my Python script?

Reposted by: 行者123 Updated: 2023-11-30 23:06:13

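I am trying to convert torrent magnet URIs into .torrent files with a Python script. The script connects to the DHT, waits for the metadata, and then creates a torrent file from it.

For example: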

#!/usr/bin/env python
'''
Created on Apr 19, 2012
@author: dan, Faless

    GNU GENERAL PUBLIC LICENSE - Version 3

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

    http://www.gnu.org/licenses/gpl-3.0.txt

'''

import shutil
import tempfile
import os.path as pt
import sys
import libtorrent as lt
from time import sleep


def magnet2torrent(magnet, output_name=None):
    if output_name and \
            not pt.isdir(output_name) and \
            not pt.isdir(pt.dirname(pt.abspath(output_name))):
        print("Invalid output folder: " + pt.dirname(pt.abspath(output_name)))
        print("")
        sys.exit(0)

    tempdir = tempfile.mkdtemp()
    ses = lt.session()
    params = {
        'save_path': tempdir,
        'duplicate_is_error': True,
        'storage_mode': lt.storage_mode_t(2),
        'paused': False,
        'auto_managed': True
    }
    handle = lt.add_magnet_uri(ses, magnet, params)

    print("Downloading Metadata (this may take a while)")
    while (not handle.has_metadata()):
        try:
            sleep(1)
        except KeyboardInterrupt:
            print("Aborting...")
            ses.pause()
            print("Cleanup dir " + tempdir)
            shutil.rmtree(tempdir)
            sys.exit(0)
    ses.pause()
    print("Done")

    torinfo = handle.get_torrent_info()
    torfile = lt.create_torrent(torinfo)

    output = pt.abspath(torinfo.name() + ".torrent")

    if output_name:
        if pt.isdir(output_name):
            output = pt.abspath(pt.join(
                output_name, torinfo.name() + ".torrent"))
        elif pt.isdir(pt.dirname(pt.abspath(output_name))):
            output = pt.abspath(output_name)

    print("Saving torrent file here : " + output + " ...")
    torcontent = lt.bencode(torfile.generate())
    # Write the bencoded torrent once, closing the file even on error.
    with open(output, "wb") as f:
        f.write(torcontent)
    print("Saved! Cleaning up dir: " + tempdir)
    ses.remove_torrent(handle)
    shutil.rmtree(tempdir)

    return output


def showHelp():
    print("")
    print("USAGE: " + pt.basename(sys.argv[0]) + " MAGNET [OUTPUT]")
    print("  MAGNET\t- the magnet url")
    print("  OUTPUT\t- the output torrent file name")
    print("")


def main():
    if len(sys.argv) < 2:
        showHelp()
        sys.exit(0)

    magnet = sys.argv[1]
    output_name = None

    if len(sys.argv) >= 3:
        output_name = sys.argv[2]

    magnet2torrent(magnet, output_name)


if __name__ == "__main__":
    main()

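The script above takes a minute or more to fetch the metadata and create the .torrent file, while the uTorrent client needs only a few seconds. Why is that?

How can I make my script faster?

I want to fetch metadata for roughly 1k+ torrents.

Example magnet link: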

magnet:?xt=urn:btih:BFEFB51F4670D682E98382ADF81014638A25105A&dn=openSUSE+13.2+DVD+x86_64.iso&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80


Update:

I have already specified the well-known DHT routers in my script, like this:

session = lt.session()
session.listen_on(6881, 6891)

session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()

But it is still slow, and sometimes I get errors like:

DHT error [hostname lookup] (1) Host not found (authoritative)
could not map port using UPnP: no router found


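Update:

I wrote this small script, which reads hex info-hashes from a database, tries to fetch their metadata from the DHT, and then inserts the torrent files into the database.

I let it run indefinitely because I did not know how to save the session state; keeping it running should accumulate more peers and make metadata fetches faster.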

#!/usr/bin/env python
# Runs as a client or daemon and fetches torrent metadata (i.e. .torrent files) from magnet URIs.

import libtorrent as lt  # libtorrent library
import tempfile  # temporary save_path directories used while fetching metadata
import sys  # command-line arguments and exiting the script
from time import sleep  # sleep between polls
import shutil  # removing temporary directory trees
import os.path  # path checks
from pprint import pprint  # for debugging, showing object data
import MySQLdb  # DB connectivity
import os
from datetime import date, timedelta

#create lock file to make sure only single instance is running
lock_file_name = "/daemon.lock"

if(os.path.isfile(lock_file_name)):
    sys.exit('another instance running')
#else:
    #f = open(lock_file_name, "w")
    #f.close()

session = lt.session()
session.listen_on(6881, 6891)

session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()

alive = True
while alive:

    db_conn = MySQLdb.connect(  host = 'localhost',     user = '',  passwd = '',    db = 'basesite',    unix_socket='') # Open database connection
    #print('reconnecting')
    #get all records where enabled = 0 and uploaded within yesterday 
    subset_count = 5 ;

    yesterday = date.today() - timedelta(1)
    yesterday = yesterday.strftime('%Y-%m-%d %H:%M:%S')
    #print(yesterday)

    total_count_query = ("SELECT COUNT(*) as total_count FROM content WHERE upload_date > '"+ yesterday +"' AND enabled = '0' ")
    #print(total_count_query)
    try:
        total_count_cursor = db_conn.cursor()# prepare a cursor object using cursor() method
        total_count_cursor.execute(total_count_query) # Execute the SQL command
        total_count_results = total_count_cursor.fetchone() # Fetch all the rows in a list of lists.
        total_count = total_count_results[0]
        print(total_count)
    except MySQLdb.Error:
        print("Error: unable to select data")
        total_count = 0  # nothing to process this round

    total_pages = total_count/subset_count
    #print(total_pages)

    current_page = 1
    while(current_page <= total_pages):
        from_count = (current_page * subset_count) - subset_count

        #print(current_page)
        #print(from_count)

        hashes = []

        get_mysql_data_query = ("SELECT hash FROM content WHERE upload_date > '" + yesterday +"' AND enabled = '0' ORDER BY record_num ASC LIMIT "+ str(from_count) +" , " + str(subset_count) +" ")
        #print(get_mysql_data_query)
        try:
            get_mysql_data_cursor = db_conn.cursor()# prepare a cursor object using cursor() method
            get_mysql_data_cursor.execute(get_mysql_data_query) # Execute the SQL command
            get_mysql_data_results = get_mysql_data_cursor.fetchall() # Fetch all the rows in a list of lists.
            for row in get_mysql_data_results:
                hashes.append(row[0].upper())
        except MySQLdb.Error:
            print("Error: unable to select data")

        print(hashes)

        handles = []

        for hash in hashes:
            tempdir = tempfile.mkdtemp()
            add_magnet_uri_params = {
                'save_path': tempdir,
                'duplicate_is_error': True,
                'storage_mode': lt.storage_mode_t(2),
                'paused': False,
                'auto_managed': True,
                'duplicate_is_error': True
            }
            magnet_uri = "magnet:?xt=urn:btih:" + hash.upper() + "&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80&tr=udp%3A%2F%2Ftracker.publicbt.com%3A80&tr=udp%3A%2F%2Ftracker.ccc.de%3A80"
            #print(magnet_uri)
            handle = lt.add_magnet_uri(session, magnet_uri, add_magnet_uri_params)
            handles.append(handle) #push handle in handles list

        #print("handles length is :")
        #print(len(handles))

        while(len(handles) != 0):
            for h in list(handles):  # iterate over a copy; handles is modified inside the loop
                #print("inside handles for each loop")
                if h.has_metadata():
                    torinfo = h.get_torrent_info()
                    final_info_hash = str(torinfo.info_hash())
                    final_info_hash = final_info_hash.upper()
                    torfile = lt.create_torrent(torinfo)
                    torcontent = lt.bencode(torfile.generate())
                    tfile_size = len(torcontent)
                    try:
                        insert_cursor = db_conn.cursor()  # prepare a cursor object using cursor() method
                        insert_cursor.execute("""INSERT INTO dht_tfiles (hash, tdata) VALUES (%s, %s)""",  [final_info_hash , torcontent] )
                        db_conn.commit()
                        #print "data inserted in DB"
                    except MySQLdb.Error as e:
                        try:
                            print("MySQL Error [%d]: %s" % (e.args[0], e.args[1]))
                        except IndexError:
                            print("MySQL Error: %s" % str(e))

                    shutil.rmtree(h.save_path())  # remove temp data directory
                    session.remove_torrent(h)  # remove torrent handle from session
                    handles.remove(h)  # remove handle from list

                else:
                    if(h.status().active_time > 600):  # handle is more than 10 minutes (600 seconds) old
                        #print('remove_torrent')
                        shutil.rmtree(h.save_path())  # remove temp data directory
                        session.remove_torrent(h)  # remove torrent handle from session
                        handles.remove(h)  # remove handle from list
                sleep(1)
                #print('sleep1')

        print('sleep10')
        sleep(10)
        current_page = current_page + 1
    #print('sleep20')
    sleep(20)

os.remove(lock_file_name);

Now I need to implement the new things Arvid suggested.


Update:

I have successfully implemented Arvid's suggestions, plus a few more extensions I found on the Deluge support forum: http://forum.deluge-torrent.org/viewtopic.php?f=7&t=42299&start=10

#!/usr/bin/env python

import libtorrent as lt  # libtorrent library
import tempfile  # temporary save_path directories used while fetching metadata
import sys  # command-line arguments and exiting the script
from time import sleep  # sleep between polls
import shutil  # removing temporary directory trees
import os.path  # path checks
from pprint import pprint  # for debugging, showing object data
import MySQLdb  # DB connectivity
import os
from datetime import date, timedelta

def var_dump(obj):
    # Print every attribute of obj, for debugging.
    for attr in dir(obj):
        print("obj.%s = %s" % (attr, getattr(obj, attr)))

session = lt.session()
session.add_extension('ut_pex')
session.add_extension('ut_metadata')
session.add_extension('smart_ban')
session.add_extension('metadata_transfer')  

#session = lt.session(lt.fingerprint("DE", 0, 1, 0, 0), flags=1)

session_save_filename = "/tmp/new.client.save_state"

if(os.path.isfile(session_save_filename)):

    fileread = open(session_save_filename, 'rb')
    session.load_state(lt.bdecode(fileread.read()))
    fileread.close()
    print('session loaded from file')
else:
    print('new session started')

session.add_dht_router("router.utorrent.com", 6881)
session.add_dht_router("router.bittorrent.com", 6881)
session.add_dht_router("dht.transmissionbt.com", 6881)
session.add_dht_router("router.bitcomet.com", 6881)
session.add_dht_router("dht.aelitis.com", 6881)
session.start_dht()

alerts = []

alive = True
while alive:
    a = session.pop_alert()
    alerts.append(a)
    print('----------')
    for a in alerts:
        var_dump(a)
    del alerts[:]  # clear after dumping instead of removing items while iterating


    print('sleep10')
    sleep(10)
    filewrite = open(session_save_filename, "wb")
    filewrite.write(lt.bencode(session.save_state()))
    filewrite.close()

I let it run for a minute and got this alert:

obj.msg = no router found


Update:

After some testing, it looks like

session.add_dht_router("router.bitcomet.com", 6881)

causes

('%s: %s', 'alert', 'DHT error [hostname lookup] (1) Host not found (authoritative)')


Update: I added

session.start_dht()
session.start_lsd()
session.start_upnp()
session.start_natpmp()

and got the alert

('%s: %s', 'portmap_error_alert', 'could not map port using UPnP: no router found')

Best answer

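As MatteoItalia points out, bootstrapping the DHT is not instantaneous and can sometimes take a while. There is no well-defined point in time at which the bootstrap process is complete; it is a continuum of becoming increasingly connected to the network.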

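The more connected you are, and the more good, stable nodes you know about, the quicker lookups will be. One way to factor out most of the bootstrap process (to get a more apples-to-apples comparison) is to start timing only after you receive the dht_bootstrap_alert, and to hold off on adding the magnet links until then.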
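A minimal sketch of holding off until the bootstrap alert arrives. The helper names `alert_names` and `wait_for_dht_bootstrap` are hypothetical. The sketch assumes a libtorrent version whose session offers `pop_alerts()` and an alert mask that already includes DHT notifications (the default mask may not deliver this alert). The name under which the alert is reported appears to differ between versions, so the check accepts both the Python class name and the value of `what()`; treat that set of names as an assumption to verify against your build.

```python
import time

# Names under which the bootstrap alert may be reported (assumed; they seem
# to vary between libtorrent versions).
BOOTSTRAP_NAMES = {"dht_bootstrap_alert", "dht_bootstrap"}


def alert_names(alert):
    """Return the alert's Python class name plus its what() value, if any."""
    names = {type(alert).__name__}
    what = getattr(alert, "what", None)
    if callable(what):
        names.add(what())
    return names


def wait_for_dht_bootstrap(session, timeout=120.0, poll_interval=0.5):
    """Poll the session's alerts until the DHT bootstrap alert shows up.

    Returns True if the alert was seen, False if `timeout` seconds passed
    without it. `session` only needs a pop_alerts() method.
    """
    deadline = time.monotonic() + timeout
    while True:
        for alert in session.pop_alerts():
            if alert_names(alert) & BOOTSTRAP_NAMES:
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

With this in place, you would start the timer and add the magnet links only once `wait_for_dht_bootstrap(session)` has returned True, so that the measured time covers the metadata lookup rather than the bootstrap.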

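Adding the DHT bootstrap nodes primarily makes it possible to bootstrap; it is still not necessarily going to be particularly quick. You typically need about 270 nodes or so (including replacement nodes) to be considered bootstrapped.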

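One thing you can do to speed up the bootstrap process is to make sure you save and load the session state, which includes the DHT routing table. This reloads all the nodes from the previous session into the routing table and (assuming you have not changed IP and everything works correctly) bootstrapping should be quicker.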
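A minimal sketch of the saving half, assuming the older `save_state()` / `bencode()` API that the question's script already uses. The helper name `save_session_state` is hypothetical, and the bencode function is passed in as a parameter (in practice `lt.bencode`) so the helper itself does not import libtorrent. Writing to a temporary file and renaming it is a design choice added here so that a crash in the middle of a save cannot leave a truncated state file behind.

```python
import os
import tempfile


def save_session_state(session, bencode, path):
    """Write bencode(session.save_state()) to `path` atomically.

    Returns the number of bytes written.
    """
    data = bencode(session.save_state())
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays on
    # the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic replace of any previous state file
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return len(data)
```

Called periodically from a long-running loop, for example `save_session_state(session, lt.bencode, "/tmp/new.client.save_state")`, this keeps a recent copy of the routing table available for the next start.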

Make sure you don't start the DHT in the session constructor (as the flags argument, just pass in add_default_plugins), load the state, add the router nodes, and then start the DHT.
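The ordering can be sketched as follows. This is an illustration under assumptions, not a definitive recipe: the function name `start_session_with_saved_dht` is hypothetical, the libtorrent module is passed in as `lt`, and the spelling `lt.session_flags_t.add_default_plugins` is assumed to be available in the bindings you use (the question's commented-out line passes `flags=1` for what I believe is the same purpose). The calls `load_state`, `bdecode`, `add_dht_router` and `start_dht` are the ones the question's script already uses.

```python
import os


def start_session_with_saved_dht(lt, state_path, routers):
    """Create a session, restoring saved state before the DHT is started.

    `lt` is the imported libtorrent module, `state_path` a file written from
    save_state(), and `routers` an iterable of (host, port) pairs.
    """
    # 1. Construct the session without start_default_features, so the DHT
    #    is not started by the constructor.
    session = lt.session(flags=lt.session_flags_t.add_default_plugins)

    # 2. Load the previous state (which includes the DHT routing table),
    #    if a state file exists.
    if os.path.isfile(state_path):
        with open(state_path, "rb") as f:
            session.load_state(lt.bdecode(f.read()))

    # 3. Add the bootstrap router nodes.
    for host, port in routers:
        session.add_dht_router(host, port)

    # 4. Only now start the DHT.
    session.start_dht()
    return session
```

A call might look like `start_session_with_saved_dht(lt, "/tmp/new.client.save_state", [("router.bittorrent.com", 6881)])`, reusing the state file path and router from the question.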

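Unfortunately, there are a lot of moving parts involved in getting this to work internally; the ordering is important and there may be subtle issues.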

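Also, note that keeping the DHT running continuously would be quicker, since reloading the state still goes through a bootstrap; it just has more nodes up front to ping and try to "connect" to.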
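For a job of 1k+ magnet links, that suggests keeping one long-lived session and feeding it batches, rather than creating a session per link. A minimal sketch of the waiting step, assuming only that each handle offers `has_metadata()` as in the question's script; the helper name `collect_metadata` and its parameters are hypothetical. It builds new lists instead of removing items from the list being iterated.

```python
import time


def collect_metadata(handles, timeout=600.0, poll_interval=1.0):
    """Poll handles until each has metadata or `timeout` seconds elapse.

    Returns (ready, expired): handles whose metadata arrived, and handles
    still without metadata when the deadline passed.
    """
    pending = list(handles)
    ready = []
    deadline = time.monotonic() + timeout
    while pending:
        still_pending = []
        for h in pending:
            if h.has_metadata():
                ready.append(h)
            else:
                still_pending.append(h)
        pending = still_pending
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(poll_interval)
    return ready, pending
```

The caller would write out the torrent files for the `ready` handles, remove both groups from the session, and then add the next batch to the same session, so the DHT stays up between batches.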

Disabling the start_default_features flag also means UPnP and NAT-PMP won't be started; if you use those, you'll have to start them manually as well.

Regarding "python - Why does uTorrent fetch a torrent file from a magnet link faster than my Python script?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32753998/
