python - This takes a long time... how do I speed this dictionary up? (Python)


Reposted by: IT老高 | Updated: 2023-10-28 13:22:48

    meta_map = {}
    results = db.meta.find({'corpus_id':id, 'method':method}) #this Mongo query only takes 3ms
    print results.explain()
    #result is mongo queryset of 2000 documents

    count = 0
    for r in results:
        count += 1
        print count
        word = r.get('word')
        data = r.get('data',{})
        if not meta_map.has_key(word):
            meta_map[word] = data
    return meta_map

For some reason, this is super, super slow.

There are 2000 results in total. Below is an example of one result document (from Mongo). All the other results are of similar length.

{ "word" : "articl", "data" : { "help" : 0.42454812322341984, "show" : 0.24099054286865948, "lack" : 0.2368313038407821, "steve" : 0.20491936823259457, "gb" : 0.18757527934987422, "feedback" : 0.2855335862138559, "categori" : 0.28210549642632016, "itun" : 0.23615623082085788, "articl" : 0.21378509220044106, "black" : 0.22720575131038662, "hidden" : 0.26172127252557625, "holiday" : 0.27662433827306804, "applic" : 0.1802411089325281, "digit" : 0.20491936823259457, "sourc" : 0.21909218369809863, "march" : 0.2632736571995878, "ceo" : 0.2153108869289692, "donat" : 1, "volum" : 0.2572042432755638, "octob" : 0.2802470156773559, "toolbox" : 0.2153108869289692, "discuss" : 0.26973295489368615, "list" : 0.3698592948408095, "upload" : 0.1802411089325281, "random" : 1, "default" : 0.33044754314072383, "februari" : 0.2899936154686609, "januari" : 0.25228424754983525, "septemb" : 0.1802411089325281, "page" : 0.24675067183234803, "view" : 0.20019523259334138, "pleas" : 0.2839965947961194, "mdi" : 0.2731217555354, "unsourc" : 0.2709524603813144, "direct" : 0.18757527934987422, "dead" : 0.22720575131038662, "smartphon" : 0.2839965947961194, "jump" : 0.3004203939398161, "see" : 0.33044754314072383, "design" : 0.2839965947961194, "download" : 0.19574598998663462, "home" : 0.3004203939398161, "event" : 0.651573574681647, "wikipedia" : 0.21909218369809863, "content" : 0.2471475889083912, "version" : 0.42454812322341984, "gener" : 0.3004203939398161, "refer" : 0.2188507485718582, "navig" : 0.27662433827306804, "june" : 0.2153108869289692, "screen" : 0.27662433827306804, "free" : 0.22720575131038662, "job" : 0.19574598998663462, "key" : 0.3004203939398161, "addit" : 0.22484486630589545, "search" : 0.2878804276884952, "current" : 0.5071530767683105, "worldwid" : 0.20491936823259457, "iphon" : 0.2230524329516571, "action" : 0.24099054286865948, "chang" : 0.18757527934987422, "summari" : 0.33044754314072383, "origin" : 0.2572042432755638, "softwar" : 0.651573574681647, "point" : 
0.27662433827306804, "extern" : 0.22190187748860113, "mobil" : 0.2514880028687207, "cloud" : 0.18757527934987422, "use" : 0.2731217555354, "log" : 0.27662433827306804, "commun" : 0.33044754314072383, "interact" : 0.5071530767683105, "devic" : 0.3004203939398161, "long" : 0.2839965947961194, "avail" : 0.19574598998663462, "appl" : 0.24099054286865948, "disambigu" : 0.3195885490528538, "statement" : 0.2737499468972353, "namespac" : 0.3004203939398161, "season" : 0.3004203939398161, "juli" : 0.27243508666247285, "relat" : 0.19574598998663462, "phone" : 0.26973295489368615, "link" : 0.2178125232318433, "line" : 0.42454812322341984, "pilot" : 0.27243508666247285, "account" : 0.2572042432755638, "main" : 0.34870313981256423, "provid" : 0.2153108869289692, "histori" : 0.2714135089366041, "vagu" : 0.24875213214603717, "featur" : 0.24099054286865948, "creat" : 0.26645207330844684, "ipod" : 0.2230524329516571, "player" : 0.20491936823259457, "io" : 0.2447908314834019, "need" : 0.2580912994161046, "develop" : 0.27662433827306804, "began" : 0.24099054286865948, "client" : 0.19574598998663462, "also" : 0.42454812322341984, "cleanup" : 0.24875213214603717, "split" : 0.26973295489368615, "tool" : 0.2878804276884952, "product" : 0.42454812322341984, "may" : 0.2676701118192027, "assist" : 0.1802411089325281, "variant" : 0.2514880028687207, "portal" : 0.3004203939398161, "user" : 0.20491936823259457, "consid" : 0.27662433827306804, "date" : 0.2731217555354, "recent" : 0.24099054286865948, "read" : 0.2572042432755638, "reliabl" : 0.2388872270166464, "sale" : 0.22720575131038662, "ambigu" : 0.23482106920048526, "person" : 0.260801274024785, "contact" : 0.24099054286865948, "encyclopedia" : 0.2153108869289692, "time" : 0.2368313038407821, "model" : 0.24099054286865948, "audio" : 0.19574598998663462 }}

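The whole thing takes about 15 seconds... what the heck? How can I speed it up? :)

Edit: I realized that when I print the count to the console, it goes from 0 to 101 very quickly, then hangs for 10 seconds, then continues from 102 to 2000.

Could this be a MongoDB problem?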

Edit 2: Here is the Mongo explain() output for the query:

{u'allPlans': [{u'cursor': u'BtreeCursor corpus_id_1_method_1_word_1',
                u'indexBounds': {u'corpus_id': [[u'iphone', u'iphone']],
                                 u'method': [[u'advanced', u'advanced']],
                                 u'word': [[{u'$minElement': 1},
                                            {u'$maxElement': 1}]]}}],
 u'cursor': u'BtreeCursor corpus_id_1_method_1_word_1',
 u'indexBounds': {u'corpus_id': [[u'iphone', u'iphone']],
                  u'method': [[u'advanced', u'advanced']],
                  u'word': [[{u'$minElement': 1}, {u'$maxElement': 1}]]},
 u'indexOnly': False,
 u'isMultiKey': False,
 u'millis': 3,
 u'n': 2443,
 u'nChunkSkips': 0,
 u'nYields': 0,
 u'nscanned': 2443,
 u'nscannedObjects': 2443,
 u'oldPlan': {u'cursor': u'BtreeCursor corpus_id_1_method_1_word_1',
              u'indexBounds': {u'corpus_id': [[u'iphone', u'iphone']],
                               u'method': [[u'advanced', u'advanced']],
                               u'word': [[{u'$minElement': 1},
                                          {u'$maxElement': 1}]]}}}

These are the stats for the Mongo collection:

> db.meta.stats();
{
    "ns" : "inception.meta",
    "count" : 2450,
    "size" : 3001068,
    "avgObjSize" : 1224.9257142857143,
    "storageSize" : 18520320,
    "numExtents" : 6,
    "nindexes" : 2,
    "lastExtentSize" : 13893632,
    "paddingFactor" : 1.009999999999931,
    "flags" : 1,
    "totalIndexSize" : 368640,
    "indexSizes" : {
        "_id_" : 114688,
        "corpus_id_1_method_1_word_1" : 253952
    },
    "ok" : 1
}


> db.meta.getIndexes();
[
    {
        "name" : "_id_",
        "ns" : "inception.meta",
        "key" : {
            "_id" : 1
        },
        "v" : 0
    },
    {
        "ns" : "inception.meta",
        "name" : "corpus_id_1_method_1_word_1",
        "key" : {
            "corpus_id" : 1,
            "method" : 1,
            "word" : 1
        },
        "v" : 0
    }
]
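A note on the "0 to 101, then a pause" pattern: MongoDB documents a default first batch of 101 documents for find(), so the pause plausibly lines up with the driver fetching the second batch. That is an inference; nothing in this thread confirms it. The stats above also show roughly 3 MB of documents in the collection, so asking for only the fields the loop uses and setting an explicit batch size is one thing to try. Below is a minimal sketch in Python 3, assuming `collection` behaves like a pymongo Collection. `build_meta_map`, `FakeCollection` and `FakeCursor` are hypothetical names, and the fakes exist only so the sketch runs without a server.

```python
def build_meta_map(collection, corpus_id, method, batch_size=500):
    """Build {word: data}, keeping the first document seen for each word.

    Assumes `collection` offers pymongo-style find(filter, projection)
    and that the returned cursor offers batch_size(n).
    """
    cursor = collection.find(
        {'corpus_id': corpus_id, 'method': method},
        {'_id': 0, 'word': 1, 'data': 1},  # projection: only the fields used
    ).batch_size(batch_size)

    meta_map = {}
    for doc in cursor:
        word = doc['word']
        if word not in meta_map:
            meta_map[word] = doc['data']
    return meta_map


class FakeCursor:
    """Hypothetical stand-in for a driver cursor (no network involved)."""

    def __init__(self, docs):
        self._docs = docs
        self.requested_batch_size = None

    def batch_size(self, n):
        self.requested_batch_size = n
        return self

    def __iter__(self):
        return iter(self._docs)


class FakeCollection:
    """Hypothetical stand-in for a collection; ignores the projection."""

    def __init__(self, docs):
        self._docs = docs

    def find(self, query, projection=None):
        matched = [d for d in self._docs
                   if all(d.get(k) == v for k, v in query.items())]
        return FakeCursor(matched)


# Made-up sample documents, shaped like the one in the question.
sample_docs = [
    {'corpus_id': 'iphone', 'method': 'advanced',
     'word': 'articl', 'data': {'help': 0.42}},
    {'corpus_id': 'iphone', 'method': 'basic',
     'word': 'appl', 'data': {'ipod': 0.22}},
    {'corpus_id': 'iphone', 'method': 'advanced',
     'word': 'articl', 'data': {'help': 0.99}},
]

print(build_meta_map(FakeCollection(sample_docs), 'iphone', 'advanced'))
```

Whether this removes the pause depends on where the time actually goes, so measure before and after.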

Best answer

Instead of

if not meta_map.has_key(word):

you should use

if word not in meta_map:

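There's no point in doing data = r.get('data',{}) if you are not going to use it.

It's not clear why you are doing word = r.get('word') ... if 'word' is always present in r, you should use word = r['word']; otherwise you should test whether word is None after the get.

The same goes for getting data.

Try this: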

for r in results:
    word = r['word']
    if word not in meta_map:
        meta_map[word] = r['data']
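For anyone who wants to run the loop above without a database, here is a small self-contained version in Python 3, where dict.has_key no longer exists and `in` is the only spelling. The `results` list is made-up sample data shaped like the document in the question, and the `setdefault` variant is an equivalent alternative that assumes every document has both keys.

```python
# Made-up sample data; the real `results` would be the Mongo cursor.
results = [
    {'word': 'articl', 'data': {'help': 0.42, 'show': 0.24}},
    {'word': 'iphon', 'data': {'appl': 0.24}},
    {'word': 'articl', 'data': {'help': 0.99}},  # repeated word, skipped
]

# The loop from the answer: the first document seen for a word wins.
meta_map = {}
for r in results:
    word = r['word']
    if word not in meta_map:
        meta_map[word] = r['data']

# Equivalent one-liner per document: setdefault only stores the value
# when the key is not already present.
meta_map_2 = {}
for r in results:
    meta_map_2.setdefault(r['word'], r['data'])

print(meta_map)
```

Either form does one hash lookup per document, so for about 2000 documents the dictionary work itself should take well under a second.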

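In any case, the time you quote is enormous... there must be something else going on there. I'd be very interested in seeing the code you used for timing and for counting the number of entries in results.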
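The timing and counting asked for here could look something like the sketch below (Python 3). It records how long the loop waited for each item, so a stall shows up as one large gap instead of a vague "it hangs". `time_iteration`, `slowest_step` and `fake_cursor` are hypothetical names. `fake_cursor` just simulates a single pause, and in real use you would pass the actual `results` cursor instead.

```python
import time


def time_iteration(iterable):
    """Return a list of (count, seconds_waited) pairs, one per item."""
    gaps = []
    last = time.perf_counter()
    for count, _item in enumerate(iterable, start=1):
        now = time.perf_counter()
        gaps.append((count, now - last))
        last = now
    return gaps


def slowest_step(gaps):
    """Return the (count, seconds_waited) pair with the longest wait."""
    return max(gaps, key=lambda pair: pair[1])


def fake_cursor(total=150, stall_at=102, stall_seconds=0.2):
    """Hypothetical stand-in for a cursor: pauses once before one item."""
    for i in range(1, total + 1):
        if i == stall_at:
            time.sleep(stall_seconds)
        yield {'word': 'w%d' % i, 'data': {}}


gaps = time_iteration(fake_cursor())
count, waited = slowest_step(gaps)
print('items seen: %d' % len(gaps))
print('slowest step: item %d, waited %.3f s' % (count, waited))
```

If the slowest step on the real cursor is a single item and everything else is near zero, the time is being spent waiting on the cursor and not in the dictionary code.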

Regarding "python - This takes a long time... how do I speed this dictionary up? (Python)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/6714438/
