
python - "Can' t pickle ”在 Windows 上使用多处理时出错


I am writing a multiprocessing program on Windows to process large .CSV files in parallel.

I found this excellent example for a similar problem. When running it under Windows, I get an error saying that csv.reader is not picklable.

I suppose I could open the CSV file in the reader subprocess and just send it the file name from the parent. However, I would like to pass an already opened CSV file (as the code is meant to do), with a specific state, i.e. really use a shared object.

Any idea how to do this under Windows, or what is missing here?

Here is the code (reposted for readability):

"""A program that reads integer values from a CSV file and writes out their
sums to another CSV file, using multiple processes if desired.
"""

import csv
import multiprocessing
import optparse
import sys

NUM_PROCS = multiprocessing.cpu_count()

def make_cli_parser():
"""Make the command line interface parser."""
usage = "\n\n".join(["python %prog INPUT_CSV OUTPUT_CSV",
__doc__,
"""
ARGUMENTS:
INPUT_CSV: an input CSV file with rows of numbers
OUTPUT_CSV: an output file that will contain the sums\
"""])
cli_parser = optparse.OptionParser(usage)
cli_parser.add_option('-n', '--numprocs', type='int',
default=NUM_PROCS,
help="Number of processes to launch [DEFAULT: %default]")
return cli_parser

class CSVWorker(object):
def __init__(self, numprocs, infile, outfile):
self.numprocs = numprocs
self.infile = open(infile)
self.outfile = outfile
self.in_csvfile = csv.reader(self.infile)
self.inq = multiprocessing.Queue()
self.outq = multiprocessing.Queue()

self.pin = multiprocessing.Process(target=self.parse_input_csv, args=())
self.pout = multiprocessing.Process(target=self.write_output_csv, args=())
self.ps = [ multiprocessing.Process(target=self.sum_row, args=())
for i in range(self.numprocs)]

self.pin.start()
self.pout.start()
for p in self.ps:
p.start()

self.pin.join()
i = 0
for p in self.ps:
p.join()
print "Done", i
i += 1

self.pout.join()
self.infile.close()

def parse_input_csv(self):
"""Parses the input CSV and yields tuples with the index of the row
as the first element, and the integers of the row as the second
element.

The index is zero-index based.

The data is then sent over inqueue for the workers to do their
thing. At the end the input thread sends a 'STOP' message for each
worker.
"""
for i, row in enumerate(self.in_csvfile):
row = [ int(entry) for entry in row ]
self.inq.put( (i, row) )

for i in range(self.numprocs):
self.inq.put("STOP")

def sum_row(self):
"""
Workers. Consume inq and produce answers on outq
"""
tot = 0
for i, row in iter(self.inq.get, "STOP"):
self.outq.put( (i, sum(row)) )
self.outq.put("STOP")

def write_output_csv(self):
"""
Open outgoing csv file then start reading outq for answers
Since I chose to make sure output was synchronized to the input there
is some extra goodies to do that.

Obviously your input has the original row number so this is not
required.
"""
cur = 0
stop = 0
buffer = {}
# For some reason csv.writer works badly across threads so open/close
# and use it all in the same thread or else you'll have the last
# several rows missing
outfile = open(self.outfile, "w")
self.out_csvfile = csv.writer(outfile)

#Keep running until we see numprocs STOP messages
for works in range(self.numprocs):
for i, val in iter(self.outq.get, "STOP"):
# verify rows are in order, if not save in buffer
if i != cur:
buffer[i] = val
else:
#if yes are write it out and make sure no waiting rows exist
self.out_csvfile.writerow( [i, val] )
cur += 1
while cur in buffer:
self.out_csvfile.writerow([ cur, buffer[cur] ])
del buffer[cur]
cur += 1

outfile.close()

def main(argv):
cli_parser = make_cli_parser()
opts, args = cli_parser.parse_args(argv)
if len(args) != 2:
cli_parser.error("Please provide an input file and output file.")

c = CSVWorker(opts.numprocs, args[0], args[1])

if __name__ == '__main__':
main(sys.argv[1:])

This is the error I get when running under Windows:

Traceback (most recent call last):
  File "C:\Users\ron.berman\Documents\Attribution\ubrShapley\test.py", line 130, in <module>
    main(sys.argv[1:])
  File "C:\Users\ron.berman\Documents\Attribution\ubrShapley\test.py", line 127, in main
    c = CSVWorker(opts.numprocs, args[0], args[1])
  File "C:\Users\ron.berman\Documents\Attribution\ubrShapley\test.py", line 44, in __init__
    self.pin.start()
  File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Python27\lib\multiprocessing\forking.py", line 271, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Python27\lib\multiprocessing\forking.py", line 193, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 419, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 681, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\multiprocessing\forking.py", line 66, in dispatcher
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 401, in save_reduce
    save(args)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 548, in save_tuple
    save(element)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 419, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 681, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 396, in save_reduce
    save(cls)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 753, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <type '_csv.reader'>: it's not the same object as _csv.reader
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python27\lib\multiprocessing\forking.py", line 374, in main
    self = load(from_parent)
  File "C:\Python27\lib\pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "C:\Python27\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "C:\Python27\lib\pickle.py", line 880, in load_eof
    raise EOFError
EOFError

Best Answer

The problem you are running into is caused by using methods of the CSVWorker class as Process targets; that class has members that cannot be pickled, and those open files are never going to work.
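
To see why: Windows has no fork, so multiprocessing spawns a fresh interpreter and pickles the Process object to send it to the child (that is the dump(process_obj, ...) call in your traceback). Since your targets are bound methods, pickling them drags in the entire CSVWorker instance, including the open file and the csv.reader. A minimal sketch that reproduces the same failure without multiprocessing (the Holder class and the input.csv path are purely illustrative):

import csv
import pickle

class Holder(object):
    def __init__(self, path):
        self.infile = open(path)                # file objects are not picklable
        self.reader = csv.reader(self.infile)   # neither is _csv.reader

h = Holder("input.csv")  # hypothetical input file
pickle.dumps(h)          # raises the same kind of PicklingError/TypeError
                         # that Process.start() triggers on Windows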

What you want to do is break that class into two: one that coordinates all of the worker subprocesses, and one that actually does the computational work. The worker processes should take filenames as arguments and open the individual files as needed, or at least wait until their work methods are invoked before opening files. They can also take multiprocessing.Queues as arguments or as instance members; that is safe to pass around.

To an extent you already do this: your write_output_csv method opens its file inside the subprocess, but your parse_input_csv method expects to find an already opened and prepared file as an attribute of self. Do it consistently the other way around and you should be in good shape.
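
For instance, the input side might become a plain module-level function that receives only picklable arguments (a filename string and the queues) and opens the file itself, mirroring what write_output_csv already does. A rough sketch under those assumptions (the parent-side names input_name and opts.numprocs are illustrative), not a drop-in rewrite of the whole class:

import csv
import multiprocessing

def parse_input_csv(filename, inq, numprocs):
    """Runs in the child process: open the CSV here, not in the parent."""
    infile = open(filename)
    try:
        for i, row in enumerate(csv.reader(infile)):
            inq.put((i, [int(entry) for entry in row]))
    finally:
        infile.close()
    # One sentinel per worker, as in the original code.
    for _ in range(numprocs):
        inq.put("STOP")

# In the parent, pass only the filename and the queue -- both picklable:
# pin = multiprocessing.Process(target=parse_input_csv,
#                               args=(input_name, inq, opts.numprocs))
# pin.start()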

Regarding python - "Can't pickle <type '_csv.reader'>" error when using multiprocessing on Windows, the original question can be found on Stack Overflow: https://stackoverflow.com/questions/8514238/
