python - Apache Thrift: Multitask single Server and Client


Reposted by: 太空宇宙. Updated: 2023-11-04 05:12:48


I have read this and this. However, my case is different: I need neither multiplexed services on the server nor multiple connections to the server.

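Background:
For my big-data project I need to compute the coreset of a given big data set. A coreset is a subset of the big data that preserves its most important mathematical relationships.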

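Workflow:

Slice the big data into smaller chunks
The client parses each chunk and sends it to the server
The server computes the coreset and saves the result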

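My problem:
The whole process runs single-threaded. The client parses one chunk, waits for the server to finish computing the coreset, then parses the next chunk, and so on.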

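Goal:
Take advantage of multiprocessing. The client parses several chunks concurrently, and for each compute-coreset request the server assigns a thread to handle it, with the number of threads limited. Something like a pool.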

I understand I need to move away from TSimpleServer toward TThreadPoolServer or TThreadedServer. I just can't make up my mind which one to choose, as neither seems to suit me:

TThreadedServer spawns a new thread for each client connection, and each thread remains alive until the client connection is closed.

In TThreadPoolServer each client connection gets its own dedicated server thread. The server thread goes back to the thread pool for reuse after the client closes the connection.

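I don't need a thread per connection. I want a single connection, with the server handling multiple service requests at the same time. Visualization: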

Client:
Thread1: parses(chunk1) --> Request compute coreset
Thread2: parses(chunk2) --> Request compute coreset
Thread3: parses(chunk3) --> Request compute coreset

Server: (Pool of 2 threads)
Thread1: Handle compute Coreset
Thread2: handle compute Coreset
.
. 
Thread1 becomes available and handles another compute coreset

Code:
api.thrift:

struct CoresetPoint {
    1: i32 row,
    2: i32 dim,
}

struct CoresetAlgorithm {
    1: string path,
}

struct CoresetWeightedPoint {
    1: CoresetPoint point,
    2: double weight,
}

struct CoresetPoints {
    1: list<CoresetWeightedPoint> points,
}

service CoresetService {

    void initialize(1:CoresetAlgorithm algorithm, 2:i32 coresetSize)

    oneway void compressPoints(1:CoresetPoints message)

    CoresetPoints getTotalCoreset()
}

Server (implementations removed for readability):

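import logging

from thrift.protocol import TBinaryProtocol
from thrift.server import TServer
from thrift.transport import TSocket, TTransport

# Code generated from api.thrift; the module path "api" is an assumption
# and depends on the namespace and on gen-py being on sys.path.
from api import CoresetService


class CoresetHandler:
    def initialize(self, algorithm, coresetSize):
        pass  # implementation removed

    def _add(self, leveledSlice):
        pass  # implementation removed

    def compressPoints(self, message):
        pass  # implementation removed

    def getTotalCoreset(self):
        pass  # implementation removed


if __name__ == '__main__':
    logging.basicConfig()
    handler = CoresetHandler()
    processor = CoresetService.Processor(handler)
    transport = TSocket.TServerSocket(port=9090)
    tfactory = TTransport.TBufferedTransportFactory()
    pfactory = TBinaryProtocol.TBinaryProtocolFactory()

    server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)

    # You could do one of these for a multithreaded server
    # server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)
    # server = TServer.TThreadPoolServer(processor, transport, tfactory, pfactory)

    print('Starting the server...')
    server.serve()
    print('done.')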

Client:

import os

from thrift import Thrift
from thrift.protocol import TBinaryProtocol
from thrift.transport import TSocket, TTransport

# Code generated from api.thrift; the module path "api" is an assumption
# and depends on the namespace and on gen-py being on sys.path.
from api import CoresetService
from api.ttypes import CoresetAlgorithm

try:
    # Make socket
    transport = TSocket.TSocket('localhost', 9090)

    # Buffering is critical. Raw sockets are very slow
    transport = TTransport.TBufferedTransport(transport)

    # Wrap in a protocol
    protocol = TBinaryProtocol.TBinaryProtocol(transport)

    # Create a client to use the protocol encoder
    client = CoresetService.Client(protocol)

    # Connect!
    transport.open()

    # The data is sliced first. Then every file saved in the specified
    # directory is parsed and passed to client.compressPoints(data).
    # SliceFile and _parse are the asker's own helpers (not shown);
    # their arguments are elided in the original.
    SliceFile(...)
    p = CoresetAlgorithm(...)
    client.initialize(p, 200)
    for filename in os.listdir('/home/tony/DanLab/slicedFiles'):
        if filename.endswith(".txt"):
            data = _parse(filename)
            client.compressPoints(data)
    compressedData = client.getTotalCoreset()

    # Close!
    transport.close()

except Thrift.TException as tx:
    print('%s' % (tx.message))
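Question: Is this possible with Thrift? What protocol (server type) should I use? I partially solved the problem of the client waiting for the server to finish the computation by adding oneway to the function declaration, which indicates that the client only sends the request and does not wait for any response at all.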

Best answer

At its core, this is more of an architectural question than a Thrift question. Given that the premises

I don't need a thread per connection, I want a single connection, and the server to handle multiple service requests at the same time. Visualization:

and

I solved the partial problem of client waiting for server to finish computation by adding oneway to function declaration to indicate that the client only makes a request and does not wait for any response at all.

describe the use case accurately, you want this:

+---------------------+
| Client              |
+---------+-----------+
          |
          |
+---------v-----------+
| Server              |
+---------+-----------+
          |
          |
+---------v-----------+          +---------------------+
| Queue<WorkItems>    <----------+ Worker Thread Pool  |
+---------------------+          +---------------------+

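The server's only task is to accept requests and insert them into the work item queue as quickly as possible. Those work items are processed by a separate pool of worker threads that is otherwise completely independent of the server part. The only shared piece is the work item queue, which of course requires properly synchronized access methods.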
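
A minimal Python sketch of this arrangement, under stated assumptions: QueueingCoresetHandler and its compress callable are hypothetical stand-ins (a real handler would run the coreset computation there), and only the method names compressPoints and getTotalCoreset come from the question's IDL. The standard library's queue.Queue already provides the synchronized access the work item queue needs.

```python
import logging
import queue
import threading


class QueueingCoresetHandler:
    """Hypothetical handler: enqueue quickly, compute in a worker pool."""

    def __init__(self, compress, num_workers=2):
        self._compress = compress         # stand-in for the real coreset routine
        self._work_items = queue.Queue()  # thread-safe; the only shared structure
        self._results = []
        self._results_lock = threading.Lock()
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def compressPoints(self, message):
        # Called on the server thread: only enqueue, then return at once.
        self._work_items.put(message)

    def _worker(self):
        while True:
            message = self._work_items.get()
            try:
                result = self._compress(message)
                with self._results_lock:
                    self._results.append(result)
            except Exception:
                # Keep the worker alive if one item fails.
                logging.exception("work item failed")
            finally:
                self._work_items.task_done()

    def getTotalCoreset(self):
        # Wait until every queued item has been processed, then return a copy.
        self._work_items.join()
        with self._results_lock:
            return list(self._results)


if __name__ == "__main__":
    # sum() is a toy replacement for the coreset computation.
    handler = QueueingCoresetHandler(compress=sum, num_workers=2)
    for chunk in ([1, 2], [3, 4], [5, 6]):
        handler.compressPoints(chunk)
    print(sorted(handler.getTotalCoreset()))  # [3, 7, 11]
```

With the blocking servers discussed here, requests on one connection are processed in order, so a getTotalCoreset call is handled only after all earlier compressPoints calls have been enqueued, and join() then waits for the workers to drain the queue. For CPU-bound work in CPython, a process pool may be needed instead of threads; that is outside this sketch.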

Regarding the choice of server: if the server processes the requests fast enough, even a TSimpleServer may do.
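
On the client side, the single connection from the question can be shared by several parser threads as long as the calls on it are serialized, since generated Thrift clients are generally not safe to use from multiple threads without locking. The sketch below illustrates this under assumptions: LockedSender, parse_and_send and FakeClient are hypothetical names, and FakeClient stands in for CoresetService.Client so the example runs without Thrift.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedSender:
    """Serializes calls on one shared client (assumed not thread-safe)."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def compress_points(self, data):
        with self._lock:
            self._client.compressPoints(data)


def parse_and_send(parse, sender, filenames, max_workers=3):
    """Parse files in a thread pool and send each result over one connection."""

    def job(name):
        # Parsing runs concurrently; only the send is done under the lock.
        sender.compress_points(parse(name))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for all jobs and re-raises any exception from a job.
        list(pool.map(job, filenames))


class FakeClient:
    """Stand-in for CoresetService.Client; records what was sent."""

    def __init__(self):
        self.sent = []

    def compressPoints(self, data):
        self.sent.append(data)


if __name__ == "__main__":
    client = FakeClient()
    # str.upper is a toy replacement for the asker's _parse helper.
    parse_and_send(str.upper, LockedSender(client), ["a.txt", "b.txt", "c.txt"])
    print(sorted(client.sent))  # ['A.TXT', 'B.TXT', 'C.TXT']
```

Because compressPoints is declared oneway, each send returns as soon as the request is written, so the lock is held only briefly and the parser threads rarely wait on each other.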

Regarding "python - Apache Thrift: Multitask single Server and Client", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42574390/
