scribe - Flume vs Kafka vs others

Reposted. Author: 行者123. Updated: 2023-12-03 12:50:19

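As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.

Closed 9 years ago.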

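This question has probably been asked before, but given how much these technologies have matured, I think it is worth revisiting today. We want to use Flume, Kafka, Scribe, or something else to store streaming Facebook and Twitter profile information in HBase for later analysis. We are leaning towards Flume, but I have not used the other technologies enough to make an informed decision. Anyone who can shed some light on this would be great! Thanks a lot.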

Best answer

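The Wikimedia Foundation (the organization behind Wikipedia and MediaWiki) went through this evaluation and published a good write-up on why they chose Kafka over Scribe, Flume, and others: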
http://www.mediawiki.org/wiki/Analytics/Kraken/Request_Logging
New link:
https://wikitech.wikimedia.org/wiki/Analytics/Archive/Hadoop_Logging_-_Solutions_Recommendation
Summary for posterity:

"Our recommendation is Apache Kafka, a distributed pub-sub messaging system designed for throughput. We evaluated about a dozen[1] best-of-breed systems drawn from the domains of distributed log collection, CEP / stream processing, and real-time messaging systems. While these systems offer surprisingly similar features, they differ substantially in implementation, and each is specialized to a particular work profile (a more thorough technical discussion is available as an appendix).

"Kafka stands out because it is specialized for throughput and explicitly distributed in all tiers of its architecture. Interestingly, it is also concerned enough with resource conservation[2] to offer sensible tradeoffs that loosen guarantees in exchange for performance — something that may not strike Facebook or Google as an important feature in the systems they design. Constraints breed creativity.

"In addition, Kafka has several perks of particular interest to Operations readers. While it is written in Scala, it ships with a native C++ producer library that can be embedded in a module for our cache servers, obviating the need to run the JVM on those servers. Second, producers can be configured to batch requests to optimize network traffic, but do not create a persistent local log which would require additional maintenance. Kafka's I/O and memory usage is left up to the OS rather than the JVM[3].

"Kafka was written by LinkedIn and is now an Apache project. In production at LinkedIn, approximately 10,000 producers are handled by eight Kafka servers per datacenter. These clusters consolidate their streams into a single analytics datacenter, which Kafka supports out of the box via a simple mirroring configuration.

"These features are a very apt fit for our intended use cases; even those we don't intend to use — such as sharding and routing by 'topic' categories — are interesting and might prove useful in the future as we expand our goals.

"The rest of this document dives into these topics in greater detail..."
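To make the concepts in the quote a bit more concrete (pub-sub topics, sharding/routing by key, producer-side batching with no persistent local log, and consumers reading by offset), here is a minimal in-memory sketch in Python. It is a toy illustration under stated assumptions, not Kafka's implementation or client API: the classes `ToyBroker` and `BatchingProducer` and all their methods are hypothetical, and partition selection via CRC32 of the key is an illustrative choice (real Kafka clients use their own hash functions).

```python
import zlib
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory broker: each topic is split into append-only
    partitions, and a message's offset is its index within its partition."""

    def __init__(self, num_partitions=3):
        self.num_partitions = num_partitions
        # topic -> list of partitions; each partition is a list of (key, value)
        self.topics = defaultdict(lambda: [[] for _ in range(self.num_partitions)])

    def partition_for(self, key):
        # Route by key: the same key always lands in the same partition,
        # which keeps per-key ordering (CRC32 is an illustrative choice).
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def append_batch(self, topic, batch):
        # One simulated "network request" carries a whole batch of messages.
        for key, value in batch:
            self.topics[topic][self.partition_for(key)].append((key, value))

    def read(self, topic, partition, offset, max_messages=10):
        # Consumers pull from an offset they track themselves; reading does
        # not delete anything, so several consumers can share one topic.
        log = self.topics[topic][partition]
        return log[offset:offset + max_messages]


class BatchingProducer:
    """Hypothetical producer that buffers messages in memory and sends them in
    batches, trading a little latency for fewer requests. The buffer is not
    persisted locally, so unsent messages are lost if the process dies."""

    def __init__(self, broker, batch_size=4):
        self.broker = broker
        self.batch_size = batch_size
        self.buffers = defaultdict(list)
        self.requests_sent = 0

    def send(self, topic, key, value):
        self.buffers[topic].append((key, value))
        if len(self.buffers[topic]) >= self.batch_size:
            self.flush(topic)

    def flush(self, topic=None):
        # Flush one topic, or every buffered topic when topic is None.
        topics = [topic] if topic is not None else list(self.buffers)
        for t in topics:
            if self.buffers[t]:
                self.broker.append_batch(t, self.buffers[t])
                self.buffers[t] = []
                self.requests_sent += 1


if __name__ == "__main__":
    broker = ToyBroker(num_partitions=3)
    producer = BatchingProducer(broker, batch_size=4)
    for i in range(10):
        producer.send("profiles", key=f"user-{i % 3}", value=f"update {i}")
    producer.flush()
    print("requests sent:", producer.requests_sent)  # 10 messages, batches of 4 -> 3
    for p in range(broker.num_partitions):
        print(p, broker.read("profiles", p, offset=0))
```

Keying by user ID keeps each profile's updates in order within a single partition while spreading different users across partitions. In the setup described in the question, a consumer would then read each partition from its last stored offset and write rows into HBase; that part is deliberately left out of this sketch.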

Regarding scribe - Flume vs Kafka vs others, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12559570/