
java - Iterating multiple times through Text input values in the Reducer of a MapReduce job

Reposted. Author: 行者123. Updated: 2023-12-02 21:45:38

I have two very large datasets (tables) on HDFS. I want to join them on some columns, then group by some columns, and then perform some group functions on some columns.

My steps are:

1- Create two jobs.

2- In the first job, in the mapper, read the rows of each dataset as the map input value and emit the join columns' values as the map output key and the remaining columns' values as the map output value.

After mapping, the MapReduce framework performs shuffling and groups all the map output values according to the map output keys.

Then, in the reducer, it reads each map output key and its values, which may include many rows from both datasets.

What I want is to iterate through the reduce input values multiple times so that I can perform a Cartesian product.

To illustrate:

Let's say for a join key x, I have 100 matches from one dataset and 200 matches from the other. Joining them on join key x produces 100*200 = 20000 combinations. I want to emit NullWritable as the reduce output key and each Cartesian-product row as the reduce output value.

An example output might be:

for join key x:

From (nullWritable),(first(1),second(1))

Over (nullWritable),(first(1),second(200))

To (nullWritable),(first(100),second(200))

How can I do that?

I can iterate only once, and I cannot cache the values because they don't fit into memory.

3- If I do that, I will start the second job, which takes the first job's result file as its input file. In the mapper, I emit the group columns' values as the map output key and the remaining columns' values as the map output value. Then, in the reducer, by iterating through each key's values, I perform some functions on some columns, like sum, avg, max, min.
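The aggregation described in step 3 can be sketched in plain Java, outside Hadoop, for a single group key's values; the `aggregate` helper and the use of `Double` for column values are hypothetical simplifications, not the actual job code:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupAggregate {

    // Computes sum, avg, max, and min over the numeric column values
    // that the second job's reducer would see for one group key.
    public static Map<String, Double> aggregate(List<Double> values) {
        double sum = 0;
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        for (double v : values) {
            sum += v;
            max = Math.max(max, v);
            min = Math.min(min, v);
        }
        Map<String, Double> stats = new LinkedHashMap<>();
        stats.put("sum", sum);
        stats.put("avg", sum / values.size());
        stats.put("max", max);
        stats.put("min", min);
        return stats;
    }
}
```

In the real reducer, the same loop would run directly over the `Iterable` of values for the key, so only the running totals (not the rows) need to be held in memory.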


Thanks a lot.

Best Answer

Since your first MR job uses the join key as the map output key, your first reducer receives (K join_key, List values) for each reduce call. What you can do is separate the values into two lists, one per data source, and then perform the Cartesian product with nested for loops.
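This approach (split the reducer's values into two per-source lists, then nested loops) can be sketched in plain Java outside Hadoop. The `L:`/`R:` source tags and the `crossProduct` helper are hypothetical: the mapper would have to prefix each value with a tag so the reducer can tell the two datasets apart, and this assumes both lists for one join key fit in memory:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CartesianJoin {

    // Splits the values for one join key by their source tag, then emits
    // one joined row per (left, right) combination via nested loops.
    public static List<String> crossProduct(Iterable<String> values) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith("L:")) {
                left.add(v.substring(2));   // row from the first dataset
            } else {
                right.add(v.substring(2));  // row from the second dataset
            }
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(l + "," + r);       // one Cartesian-product row
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values =
            Arrays.asList("L:a1", "L:a2", "R:b1", "R:b2", "R:b3");
        // 2 left rows x 3 right rows -> 6 combinations
        System.out.println(crossProduct(values));
    }
}
```

In the actual reducer, the output list would be replaced by `context.write(NullWritable.get(), ...)` inside the inner loop; note this does not address the case the asker raised where one key's matches do not fit in memory.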

Regarding "java - Iterating multiple times through Text input values in the Reducer of a MapReduce job", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25587811/
