python - Solving matrix multiplication in Python with Map-Reduce on Hadoop

Reposted by: 可可西里. Updated: 2023-11-01 15:51:38


I want to apply map-reduce to matrix multiplication using Python and Hadoop. The goal is to compute A * B. The output should be in a format similar to the input.

The input consists of two matrices, A and B, in a format that looks like this:

A,0,0,0.0
A,0,1,1.0
...
A,1,3,8.0
A,1,4,9.0
B,0,0,0.0
B,0,1,1.0
...
B,4,0,12.0
B,4,1,13.0

A,0,0,0.0 means the entry at index A(0,0) has the value 0.0; the same applies to B.
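For reference, the product can be written out entry by entry. The symbols m, n and p are introduced here for illustration and do not appear in the question: A is taken to be m × n and B to be n × p, with zero-based indices as in the input format.

```latex
C_{ik} = \sum_{j=0}^{n-1} A_{ij} \, B_{jk}, \qquad 0 \le i < m, \quad 0 \le k < p
```

Each output cell (i, k) needs row i of A and column k of B, which suggests using (i, k) as the map-reduce key: an entry A(i, j) is relevant to every key (i, k) for k = 0..p-1, and an entry B(j, k) is relevant to every key (i, k) for i = 0..m-1.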

This is my map function:

import sys

# Assumption: argv[1] is the number of rows of A and argv[2] the number of
# columns of B (this matches the test command "map.py 2 3" below)
m = int(sys.argv[1])
p = int(sys.argv[2])

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    # Split line into array of entry data
    entry = line.split(",")
    # Set row, column, and value for this entry
    row = int(entry[1])
    col = int(entry[2])
    value = float(entry[3])

    # If this is an entry in matrix A...
    if entry[0] == "A":
        # A(row, col) contributes to every output cell (row, k)
        for k in range(p):
            print('{},{}\tA,{},{}'.format(row, k, col, value))
    # Otherwise, if this is an entry in matrix B...
    else:
        # B(row, col) contributes to every output cell (i, col)
        for i in range(m):
            print('{},{}\tB,{},{}'.format(i, col, row, value))

I would like to know how to write the reduce function. This is the skeleton I will use:

import sys
import string
import numpy

# Number of columns of A / rows of B
n = int(sys.argv[1])

# Create data structures to hold the current row/column values (if needed; your code goes here)

currentkey = None

# Input comes from STDIN (stream data that goes to the program)
for line in sys.stdin:

    # Remove leading and trailing whitespace
    line = line.strip()

    # Get key/value
    key, value = line.split('\t', 1)

    # Parse key/value input (your code goes here)

    # If we are still on the same key...
    if key == currentkey:

        # Process key/value pair (your code goes here)
        pass

    # Otherwise, if this is a new key...
    else:
        # If this is a new key and not the first key we've seen
        if currentkey:

            # Compute/output result to STDOUT (your code goes here)
            pass

        currentkey = key

        # Process input for new key (your code goes here)

# Compute/output result for the last key (your code goes here)
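One possible way to fill in the skeleton is sketched below. It is not taken from the question or the answer, and it rests on assumptions: the mapper is assumed to emit lines of the form `i,k<TAB>A,j,value` or `i,k<TAB>B,j,value`, the input is assumed to arrive grouped by key (as it does after sorting), and the output format `i,k,value` is an illustrative choice. The helper names `parse` and `reduce_lines` are hypothetical. The sketch uses `itertools.groupby` in place of the skeleton's manual `currentkey` bookkeeping.

```python
import sys
from itertools import groupby


def parse(line):
    """Split 'i,k<TAB>M,j,value' into (key, matrix_name, j, value)."""
    key, payload = line.strip().split("\t", 1)
    name, j, value = payload.split(",")
    return key, name, int(j), float(value)


def reduce_lines(lines, n):
    """Yield 'i,k,value' per output cell; lines must already be grouped by key.

    n is the number of columns of A (equal to the number of rows of B).
    """
    records = (parse(line) for line in lines if line.strip())
    for key, group in groupby(records, key=lambda rec: rec[0]):
        a_row = [0.0] * n  # holds A(i, 0..n-1) for this key
        b_col = [0.0] * n  # holds B(0..n-1, k) for this key
        for _, name, j, value in group:
            if name == "A":
                a_row[j] = value
            else:
                b_col[j] = value
        # Dot product of row i of A with column k of B
        total = sum(a * b for a, b in zip(a_row, b_col))
        yield "{},{}".format(key, total)


# Only read STDIN when n is given on the command line, e.g. "reduce.py 5"
if __name__ == "__main__" and len(sys.argv) > 1:
    for out in reduce_lines(sys.stdin, int(sys.argv[1])):
        print(out)
```

Entries that never appear in the input are treated as 0.0, so the sketch also tolerates sparse input under the same assumptions.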

To run these two functions, I will use the following command to test them on a small test dataset:

cat smalltest.txt | python src/map.py 2 3 | sort -n | python src/reduce.py 5

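The map step produces the output, sort -n then sorts it by key, and the reducer is supposed to handle the matrix computation. My confusion lies in writing the reducer function.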

Best answer

I am not sure why reduce is needed here.
My numpy approach (using some string/list/zip tricks):

import numpy as np

strin = '''A,0,0,0.0
A,0,1,1.0
A,1,0,8.0
A,1,1,9.0
B,0,0,0.0
B,0,1,1.0
B,1,0,12.0
B,1,1,13.0'''.split()

lines = [*map(lambda x: x.split(","),strin)]

linesT = [*zip(*lines)]

linesT

[('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
 ('0', '0', '1', '1', '0', '0', '1', '1'),
 ('0', '1', '0', '1', '0', '1', '0', '1'),
 ('0.0', '1.0', '8.0', '9.0', '0.0', '1.0', '12.0', '13.0')]

Now we can get the dims and the data of arrays A and B:

lastA = linesT[0].index("B") - 1

rowsA, colsA = int(linesT[1][lastA]) + 1, int(linesT[2][lastA]) + 1

datA = [*map(float, linesT[3][0:lastA + 1])]

A = np.array(datA).reshape((rowsA, colsA))

A
Out[50]: 
array([[ 0.,  1.],
       [ 8.,  9.]])

firstB = lastA + 1

rowsB, colsB = int(linesT[1][-1]) + 1, int(linesT[2][-1]) + 1

datB = [*map(float, linesT[3][firstB::])]

B = np.array(datB).reshape((rowsB, colsB))

B
Out[51]: 
array([[  0.,   1.],
       [ 12.,  13.]])

A @ B
Out[52]: 
array([[  12.,   13.],
       [ 108.,  125.]])
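The slicing above relies on the entries being dense and listed in row-major order, with all of A before B. As a hedged alternative sketch that is not part of the original answer, the matrices can be filled in by index so that line order does not matter. The helper name `load_matrices` is hypothetical, and missing entries are assumed to be 0.0.

```python
import numpy as np


def load_matrices(text):
    """Build {name: 2-D array} from 'name,row,col,value' lines in any order."""
    entries = {}
    for line in text.split():
        name, row, col, value = line.split(",")
        entries.setdefault(name, []).append((int(row), int(col), float(value)))

    matrices = {}
    for name, items in entries.items():
        # Dimensions are inferred from the largest index seen
        n_rows = max(r for r, _, _ in items) + 1
        n_cols = max(c for _, c, _ in items) + 1
        matrix = np.zeros((n_rows, n_cols))  # missing entries stay 0.0
        for r, c, v in items:
            matrix[r, c] = v
        matrices[name] = matrix
    return matrices


# Same data as in the answer, deliberately shuffled
sample = """B,1,1,13.0
A,0,1,1.0
B,0,0,0.0
A,1,0,8.0
A,0,0,0.0
B,1,0,12.0
A,1,1,9.0
B,0,1,1.0"""

mats = load_matrices(sample)
product = mats["A"] @ mats["B"]
print(product)
```

Because dimensions are inferred from the largest index present, a trailing all-zero row or column that is omitted from the input would be lost; passing the shapes explicitly would avoid that.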

Regarding python - Solving matrix multiplication in Python with Map-Reduce on Hadoop, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48649477/
