python - TypeError: tuple indices must be integers, not str, when using pyspark and RDD


Reposted by: 太空宇宙. Updated: 2023-11-04 04:42:11


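I am new to Python, and also new to pyspark. I am trying to run a line of code that takes (kv[0], kv[1]) and then runs the ngrams() function on kv[1].

Here is a sample layout of the mentions data that the code processes: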

Out[12]: 
[{'_id': u'en.wikipedia.org/wiki/Kamchatka_Peninsula',
  'source': 'en.wikipedia.org/wiki/Warthead_sculpin',
  'span': (100, 119),
  'text': u' It is native to the northern.'},
 {'_id': u'en.wikipedia.org/wiki/Warthead_sculpin',
  'source': 'en.wikipedia.org/wiki/Warthead_sculpin',
  'span': (4, 20),
  'text': u'The warthead sculpin ("Myoxocephalus niger").'}]

This is the code I am using:

    def build(self, mentions, idfs):
        m = mentions\
            .map(lambda (source, target, span, text): (target, text))\
            .flatMapValues(lambda v: ngrams(v, self.max_ngram))\
            .map(lambda v: (v, 1))\
            .reduceByKey(add)\

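How should the data from the previous step be shaped to resolve this error? Any help or guidance would be greatly appreciated.

I am using Python 2.7 and pyspark 2.3.0.

Thanks,

Best answer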

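mapValues (and likewise flatMapValues, which the question uses) can only be applied to an RDD of (key, value) pairs, that is, an RDD in which each element is a tuple of length 2 or some object that behaves like one (see "How to determine if object is a valid key-value pair in PySpark").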
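To make the pair requirement concrete, here is a minimal plain-Python sketch that needs no Spark. The helper `flat_map_values` is a hypothetical local stand-in that imitates the `(kv[0], kv[1])` access pattern mentioned in the question; it is an illustration, not PySpark's actual code.

```python
def flat_map_values(records, f):
    """Hypothetical local stand-in for RDD.flatMapValues: treats each
    record as a (key, value) pair and emits (key, x) for every x in f(value)."""
    out = []
    for kv in records:
        key, value = kv[0], kv[1]  # needs positional indexing, as a 2-tuple offers
        out.extend((key, x) for x in f(value))
    return out


# Elements that are 2-tuples work as expected.
pairs = [("a", "x y"), ("b", "z")]
print(flat_map_values(pairs, str.split))

# A dict record has no positions 0 and 1, so the same access fails.
record = {"_id": "en.wikipedia.org/wiki/Kamchatka_Peninsula",
          "text": " It is native to the northern."}
try:
    flat_map_values([record], str.split)
except KeyError as exc:
    print("dict record is not a (key, value) pair:", repr(exc))
```

The exact exception depends on the element type: a dict raises `KeyError` on `kv[0]`, while other non-pair shapes fail in other ways.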

Your data consists of dictionaries, so it does not qualify. It is not clear what you expect there, but I suspect you want:

from operator import itemgetter

(mentions
  .map(itemgetter("_id", "text"))
  .flatMapValues(lambda v: ngrams(v, self.max_ngram))
  .map(lambda v: (v, 1)))
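As a quick check of what `itemgetter("_id", "text")` produces, the sketch below applies it to a shortened copy of one sample record in plain Python. The `ngrams` function here is a hypothetical whitespace-token implementation, since the question does not show its own, and `max_ngram = 2` is an assumed value.

```python
from operator import itemgetter


def ngrams(text, max_n):
    """Hypothetical stand-in: all word n-grams of length 1..max_n."""
    tokens = text.split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


# Shortened copy of a record from the question's sample data.
mentions = [
    {"_id": "en.wikipedia.org/wiki/Warthead_sculpin",
     "source": "en.wikipedia.org/wiki/Warthead_sculpin",
     "span": (4, 20),
     "text": "The warthead sculpin"},
]

# itemgetter with two keys returns a 2-tuple, i.e. a valid (key, value) pair.
pairs = [itemgetter("_id", "text")(m) for m in mentions]
print(pairs)

# Local equivalent of .flatMapValues(...).map(lambda v: (v, 1)).
max_ngram = 2  # assumed value
counted = [((key, gram), 1)
           for key, text in pairs
           for gram in ngrams(text, max_ngram)]
print(counted[:3])
```

Each element ends up as `((key, ngram), 1)`, which is the shape a later `reduceByKey(add)` would need to sum counts per (key, ngram).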

Source: a similar question on Stack Overflow about "TypeError: tuple indices must be integers, not str" with pyspark and RDD: https://stackoverflow.com/questions/50414294/
