- html - 出于某种原因,IE8 对我的 Sass 文件中继承的 html5 CSS 不友好?
- JMeter 在响应断言中使用 span 标签的问题
- html - 在 :hover and :active? 上具有不同效果的 CSS 动画
- html - 相对于居中的 html 内容固定的 CSS 重复背景?
我正在使用数据流 0.5.5 Python。在非常简单的代码中遇到以下错误:
print(len(row_list))
row_list
是一个列表。完全相同的代码、相同的数据和相同的管道在 DirectRunner 上运行得非常好,但在 DataflowRunner 上抛出了以下异常。这是什么意思,我该如何解决?
job name: `beamapp-root-0216042234-124125`
(f14756f20f567f62): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 544, in do_work
work_executor.execute()
File "dataflow_worker/executor.py", line 973, in dataflow_worker.executor.MapTaskExecutor.execute (dataflow_worker/executor.c:30547)
with op.scoped_metrics_container:
File "dataflow_worker/executor.py", line 974, in dataflow_worker.executor.MapTaskExecutor.execute (dataflow_worker/executor.c:30495)
op.start()
File "dataflow_worker/executor.py", line 302, in dataflow_worker.executor.GroupedShuffleReadOperation.start (dataflow_worker/executor.c:12149)
def start(self):
File "dataflow_worker/executor.py", line 303, in dataflow_worker.executor.GroupedShuffleReadOperation.start (dataflow_worker/executor.c:12053)
with self.scoped_start_state:
File "dataflow_worker/executor.py", line 316, in dataflow_worker.executor.GroupedShuffleReadOperation.start (dataflow_worker/executor.c:11968)
with self.shuffle_source.reader() as reader:
File "dataflow_worker/executor.py", line 320, in dataflow_worker.executor.GroupedShuffleReadOperation.start (dataflow_worker/executor.c:11912)
self.output(windowed_value)
File "dataflow_worker/executor.py", line 152, in dataflow_worker.executor.Operation.output (dataflow_worker/executor.c:6317)
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
File "dataflow_worker/executor.py", line 85, in dataflow_worker.executor.ConsumerSet.receive (dataflow_worker/executor.c:4021)
cython.cast(Operation, consumer).process(windowed_value)
File "dataflow_worker/executor.py", line 766, in dataflow_worker.executor.BatchGroupAlsoByWindowsOperation.process (dataflow_worker/executor.c:25558)
self.output(wvalue.with_value((k, wvalue.value)))
File "dataflow_worker/executor.py", line 152, in dataflow_worker.executor.Operation.output (dataflow_worker/executor.c:6317)
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
File "dataflow_worker/executor.py", line 85, in dataflow_worker.executor.ConsumerSet.receive (dataflow_worker/executor.c:4021)
cython.cast(Operation, consumer).process(windowed_value)
File "dataflow_worker/executor.py", line 545, in dataflow_worker.executor.DoOperation.process (dataflow_worker/executor.c:18474)
with self.scoped_process_state:
File "dataflow_worker/executor.py", line 546, in dataflow_worker.executor.DoOperation.process (dataflow_worker/executor.c:18428)
self.dofn_receiver.receive(o)
File "apache_beam/runners/common.py", line 195, in apache_beam.runners.common.DoFnRunner.receive (apache_beam/runners/common.c:5137)
self.process(windowed_value)
File "apache_beam/runners/common.py", line 262, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7078)
self.reraise_augmented(exn)
File "apache_beam/runners/common.py", line 274, in apache_beam.runners.common.DoFnRunner.reraise_augmented (apache_beam/runners/common.c:7467)
raise type(exn), args, sys.exc_info()[2]
File "apache_beam/runners/common.py", line 258, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:6967)
self._dofn_simple_invoker(element)
File "apache_beam/runners/common.py", line 198, in apache_beam.runners.common.DoFnRunner._dofn_simple_invoker (apache_beam/runners/common.c:5283)
self._process_outputs(element, self.dofn_process(element.value))
File "apache_beam/runners/common.py", line 286, in apache_beam.runners.common.DoFnRunner._process_outputs (apache_beam/runners/common.c:7678)
for result in results:
File "trip_augmentation_test.py", line 120, in get_osm_way
TypeError: object of type '_UnwindowedValues' has no len() [while running 'Pull way info from mapserver']
#!/usr/bin/env python
# coding: utf-8
from __future__ import absolute_import
import argparse
import logging
import json
import apache_beam as beam
from apache_beam.utils.options import PipelineOptions
from apache_beam.utils.options import SetupOptions
def get_osm_way(pairs_same_group):
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.exceptions import InsecureRequestWarning
from multiprocessing.pool import ThreadPool
import time
#disable InsecureRequestWarning for a cleaner output
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
print('processing hardwareid={} trips'.format(pairs_same_group[0]))
row_list = pairs_same_group[1]
print(row_list)
http_request_num = len(row_list) ######### this line ran into the above error##########
with requests.Session() as s:
s.mount('https://ip address',HTTPAdapter(pool_maxsize=http_request_num)) ##### a host name is needed for this http persistent connection
pool = ThreadPool(processes=1)
for row in row_list:
hardwareid=row['HardwareId']
tripid=row['TripId']
latlonArr = row['LatLonStrArr'].split(',');
print('gps points num: {}'.format(len(latlonArr)))
cor_array = []
for latlon in latlonArr:
lat = latlon.split(';')[0]
lon = latlon.split(';')[1]
cor_array.append('{{"x":"{}","y":"{}"}}'.format(lon, lat))
url = 'https://<ip address>/functionname?coordinates=[{}]'.format(','.join(cor_array))
print(url)
print("Requesting")
r = pool.apply_async(thread_get, (s, url)).get()
print ("Got response")
print(r)
if r.status_code==200:
yield (hardwareid,tripid,r.text)
else:
yield (hardwareid,tripid,None)
def run(argv=None):
parser = argparse.ArgumentParser()
parser.add_argument('--input',
help=('Input BigQuery table to process specified as: '
'PROJECT:DATASET.TABLE or DATASET.TABLE.'))
parser.add_argument(
'--output',
required=True,
help=
('Output BigQuery table for results specified as: PROJECT:DATASET.TABLE '
'or DATASET.TABLE.'))
known_args, pipeline_args = parser.parse_known_args(argv)
pipeline_options = PipelineOptions(argv)
pipeline_options.view_as(SetupOptions).save_main_session = True
p = beam.Pipeline(options=pipeline_options)
(p
| 'Read trip from BigQuery' >> beam.io.Read(beam.io.BigQuerySource(query=known_args.input))
| 'Convert' >> beam.Map(lambda row: (row['HardwareId'],row))
| 'Group devices' >> beam.GroupByKey()
| 'Pull way info from mapserver' >> beam.FlatMap(get_osm_way)
| 'Map way info to dictionary' >> beam.FlatMap(convert_to_dict)
| 'Save to BQ' >> beam.io.Write(beam.io.BigQuerySink(
known_args.output, schema='HardwareId:INTEGER,TripId:INTEGER,OrderBy:INTEGER,IndexRatio:FLOAT,IsEstimate:BOOLEAN,IsOverRide:BOOLEAN,MaxSpeed:FLOAT,Provider:STRING,RoadName:STRING,WayId:STRING,LastEdited:TIMESTAMP,WayLatLons:STRING,BigDataComment:STRING',
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))
)
# Run the pipeline (all operations are deferred until run() is called).
p.run()
if __name__ == '__main__':
logging.getLogger().setLevel(logging.INFO)
run()
!python trip_augmentation_test.py \
--output 'my-project:my-dataset.mytable' \
--input 'SELECT HardwareId,TripId, LatLonStrArr FROM [my-project:my-dataset.mytable] ' \
--project 'my-project' \
--runner 'DataflowRunner' \ ### if just change this to DirectRunner, everything's fine
--temp_location 'gs://mybucket/tripway_temp' \
--staging_location 'gs://mybucket/tripway_staging' \
--worker_machine_type 'n1-standard-2' \
--profile_cpu True \
--profile_memory True
row_list
的类型,原来,在 DataflowRunner 中,它是
<class 'apache_beam.transforms.trigger._UnwindowedValues'>
,而在 DirectRunner 中,它是
list
.这是预期的不一致吗?
最佳答案
这种抽象在像 Beam/Dataflow(和其他)这样的大数据系统中是必要的。考虑到列表中的元素数量可能是任意大的。_UnwindowedValues
提供了可迭代的接口(interface)来访问这组可以是任何大小的元素,并且可能无法将整个元素保存在内存中。
Direct Runner 返回一个列表的事实是一个不一致的问题,在 Beam 的几个版本中已修复。在 Dataflow 中,GroupByKey
的结果不以列表形式出现,不支持len
- 但它是 可迭代的。
总之,在做http_request_num = len(row_list)
之前,您可以将其强制转换为支持 len 的类型,例如:
row_list = list(pairs_same_group[1])
http_request_num = len(row_list)
关于google-cloud-dataflow - '_UnwindowedValues' 类型的对象没有 len() 是什么意思?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/42276520/
我的一位教授给了我们一些考试练习题,其中一个问题类似于下面(伪代码): a.setColor(blue); b.setColor(red); a = b; b.setColor(purple); b
我似乎经常使用这个测试 if( object && object !== "null" && object !== "undefined" ){ doSomething(); } 在对象上,我
C# Object/object 是值类型还是引用类型? 我检查过它们可以保留引用,但是这个引用不能用于更改对象。 using System; class MyClass { public s
我在通过 AJAX 发送 json 时遇到问题。 var data = [{"name": "Will", "surname": "Smith", "age": "40"},{"name": "Wil
当我尝试访问我的 View 中的对象 {{result}} 时(我从 Express js 服务器发送该对象),它只显示 [object][object]有谁知道如何获取 JSON 格式的值吗? 这是
我有不同类型的数据(可能是字符串、整数......)。这是一个简单的例子: public static void main(String[] args) { before("one"); }
嗨,我是 json 和 javascript 的新手。 我在这个网站找到了使用json数据作为表格的方法。 我很好奇为什么当我尝试使用 json 数据作为表时,我得到 [Object,Object]
已关闭。此问题需要 debugging details 。目前不接受答案。 编辑问题以包含 desired behavior, a specific problem or error, and the
我听别人说 null == object 比 object == null check 例如: void m1(Object obj ) { if(null == obj) // Is thi
Match 对象 提供了对正则表达式匹配的只读属性的访问。 说明 Match 对象只能通过 RegExp 对象的 Execute 方法来创建,该方法实际上返回了 Match 对象的集合。所有的
Class 对象 使用 Class 语句创建的对象。提供了对类的各种事件的访问。 说明 不允许显式地将一个变量声明为 Class 类型。在 VBScript 的上下文中,“类对象”一词指的是用
Folder 对象 提供对文件夹所有属性的访问。 说明 以下代码举例说明如何获得 Folder 对象并查看它的属性: Function ShowDateCreated(f
File 对象 提供对文件的所有属性的访问。 说明 以下代码举例说明如何获得一个 File 对象并查看它的属性: Function ShowDateCreated(fil
Drive 对象 提供对磁盘驱动器或网络共享的属性的访问。 说明 以下代码举例说明如何使用 Drive 对象访问驱动器的属性: Function ShowFreeSpac
FileSystemObject 对象 提供对计算机文件系统的访问。 说明 以下代码举例说明如何使用 FileSystemObject 对象返回一个 TextStream 对象,此对象可以被读
我是 javascript OOP 的新手,我认为这是一个相对基本的问题,但我无法通过搜索网络找到任何帮助。我是否遗漏了什么,或者我只是以错误的方式解决了这个问题? 这是我的示例代码: functio
我可以很容易地创造出很多不同的对象。例如像这样: var myObject = { myFunction: function () { return ""; } };
function Person(fname, lname) { this.fname = fname, this.lname = lname, this.getName = function()
任何人都可以向我解释为什么下面的代码给出 (object, Object) 吗? (console.log(dope) 给出了它应该的内容,但在 JSON.stringify 和 JSON.parse
我正在尝试完成散点图 exercise来自免费代码营。然而,我现在只自己学习了 d3 几个小时,在遵循 lynda.com 的教程后,我一直在尝试确定如何在工具提示中显示特定数据。 This code
我是一名优秀的程序员,十分优秀!