google-app-engine - Google App Engine: Using BigQuery on Datastore?


Reposted by: 太空宇宙 | Updated: 2023-11-03 15:18:23


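I have a GAE datastore kind containing several hundred thousand objects. I want to run several involved queries against it (including count queries). BigQuery seems like a great fit for this.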

Is there currently an easy way to query a live App Engine datastore using BigQuery?

Best answer

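You can't run BigQuery directly on Datastore entities, but you can write a Mapper pipeline that reads entities out of the Datastore, writes them to CSV files in Google Cloud Storage, and then ingests those files into BigQuery. You can even automate the process. Here is an example of using the Mapper API classes for just the Datastore-to-CSV step: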

import logging
import re
import time
from datetime import datetime
import urllib
import httplib2
import pickle

from google.appengine.ext import blobstore
from google.appengine.ext import db
from google.appengine.ext import webapp

from google.appengine.ext.webapp.util import run_wsgi_app
from google.appengine.ext.webapp import blobstore_handlers
from google.appengine.ext.webapp import util
from google.appengine.ext.webapp import template

from mapreduce.lib import files
from google.appengine.api import taskqueue
from google.appengine.api import users

from mapreduce import base_handler
from mapreduce import mapreduce_pipeline
from mapreduce import operation as op

from apiclient.discovery import build
from google.appengine.api import memcache
from oauth2client.appengine import AppAssertionCredentials


#Number of shards to use in the Mapper pipeline
SHARDS = 20

# Name of the project's Google Cloud Storage Bucket
GS_BUCKET = 'your bucket'

# DataStore Model
class YourEntity(db.Expando):
  field1 = db.StringProperty() # etc, etc

ENTITY_KIND = 'main.YourEntity'


class MapReduceStart(webapp.RequestHandler):
  """Handler that provides link for user to start MapReduce pipeline.
  """
  def get(self):
    pipeline = IteratorPipeline(ENTITY_KIND)
    pipeline.start()
    path = pipeline.base_path + "/status?root=" + pipeline.pipeline_id
    logging.info('Redirecting to: %s' % path)
    self.redirect(path)


class IteratorPipeline(base_handler.PipelineBase):
  """ A pipeline that iterates through datastore
  """
  def run(self, entity_type):
    output = yield mapreduce_pipeline.MapperPipeline(
      "DataStore_to_Google_Storage_Pipeline",
      "main.datastore_map",
      "mapreduce.input_readers.DatastoreInputReader",
      output_writer_spec="mapreduce.output_writers.FileOutputWriter",
      params={
          "input_reader":{
              "entity_kind": entity_type,
              },
          "output_writer":{
              "filesystem": "gs",
              "gs_bucket_name": GS_BUCKET,
              "output_sharding":"none",
              }
          },
          shards=SHARDS)


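def datastore_map(entity):
  """Mapper function: called once per datastore entity, emits one CSV line."""
  props = GetPropsFor(entity)
  data = db.to_dict(entity)
  # Wrap every value in double quotes and join the columns with commas
  result = ','.join(['"%s"' % str(data.get(k)) for k in props])
  yield('%s\n' % result)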


def GetPropsFor(entity_or_kind):
  if (isinstance(entity_or_kind, basestring)):
    kind = entity_or_kind
  else:
    kind = entity_or_kind.kind()
  cls = globals().get(kind)
  return cls.properties()


application = webapp.WSGIApplication(
                                     [('/start', MapReduceStart)],
                                     debug=True)

def main():
  run_wsgi_app(application)

if __name__ == "__main__":
  main()
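
The mapper above builds each CSV row with plain string formatting, so a value that itself contains a double quote would not be escaped. A minimal sketch of the same row-building step using the standard library's csv module follows. It assumes Python 3 and uses a plain dict in place of the db.to_dict(entity) result; entity_to_csv_line and its arguments are hypothetical names, not part of the original answer.

```python
import csv
import io


def entity_to_csv_line(data, props):
    """Return one CSV line for `data` (a dict), with columns in `props` order.

    `data` stands in for db.to_dict(entity) and `props` for the model's
    property names; both are assumptions made for this sketch.
    """
    buf = io.StringIO()
    # QUOTE_ALL mirrors the original mapper, which wraps every value in quotes.
    # The writer also doubles any quote characters embedded in a value.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    # Missing or None values become empty fields instead of the text "None".
    writer.writerow(["" if data.get(k) is None else str(data.get(k)) for k in props])
    return buf.getvalue()


if __name__ == "__main__":
    row = {"field1": 'say "hi"', "field2": 42}
    print(entity_to_csv_line(row, ["field1", "field2", "missing"]), end="")
```

Unlike the original mapper, this sketch writes an empty field for a missing value rather than the string None. On the Python 2.7 runtime that the original code targets, the csv module works with byte strings, so values would likely need to be encoded before being written.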

If you append yield CloudStorageToBigQuery(output) to the end of the run method of the IteratorPipeline class, you can pipe the resulting CSV file handles into a BigQuery ingestion pipeline, like this:

# BigQuery API Settings (module level, so build_job_data can read them too)
SCOPE = 'https://www.googleapis.com/auth/bigquery'
PROJECT_ID = 'Some_ProjectXXXX'
DATASET_ID = 'Some_DATASET'


class CloudStorageToBigQuery(base_handler.PipelineBase):
  """A Pipeline that kicks off a BigQuery ingestion job.
  """
  def run(self, output):
    # Create a new API service for interacting with BigQuery
    credentials = AppAssertionCredentials(scope=SCOPE)
    http = credentials.authorize(httplib2.Http())
    bigquery_service = build("bigquery", "v2", http=http)

    jobs = bigquery_service.jobs()
    table_name = 'datastore_dump_%s' % datetime.utcnow().strftime(
        '%m%d%Y_%H%M%S')
    # Convert the mapper's '/gs/...' file names into 'gs://...' URIs
    files = [str(f.replace('/gs/', 'gs://')) for f in output]
    result = jobs.insert(projectId=PROJECT_ID,
                         body=build_job_data(table_name, files)).execute()
    logging.info(result)

def build_job_data(table_name, files):
  return {"projectId": PROJECT_ID,
          "configuration":{
              "load": {
                  "sourceUris": files,
                  "schema":{
                      # put your schema here: define `fields` as a list of
                      # {"name": ..., "type": ...} dicts, one per CSV column
                      "fields": fields
                      },
                  "destinationTable":{
                      "projectId": PROJECT_ID,
                      "datasetId": DATASET_ID,
                      "tableId": table_name,
                      },
                  }
              }
          }
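
build_job_data above leaves the fields value for the reader to define. A rough sketch of one way to fill it in follows. It assumes every column is loaded as a STRING (the mapper quotes all values) and that the column names are listed in the same order the mapper wrote them. build_schema_fields, to_gs_uris, build_load_config and the sample values are hypothetical helpers for illustration, not part of the original answer.

```python
def build_schema_fields(column_names):
    """Build a BigQuery load-job schema list, one STRING column per name."""
    return [{"name": name, "type": "STRING"} for name in column_names]


def to_gs_uris(paths):
    """Convert '/gs/bucket/object' file names to 'gs://bucket/object' URIs."""
    uris = []
    for path in paths:
        # Only rewrite the leading '/gs/' prefix, not later occurrences.
        if path.startswith("/gs/"):
            path = "gs://" + path[len("/gs/"):]
        uris.append(str(path))
    return uris


def build_load_config(project_id, dataset_id, table_name, paths, column_names):
    """Assemble the 'configuration' part of a load job body."""
    return {
        "load": {
            "sourceUris": to_gs_uris(paths),
            "schema": {"fields": build_schema_fields(column_names)},
            "destinationTable": {
                "projectId": project_id,
                "datasetId": dataset_id,
                "tableId": table_name,
            },
        }
    }


if __name__ == "__main__":
    config = build_load_config(
        "Some_ProjectXXXX", "Some_DATASET", "datastore_dump_example",
        ["/gs/your_bucket/output-0"], ["field1"])
    print(config["load"]["sourceUris"])
```

Passing the project and dataset as arguments, rather than reading module-level constants, keeps the helper easy to test in isolation. Columns that should be numeric or timestamps would need a different type in the schema than the STRING default assumed here.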

Regarding google-app-engine - Google App Engine: Using BigQuery on Datastore?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10966841/
