
java - Writing to Firestore from inside Google Cloud Dataflow

Reposted · Author: 行者123 · Updated: 2023-12-01 16:39:51

The core problem I'm running into is that when I run my Dataflow pipeline deployed to Google Cloud Dataflow, I get this error:

java.lang.IllegalStateException: FirebaseApp with name [DEFAULT] doesn't exist.

If I run the same pipeline locally, everything works fine, so I suspect an authentication or environment issue.

The relevant code:

The DEPLOY and REAL variables control whether the pipeline is pushed to the cloud (or run locally) and whether it reads from my Pub/Sub source or uses mocked data. Switching between mocked and Pub/Sub data seems to have no effect on the Firestore problem at all; only whether or not the job is deployed matters.

The part of main() where I initialize the Firestore app:

    public class BreakingDataTransactions {

      // When true, this pulls from the specified Pub/Sub topic
      static Boolean REAL = true;
      // when set to true the job gets deployed to Cloud Dataflow
      static Boolean DEPLOY = true;

      public static void main(String[] args) {
        // validate our env vars
        if (GlobalVars.projectId == null ||
            GlobalVars.pubsubTopic == null ||
            GlobalVars.gcsBucket == null ||
            GlobalVars.region == null) {
          System.out.println("You have to set environment variables for project (BREAKING_PROJECT), pubsub topic (BREAKING_PUBSUB), region (BREAKING_REGION) and Cloud Storage bucket for staging (BREAKING_DATAFLOW_BUCKET) in order to deploy this pipeline.");
          System.exit(1);
        }

        // Initialize our Firestore instance
        try {
          GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
          System.out.println("*************************");
          System.out.println(credentials);
          FirebaseOptions firebaseOptions =
              new FirebaseOptions.Builder()
                  .setCredentials(credentials)
                  .setProjectId(GlobalVars.projectId)
                  .build();
          FirebaseApp firebaseApp = FirebaseApp.initializeApp(firebaseOptions);
        } catch (IOException e) {
          e.printStackTrace();
        }

        // Start dataflow pipeline
        DataflowPipelineOptions options =
            PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);

        options.setProject(GlobalVars.projectId);

        if (DEPLOY) {
          options.setRunner(DataflowRunner.class);
          options.setTempLocation(GlobalVars.gcsBucket);
          options.setRegion(GlobalVars.region);
        }

        Pipeline p = Pipeline.create(options);

And the processing section:

    PCollection<Data> dataCollection =
        jsonStrings
            .apply(ParDo.of(JSONToPOJO.create(Data.class)))
            .setCoder(AvroCoder.of(Data.class));

    PCollection<Result> result =
        dataCollection
            .apply(Window.into(FixedWindows.of(Duration.standardSeconds(1))))
            .apply(WithKeys.of(x -> x.operation + "-" + x.job_id))
            .setCoder(KvCoder.of(StringUtf8Coder.of(), AvroCoder.of(Data.class)))
            .apply(Combine.<String, Data, Result>perKey(new DataAnalysis()))
            .apply(Reify.windowsInValue())
            .apply(MapElements.into(TypeDescriptor.of(Result.class))
                .<KV<String, ValueInSingleWindow<Result>>>via(
                    x -> {
                      Result r = new Result();
                      String key = x.getKey();
                      r.query_action = key.substring(0, key.indexOf("-"));
                      r.job_id = key.substring(key.indexOf("-") + 1);
                      r.average_latency = x.getValue().getValue().average_latency;
                      r.failure_percent = x.getValue().getValue().failure_percent;
                      r.timestamp = x.getValue().getTimestamp().getMillis();
                      return r;
                    }));

    // this node will (hopefully) actually write out to Firestore
    result.apply(ParDo.of(new FireStoreOutput()));
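As an aside, the key-parsing step above splits at the first "-" in the key. A small, self-contained sketch (with hypothetical key values, independent of Beam) confirms that this recovers the operation correctly even when job_id itself contains hyphens, since only the first "-" is used as the separator:

```java
public class KeySplitDemo {
    // Mirrors the key composition (operation + "-" + job_id) and the
    // substring-based split used in the MapElements step above.
    static String[] split(String key) {
        String operation = key.substring(0, key.indexOf("-"));
        String jobId = key.substring(key.indexOf("-") + 1);
        return new String[] { operation, jobId };
    }

    public static void main(String[] args) {
        // Hypothetical key where the job_id contains a hyphen.
        String key = "write" + "-" + "job-42";
        String[] parts = split(key);
        System.out.println(parts[0]); // write
        System.out.println(parts[1]); // job-42
    }
}
```

Note this would break if the operation name itself ever contained a "-", so the scheme relies on operation names like "read" and "write" staying hyphen-free.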

And finally the FireStoreOutput class:

    public static class FireStoreOutput extends DoFn<Result, String> {

      Firestore db;

      @ProcessElement
      public void processElement(@Element Result result) {
        db = FirestoreClient.getFirestore();
        DocumentReference docRef = db.collection("events")
            .document("next2020")
            .collection("transactions")
            .document(result.job_id)
            .collection("transactions")
            .document();
        //System.out.println(docRef.getId());

        // Build the document data as a hashmap
        Map<String, Object> data = new HashMap<>();
        data.put("failure_percent", result.failure_percent);
        data.put("average_latency", result.average_latency);
        data.put("query_action", result.query_action);
        data.put("timestamp", result.timestamp);

        // asynchronously write data, then block on the result
        ApiFuture<WriteResult> writeResult = docRef.set(data);
        try {
          writeResult.get();
        } catch (InterruptedException | ExecutionException e) {
          e.printStackTrace();
        }
      }
    }

The error occurs at this line: db = FirestoreClient.getFirestore();

I'm deploying the Dataflow job with the --serviceAccount flag, specifying a service account that has permission to do everything it needs.

So unless GoogleCredentials credentials = GoogleCredentials.getApplicationDefault(); somehow isn't working (but you can see the print statement there, and it does correctly print out the credentials at build time), that's not it.

However, that only happens at build time... so I'm wondering if I have a persistence problem: it initializes fine at build time, but when the job actually runs in the cloud, the initialization is lost somewhere between deployment and processing. If so, how do I fix that?
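That suspicion matches how Dataflow generally works: the pipeline's DoFns are serialized and shipped to worker VMs, so objects constructed in main() on the launching JVM (such as the FirebaseApp) do not travel with them. A minimal, Beam-free sketch of the same effect, using plain Java serialization and a hypothetical Worker class standing in for a DoFn:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LostStateDemo {
    // Stands in for a DoFn: the client handle is set on the launcher,
    // but (being transient, like any non-serializable resource) it is
    // not part of the serialized form shipped to workers.
    static class Worker implements Serializable {
        transient String client; // e.g. a Firestore handle
    }

    // Serialize and deserialize, simulating shipping the DoFn to a worker VM.
    static Worker roundTrip(Worker w) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(w);
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (Worker) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        Worker w = new Worker();
        w.client = "initialized on launcher";
        Worker onWorker = roundTrip(w);
        System.out.println(onWorker.client); // null: the init did not travel
    }
}
```

This is why the resource has to be (re)created on the worker itself, which is exactly what the accepted answer below ends up doing.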

Thanks!

Best Answer

OK, I found the solution... The big issue was that my DAG's PCollection split into two thread paths. I have two operation types, "read" and "write", and each of those results sends a PCollection into my FireStoreOutput class, which is where I was trying to initialize the Firestore app, causing an "already initialized" problem.

The fix was making my db object a static field with a synchronized getDB() method that only initializes it if it hasn't been set yet. The final updated code for the FireStoreOutput section:

    public static class FireStoreOutput extends DoFn<Result, String> {

      static Firestore db;

      public static synchronized Firestore getDB() {
        if (db == null) {
          System.out.println("I'm being called");
          // Initialize our Firestore instance
          try {
            GoogleCredentials credentials = GoogleCredentials.getApplicationDefault();
            System.out.println("*************************");
            System.out.println(credentials);
            FirebaseOptions firebaseOptions =
                new FirebaseOptions.Builder()
                    .setCredentials(credentials)
                    .setProjectId(GlobalVars.projectId)
                    .build();
            FirebaseApp.initializeApp(firebaseOptions);
          } catch (IOException e) {
            e.printStackTrace();
          }
          db = FirestoreClient.getFirestore();
        }
        return db;
      }

      @ProcessElement
      public void processElement(@Element Result result) {
        DocumentReference docRef = getDB().collection("events")
            .document("next2020")
            .collection("transactions")
            .document(result.job_id)
            .collection("transactions")
            .document();
        // ... build the data map and call docRef.set(data) as before
      }
    }
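The pattern the answer lands on is a synchronized lazy initializer: the first caller creates the resource, and every later caller, including ones on other threads in the same worker JVM, reuses it. A stripped-down, dependency-free sketch of that pattern (with a plain Object standing in for the Firestore client) shows the initialization runs exactly once even under concurrent calls:

```java
public class LazyInitDemo {
    private static Object db;
    private static int initCount = 0;

    // Same shape as getDB() above: initialize once, under the class lock,
    // only if no one has done it yet.
    public static synchronized Object getDB() {
        if (db == null) {
            initCount++;       // stands in for the expensive Firestore setup
            db = new Object();
        }
        return db;
    }

    public static int getInitCount() {
        return initCount;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(LazyInitDemo::getDB);
        Thread b = new Thread(LazyInitDemo::getDB);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(getInitCount()); // 1: initialized exactly once
    }
}
```

The synchronized keyword is what makes this safe: without it, two threads could both observe db == null and initialize twice, which is the same double-initialization failure the answer describes.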

Regarding java - Writing to Firestore from inside Google Cloud Dataflow, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61881082/
