java - Avro schema evolution: Can't add or remove fields

Reposted by: 塔克拉玛干 | Updated: 2023-11-02 08:02:54

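I am currently trying to evolve my Avro schemas, which according to the documentation should be no big deal. However, as soon as I add or remove a field, Avro fails to deserialize the bytes.

I am using the following schemas: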

AvroSchemas.avsc:

[
  {
    "namespace": "stackoverflow.example.avro",
    "type": "record",
    "name": "Record_1_1",
    "fields": [
      {"name": "value0", "type": "string"}
    ]
  },
  {
    "namespace": "stackoverflow.example.avro",
    "type": "record",
    "name": "Record_1_2",
    "fields": [
      {"name": "value0", "type": "string"},
      {"name": "value1", "type": "string", "default": "Hello World"}
    ]
  },
  {
    "namespace": "stackoverflow.example.avro",
    "type": "record",
    "name": "Record_2_1",
    "fields": [
      {"name": "someList", "type": {"type": "array", "items": "int"}}
    ]
  },
  {
    "namespace": "stackoverflow.example.avro",
    "type": "record",
    "name": "Record_2_2",
    "fields": [
      {"name": "someBool", "type": "boolean", "default": "false"},
      {"name": "someList", "type": {"type": "array", "items": "int"}}
    ]
  }
]

The classes are generated with the following Maven build plugin:

  <plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>1.8.2</version>
    <executions>
      <execution>
        <phase>generate-sources</phase>
        <goals>
          <goal>schema</goal>
        </goals>
        <configuration>
          <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
          <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
          <stringType>String</stringType>
        </configuration>
      </execution>
    </executions>
  </plugin>

This is the code I use to test the evolution:

AvroTest.java:

package stackoverflow.example;

import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Objects;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import stackoverflow.example.avro.Record_1_1;
import stackoverflow.example.avro.Record_1_2;
import stackoverflow.example.avro.Record_2_1;
import stackoverflow.example.avro.Record_2_2;

public class AvroTest {

    public static void main(String[] args) throws Exception {
        executeTest0();
        executeTest1();
        executeTest2();
    }

    /**
     * Test if read and write methods work
     */
    private static void executeTest0() {
        Record_1_1 source1 = new Record_1_1("A");
        Record_1_1 dest1 = trySerializeDeserialize(source1, Record_1_1.class, Record_1_1.class);
        if (dest1 == null || !Objects.equals(source1.getValue0(), dest1.getValue0())) {
            throw new RuntimeException("Record_1_1 Test 0 failed");
        }

        Record_1_2 source2 = new Record_1_2("A", "B");
        Record_1_2 dest2 = trySerializeDeserialize(source2, Record_1_2.class, Record_1_2.class);
        if (dest2 == null || !Objects.equals(source2.getValue0(), dest2.getValue0()) || !Objects.equals(source2.getValue1(), dest2.getValue1())) {
            throw new RuntimeException("Record_1_2 Test 0 failed");
        }

        Record_2_1 source3 = new Record_2_1(new ArrayList<>());
        Record_2_1 dest3 = trySerializeDeserialize(source3, Record_2_1.class, Record_2_1.class);
        if (dest3 == null || !Objects.equals(source3.getSomeList(), dest3.getSomeList())) {
            throw new RuntimeException("Record_2_1 Test 0 failed");
        }

        Record_2_2 source4 = new Record_2_2(true, new ArrayList<>());
        Record_2_2 dest4 = trySerializeDeserialize(source4, Record_2_2.class, Record_2_2.class);
        if (dest4 == null || !Objects.equals(source4.getSomeBool(), dest4.getSomeBool()) || !Objects.equals(source4.getSomeList(), dest4.getSomeList())) {
            throw new RuntimeException("Record_2_2 Test 0 failed");
        }
    }

    private static void executeTest1() {
        Record_1_1 source1 = new Record_1_1("Test");
        Record_1_2 dest1 = trySerializeDeserialize(source1, Record_1_1.class, Record_1_2.class);
        if (dest1 == null || !Objects.equals(dest1.getValue1(), "Hello World")) {
            System.out.println("adding field with default value failed: " + dest1);
        }

        Record_1_2 source2 = new Record_1_2("Test0", "Test1");
        Record_1_1 dest2 = trySerializeDeserialize(source2, Record_1_2.class, Record_1_1.class);
        if (dest2 == null || !Objects.equals(source2.getValue0(), dest2.getValue0())) {
            System.out.println("removing field failed: " + dest2);
        }
    }

    private static void executeTest2() {
        Record_2_1 source1 = new Record_2_1(new ArrayList<>());
        Record_2_2 dest1 = trySerializeDeserialize(source1, Record_2_1.class, Record_2_2.class);
        if (dest1 == null || !Objects.equals(source1.getSomeList(), dest1.getSomeList())) {
            System.out.println("adding boolean field with default value failed: " + dest1);
        }

        Record_2_2 source2 = new Record_2_2(true, new ArrayList<>());
        Record_2_1 dest2 = trySerializeDeserialize(source2, Record_2_2.class, Record_2_1.class);
        if (dest2 == null || !Objects.equals(source2.getSomeList(), dest2.getSomeList())) {
            System.out.println("removing boolean field failed: " + dest2);
        }
    }

    private static <T, E> E trySerializeDeserialize(T source, Class<T> sourceClass, Class<E> destClass) {
        E result;

        try {
            byte[] bytes = write(source, sourceClass);
            result = read(bytes, destClass);
        } catch (Exception e) {
            result = null;
        }

        return result;
    }

    private static <T> byte[] write(T value, Class<T> clazz) throws Exception {
        byte[] bytes;

        try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            Encoder encoder = EncoderFactory.get().binaryEncoder(bos, null);
            DatumWriter<T> writer = new SpecificDatumWriter<>(clazz);
            writer.write(value, encoder);

            encoder.flush();
            bytes = bos.toByteArray();
        }

        return bytes;
    }

    private static <T> T read(byte[] bytes, Class<T> clazz) throws Exception {
        Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        DatumReader<T> reader = new SpecificDatumReader<>(clazz);

        return reader.read(null, decoder);
    }
}

Output:

adding field with default value failed: null
adding boolean field with default value failed: null
removing boolean field failed: null

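According to the documentation, all of my tests should work (adding a field with a default value, or dropping a field on the receiving side). I doubt that documentation was written just for fun, so am I perhaps missing some setting?

Best answer

The problem lies in how you deserialize the data. When you use the SpecificDatumReader(Class<T>) constructor, the reader assumes that the writer's schema and the reader's schema are the same.

You can fix this by using SpecificDatumReader(Schema writer, Schema reader) instead. For example: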
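To see why the reader needs the writer's schema at all, it can help to look at what the bytes contain. The sketch below is a hand-rolled approximation of the binary encoding described in the Avro specification (a record is its field values in schema order, a string is a zigzag varint length followed by UTF-8 bytes). It does not use Avro itself, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class AvroBinarySketch {

    // Avro writes int/long values as zigzag-encoded variable-length integers.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF; // throws BufferUnderflowException once the input is exhausted
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // A record of string fields: each field is length + UTF-8 bytes, with no names or tags.
    static byte[] encodeStrings(String... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : fields) {
            byte[] utf8 = field.getBytes(StandardCharsets.UTF_8);
            writeLong(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    // Decodes as many string fields as the reader believes the record has.
    static String[] decodeStrings(byte[] bytes, int fieldCount) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        String[] fields = new String[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            byte[] utf8 = new byte[(int) readLong(in)];
            in.get(utf8);
            fields[i] = new String(utf8, StandardCharsets.UTF_8);
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] oneField = encodeStrings("Test");            // shaped like Record_1_1
        byte[] twoFields = encodeStrings("Test0", "Test1"); // shaped like Record_1_2

        // Expecting fewer fields than were written: the first field decodes,
        // the trailing bytes are simply never looked at.
        System.out.println(decodeStrings(twoFields, 1)[0]);

        // Expecting more fields than were written: the decoder runs out of bytes.
        try {
            decodeStrings(oneField, 2);
        } catch (BufferUnderflowException e) {
            System.out.println("no bytes left for value1");
        }
    }
}
```

Because nothing in the bytes says which fields are present, a reader that only knows its own schema cannot tell that a default should be applied. This would also be consistent with the question's output, where "removing field failed" for Record_1_2 to Record_1_1 never appears: reading a prefix of the data happens to succeed.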

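// Requires an additional import: org.apache.avro.specific.SpecificData
// The caller (trySerializeDeserialize) must now pass sourceClass as well.
private static <T, E> E read(byte[] bytes, Class<T> sourceClass, Class<E> destClass) throws Exception {
    Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    // First argument: the schema the bytes were written with.
    // Second argument: the schema the caller wants to read into.
    DatumReader<E> reader = new SpecificDatumReader<>(
            SpecificData.get().getSchema(sourceClass),
            SpecificData.get().getSchema(destClass));

    return reader.read(null, decoder);
}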
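The same idea can be tried without generated classes. The following is a minimal sketch, assuming the Avro library is on the classpath, that uses GenericDatumReader(Schema writer, Schema reader) in place of the specific reader. The class name SchemaResolutionSketch and the shared record name Record_1 are chosen for illustration only: both versions get the same record name here so that the sketch stays within the specification's rule of matching records by name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaResolutionSketch {

    static final String HEAD = "{\"type\":\"record\",\"name\":\"Record_1\","
            + "\"namespace\":\"stackoverflow.example.avro\",\"fields\":[";
    static final String VALUE0 = "{\"name\":\"value0\",\"type\":\"string\"}";
    static final String VALUE1 =
            "{\"name\":\"value1\",\"type\":\"string\",\"default\":\"Hello World\"}";

    // Two versions of the same record; a separate parser per version avoids a name clash.
    static final Schema V1 = new Schema.Parser().parse(HEAD + VALUE0 + "]}");
    static final Schema V2 = new Schema.Parser().parse(HEAD + VALUE0 + "," + VALUE1 + "]}");

    // Header-less binary encoding, as in the question's write() method.
    static byte[] write(GenericRecord record) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(record.getSchema()).write(record, encoder);
            encoder.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The reader is told both schemas, so it can resolve the differences between them.
    static GenericRecord read(byte[] bytes, Schema writerSchema, Schema readerSchema) {
        try {
            return new GenericDatumReader<GenericRecord>(writerSchema, readerSchema)
                    .read(null, DecoderFactory.get().binaryDecoder(bytes, null));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Field added on the reading side: the default fills the gap.
        GenericRecord old = new GenericData.Record(V1);
        old.put("value0", "Test");
        System.out.println(read(write(old), V1, V2));

        // Field removed on the reading side: the extra value is skipped.
        GenericRecord current = new GenericData.Record(V2);
        current.put("value0", "Test0");
        current.put("value1", "Test1");
        System.out.println(read(write(current), V2, V1));
    }
}
```

Note that generic records return Avro's own string type for string fields, so comparisons against java.lang.String are best done via toString().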

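Note that the output of a DatumWriter is not an Avro file. An Avro file always carries the schema used to serialize its data in its header, whereas the DatumWriter output is just a serialized object with no header. If you want to test with Avro files, use DataFileWriter and DataFileReader.

All of your schema changes are compatible and should work according to the Avro format specification. The only error in the schemas is the default value of someBool: it should be the boolean value (false) rather than the string ("false").

Regarding java - Avro schema evolution: Can't add or remove fields, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45917760/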
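For comparison, here is a minimal sketch of the container-file route, again assuming the Avro library is on the classpath. It writes to an in-memory stream and reads back with DataFileStream (the stream-based sibling of DataFileReader) so that no temporary file is needed. The class name ContainerFileSketch and the shared record name Record_1 are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ContainerFileSketch {

    static final String HEAD = "{\"type\":\"record\",\"name\":\"Record_1\","
            + "\"namespace\":\"stackoverflow.example.avro\",\"fields\":[";
    static final String VALUE0 = "{\"name\":\"value0\",\"type\":\"string\"}";
    static final String VALUE1 =
            "{\"name\":\"value1\",\"type\":\"string\",\"default\":\"Hello World\"}";

    static final Schema V1 = new Schema.Parser().parse(HEAD + VALUE0 + "]}");
    static final Schema V2 = new Schema.Parser().parse(HEAD + VALUE0 + "," + VALUE1 + "]}");

    // Writes a container file: a header carrying the writer's schema, then the record.
    static byte[] writeFile(GenericRecord record) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer = new DataFileWriter<>(
                new GenericDatumWriter<GenericRecord>(record.getSchema()))) {
            writer.create(record.getSchema(), out);
            writer.append(record);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Only the reader's schema is supplied; the writer's schema comes from the file header.
    static GenericRecord readFirst(byte[] file, Schema readerSchema) {
        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        datumReader.setExpected(readerSchema);
        try (DataFileStream<GenericRecord> in =
                new DataFileStream<>(new ByteArrayInputStream(file), datumReader)) {
            return in.next();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        GenericRecord old = new GenericData.Record(V1);
        old.put("value0", "Test");
        System.out.println(readFirst(writeFile(old), V2));
    }
}
```

The design trade-off is size versus convenience: the header makes each file self-describing, which is why the header-less encoding in the question needs the writer's schema passed in separately.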
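One way to check the compatibility claim mechanically is Avro's SchemaCompatibility utility, which the original answer does not mention and is used here as an assumption about what is convenient. The sketch below assumes the Avro library is on the classpath, uses the corrected boolean default, and gives both versions the illustrative shared record name Record_2.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

public class CompatibilitySketch {

    static final String SOME_LIST =
            "{\"name\":\"someList\",\"type\":{\"type\":\"array\",\"items\":\"int\"}}";
    // The default is the JSON boolean false, not the string "false".
    static final String SOME_BOOL =
            "{\"name\":\"someBool\",\"type\":\"boolean\",\"default\":false}";

    static final Schema V1 = parse(SOME_LIST);
    static final Schema V2 = parse(SOME_BOOL + "," + SOME_LIST);

    // Wraps a comma-separated list of field definitions in a record schema.
    static Schema parse(String fieldsJson) {
        return new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Record_2\","
                + "\"namespace\":\"stackoverflow.example.avro\","
                + "\"fields\":[" + fieldsJson + "]}");
    }

    // True if data written with writer can be resolved into reader.
    static boolean canRead(Schema reader, Schema writer) {
        return SchemaCompatibility.checkReaderWriterCompatibility(reader, writer).getType()
                == SchemaCompatibilityType.COMPATIBLE;
    }

    public static void main(String[] args) {
        System.out.println("V2 reads V1 (field added with default): " + canRead(V2, V1));
        System.out.println("V1 reads V2 (field removed): " + canRead(V1, V2));
    }
}
```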
