
hadoop - How do I convert a string to an array of structs in Hive?

Reposted · Author: 行者123 · Updated: 2023-12-02 20:47:33

My table in Hive has the following schema:

DESCRIBE struct_demo;
+-------------------+-------------------------------+
| name | type |
+-------------------+-------------------------------+
| lr_id | string |
| segment_info | ARRAY<struct< |
| | idlpSegmentName:string, |
| | idlpSegmentValue:string > |
| | > |
| | |
+-------------------+-------------------------------+

I create the table in Redshift (or any SQL database). For the data type above it
produces rows in a similar format, but stored as a string.

How do I convert the data when inserting it from Redshift into Hive?
More specifically, how do I convert from a string to an array of structs?

My SQL table:
lr_id    | segment_info
---------|------------------------------------------------------------
1        | [{"idlpsegmentname":"axciom","idlpsegmentvalue":"200"},{"idlpsegmentname":"people","idlpsegmentvalue":"z"}]

So far I have been unable to find any UDF that does this.
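For illustration, the conversion being asked for boils down to pulling each name/value pair out of the exported string. A minimal sketch using only the JDK (the class name `SegmentParser` is hypothetical, and a hand-rolled regex stands in for a real JSON parser):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts each
// {"idlpsegmentname":...,"idlpsegmentvalue":...} pair from the string
// that Redshift exports. A production version should use a JSON parser.
public class SegmentParser {
    private static final Pattern PAIR = Pattern.compile(
            "\"idlpsegmentname\"\\s*:\\s*\"([^\"]*)\"\\s*,\\s*\"idlpsegmentvalue\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns one [name, value] array per struct in the input string. */
    public static List<String[]> parse(String segmentInfo) {
        List<String[]> out = new ArrayList<>();
        Matcher m = PAIR.matcher(segmentInfo);
        while (m.find()) {
            out.add(new String[] { m.group(1), m.group(2) });
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "[{\"idlpsegmentname\":\"axciom\",\"idlpsegmentvalue\":\"200\"},"
                 + "{\"idlpsegmentname\":\"people\",\"idlpsegmentvalue\":\"z\"}]";
        for (String[] pair : parse(s)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```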

Best answer

Found a solution in the end.

package hive;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

/**
 * Generic UDF that parses a delimited string of the form
 * "name,value|name,value|..." into an
 * array<struct<idlpsegmentname:string, idlpsegmentvalue:string>>.
 */
public class UAStructUDF extends GenericUDF {

    @Override
    public String getDisplayString(String[] children) {
        return "ua_struct(" + String.join(", ", children) + ")";
    }

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        // Define the field names of the struct<> and their types.
        ArrayList<String> structFieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> structFieldObjectInspectors = new ArrayList<ObjectInspector>();
        // segment name
        structFieldNames.add("idlpsegmentname");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        // segment value
        structFieldNames.add("idlpsegmentvalue");
        structFieldObjectInspectors.add(PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        StructObjectInspector si = ObjectInspectorFactory.getStandardStructObjectInspector(
                structFieldNames, structFieldObjectInspectors);
        // The UDF returns a list (array<>) of those structs.
        return ObjectInspectorFactory.getStandardListObjectInspector(si);
    }

    @Override
    public Object evaluate(DeferredObject[] args) throws HiveException {
        if (args == null || args.length < 1) {
            throw new HiveException("args is empty");
        }
        if (args[0].get() == null) {
            throw new HiveException("args contains null instead of object");
        }
        Object argObj = args[0].get();
        // Get the argument as a plain Java string.
        String argument;
        if (argObj instanceof Text) {
            argument = ((Text) argObj).toString();
        } else if (argObj instanceof String) {
            argument = (String) argObj;
        } else {
            throw new HiveException(
                    "Argument is neither a Text nor String, it is a " + argObj.getClass().getCanonicalName());
        }
        // Parse the string and return the structs; to Hive, each struct is
        // just an Object[], and the array<struct<...>> is a List of them.
        return parseUAString(argument);
    }

    private List<Object[]> parseUAString(String argument) {
        // Expected input format: "name,value|name,value", e.g.
        // "acxiom_cluster,03|aff_celeb_ent,Y"
        List<Object[]> ret = new ArrayList<Object[]>();
        for (String s : argument.split("\\|")) {
            String[] arr = s.split(",", 2); // split on the first comma only
            Object[] o = new Object[2];
            o[0] = new Text(arr[0]);
            o[1] = arr.length > 1 ? new Text(arr[1]) : null;
            ret.add(o);
        }
        return ret;
    }
}
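Note that `parseUAString` in the answer handles a pipe/comma-delimited string ("name,value|name,value"), not the JSON-style string shown in the question, so the input would need to be exported in that shape. The split logic can be sketched standalone (the class name `SegmentSplitter` is hypothetical, JDK only):

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the answer's split logic:
// "name,value|name,value" -> list of [name, value] pairs.
public class SegmentSplitter {
    public static List<String[]> split(String input) {
        List<String[]> out = new ArrayList<>();
        for (String s : input.split("\\|")) {
            String[] arr = s.split(",", 2); // split on the first comma only
            out.add(new String[] { arr[0], arr.length > 1 ? arr[1] : "" });
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] pair : split("acxiom_cluster,03|aff_celeb_ent,Y")) {
            System.out.println(pair[0] + " = " + pair[1]);
        }
    }
}
```

In Hive, the compiled UDF jar is registered with `ADD JAR` and `CREATE TEMPORARY FUNCTION` before it can be called in a query.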

Regarding hadoop - How do I convert a string to an array of structs in Hive?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47301494/
