
hadoop - Serializing a long string in Hadoop

Reposted. Author: 行者123. Updated: 2023-12-02 21:51:17

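I have a class that implements WritableComparable in Hadoop. The class has two String fields, one short and one very long. I write these fields with writeChars and read them back with readLine, but I seem to be running into some kind of error. What is the best way to serialize such a long string in Hadoop?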

Best answer

I think you can use BytesWritable to make this more efficient. Take a look at the following custom key, which uses a BytesWritable as its callId.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.WritableComparable;

    public class CustomMRKey implements WritableComparable<CustomMRKey> {
        private BytesWritable callId;
        private IntWritable mapperType;

        /**
         * Default constructor. Hadoop creates keys reflectively, so the
         * fields must be non-null before readFields is called.
         */
        public CustomMRKey() {
            set(new BytesWritable(), new IntWritable());
        }

        /**
         * Constructor
         *
         * @param callId     the call id as raw bytes
         * @param mapperType the mapper type
         */
        public CustomMRKey(BytesWritable callId, IntWritable mapperType) {
            set(callId, mapperType);
        }

        /**
         * Sets the call id and mapper type.
         *
         * @param callId     the call id as raw bytes
         * @param mapperType the mapper type
         */
        public void set(BytesWritable callId, IntWritable mapperType) {
            this.callId = callId;
            this.mapperType = mapperType;
        }

        /**
         * @return the callId
         */
        public BytesWritable getCallId() {
            return callId;
        }

        /**
         * @param callId the callId to store
         */
        public void setCallId(BytesWritable callId) {
            this.callId = callId;
        }

        /**
         * @return the mapper type
         */
        public IntWritable getMapperType() {
            return mapperType;
        }

        /**
         * @param mapperType the mapper type to store
         */
        public void setMapperType(IntWritable mapperType) {
            this.mapperType = mapperType;
        }

        // Fields must be read in exactly the order in which write() emits them.
        @Override
        public void readFields(DataInput in) throws IOException {
            callId.readFields(in);
            mapperType.readFields(in);
        }

        @Override
        public void write(DataOutput out) throws IOException {
            callId.write(out);
            mapperType.write(out);
        }

        @Override
        public boolean equals(Object obj) {
            // The original referred to CustomMRCdrKey here, which does not match the class name.
            if (obj instanceof CustomMRKey) {
                CustomMRKey key = (CustomMRKey) obj;
                return callId.equals(key.callId)
                        && mapperType.equals(key.mapperType);
            }
            return false;
        }

        // Added: hashCode must agree with equals, and the default HashPartitioner
        // uses it to pick a reducer.
        @Override
        public int hashCode() {
            return 31 * callId.hashCode() + mapperType.hashCode();
        }

        // Orders by callId first, then by mapperType.
        @Override
        public int compareTo(CustomMRKey key) {
            int cmp = callId.compareTo(key.getCallId());
            if (cmp != 0) {
                return cmp;
            }
            return mapperType.compareTo(key.getMapperType());
        }

    }
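
As background on why the writeChars/readLine pair from the question misbehaves: DataOutput.writeChars emits two bytes per character with no length or terminator, while DataInput.readLine converts each single byte to a character and stops only at a line terminator, so the two are not symmetric. BytesWritable avoids this by framing its payload as a 4-byte length followed by the raw bytes. The sketch below imitates that framing using only java.io, so it runs without Hadoop on the classpath. The class LongStringCodec and its method names are hypothetical, chosen for illustration, and are not part of Hadoop or of the answer above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Hadoop): length-prefixed UTF-8 string framing.
final class LongStringCodec {

    private LongStringCodec() {}

    // Writes a 4-byte length followed by the UTF-8 bytes of the string.
    static void writeString(DataOutput out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads back exactly what writeString wrote: the length, then that many bytes.
    static String readString(DataInput in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Serializes a short and a long string into one buffer, then reads both back in order.
    static String[] roundTrip(String shortValue, String longValue) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                writeString(out, shortValue);
                writeString(out, longValue);
            }
            try (DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return new String[] { readString(in), readString(in) };
            }
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked to keep the demo simple.
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real Writable, writeString and readString would be called from write(DataOutput) and readFields(DataInput) respectively. Hadoop's Text class uses a similar layout (a variable-length integer followed by UTF-8 bytes), so holding each string in a Text field is another option. DataOutput.writeUTF is a poor fit for very long strings because it rejects values whose encoded form exceeds 65535 bytes.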

To use this in, say, your mapper code, you can build a key in BytesWritable form and set it on the custom key like this:

    CustomMRKey customKey = new CustomMRKey(new BytesWritable(), new IntWritable());
    customKey.setCallId(makeKey(value, this.resultKey));
    customKey.setMapperType(this.mapTypeIndicator);

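The makeKey method then looks like this:

    // keyFields, Constants.MR_DEFAULT_KEY_SIZE and value.getString(field) come from
    // the answerer's own code base and are not shown here.
    public BytesWritable makeKey(Text value, BytesWritable key) throws IOException {
        try {
            ByteArrayOutputStream byteKey = new ByteArrayOutputStream(Constants.MR_DEFAULT_KEY_SIZE);
            // Append the bytes of each configured key field, in order.
            for (String field : keyFields) {
                // An explicit charset (java.nio.charset.StandardCharsets) keeps the key bytes
                // identical across JVMs; the original used the platform default.
                byte[] bytes = value.getString(field).getBytes(StandardCharsets.UTF_8);
                byteKey.write(bytes, 0, bytes.length);
            }
            if (key == null) {
                return new BytesWritable(byteKey.toByteArray());
            } else {
                // Reuse the caller's BytesWritable instead of allocating one per record.
                key.set(byteKey.toByteArray(), 0, byteKey.size());
                return key;
            }
        } catch (Exception ex) {
            throw new IOException("Could not generate key", ex);
        }
    }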
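
One thing to watch with makeKey is that it concatenates field bytes with nothing between them, so the field values ("ab", "c") and ("a", "bc") would produce the same key. The following is a minimal sketch of one way around that, not the answerer's method. It assumes a plain Map stands in for the record, and that a 0x00 byte never occurs inside a field value, so it can serve as a separator. KeyBytes and build are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for makeKey: a Map plays the role of the input record.
final class KeyBytes {

    private KeyBytes() {}

    // Concatenates the UTF-8 bytes of the selected fields, each followed by a 0x00
    // separator. Assumption: field values never contain the NUL character.
    static byte[] build(Map<String, String> record, List<String> keyFields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : keyFields) {
            String fieldValue = record.get(field);
            if (fieldValue == null) {
                throw new IllegalArgumentException("Missing key field: " + field);
            }
            byte[] bytes = fieldValue.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            out.write(0); // separator so adjacent fields cannot run together
        }
        return out.toByteArray();
    }
}
```

The resulting array can be wrapped with new BytesWritable(bytes), or copied into an existing instance with key.set(bytes, 0, bytes.length), exactly as makeKey does.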

Hope this helps.

Regarding "hadoop - Serializing a long string in Hadoop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20670404/
