2010-12-08 105 views

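Implementing an ArrayWritable for a custom Hadoop type

How do I define an ArrayWritable for a custom Hadoop type? I am trying to implement an inverted index in Hadoop, using custom Hadoop types to store the data.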

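I have an IndividualPosting class that stores the term frequency, the document ID, and the list of byte offsets of the term within the document.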

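I have a Posting class that holds the document frequency (the number of documents the term appears in) and a list of IndividualPosting objects.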

I have defined a LongArrayWritable, which extends the ArrayWritable class, for the list of byte offsets in IndividualPosting.

When I defined a custom ArrayWritable for IndividualPosting, I ran into some problems after deploying locally (using Karmasphere in Eclipse).

All the IndividualPosting instances in the Posting class's list end up identical, even though I receive different values in the reduce method.


Can you explain what exactly the problem is? Maybe post some of the code for your custom ArrayWritable? – bajafresh4life 2010-12-08 15:04:19

Answer


From the documentation of ArrayWritable:

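A Writable for arrays containing instances of a class. The elements in this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example: public class IntArrayWritable extends ArrayWritable { public IntArrayWritable() { super(IntWritable.class); } }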

You have already mentioned doing this with a WritableComparable type defined by Hadoop. Here is what I assume your implementation looks like for LongWritable:

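// Declared static because it is assumed to be nested inside another class.
public static class LongArrayWritable extends ArrayWritable {
    // No-argument constructor: Hadoop needs it to instantiate the type when deserializing.
    public LongArrayWritable() {
        super(LongWritable.class);
    }

    public LongArrayWritable(LongWritable[] values) {
        super(LongWritable.class, values);
    }
}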

You should be able to do this with any type that implements WritableComparable, as described in the documentation. Using their example:

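import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements
        WritableComparable<MyWritableComparable> {

    // Some data
    private int counter;
    private long timestamp;

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Order instances by counter only; timestamp is ignored.
    public int compareTo(MyWritableComparable other) {
        int thisValue = this.counter;
        int thatValue = other.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}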
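Applied to the question, an IndividualPosting could follow the same write/readFields pattern. The following is only a rough sketch: the field names are guesses based on the description in the question, and SimpleWritable is a hypothetical local stand-in for org.apache.hadoop.io.Writable so that the snippet compiles without Hadoop on the classpath. The roundTrip helper is also made up, purely to show that the two methods mirror each other.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods),
// used only so this sketch compiles without Hadoop. In a real job, implement
// the Hadoop interface instead.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Assumed shape of the posting record; field names are not from the original post.
class IndividualPosting implements SimpleWritable {
    int termFrequency;
    long documentId;
    long[] byteOffsets = new long[0];

    // No-argument constructor, needed when instances are created reflectively.
    IndividualPosting() { }

    IndividualPosting(int termFrequency, long documentId, long[] byteOffsets) {
        this.termFrequency = termFrequency;
        this.documentId = documentId;
        this.byteOffsets = byteOffsets.clone();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(termFrequency);
        out.writeLong(documentId);
        // Length prefix so readFields knows how many offsets follow.
        out.writeInt(byteOffsets.length);
        for (long offset : byteOffsets) {
            out.writeLong(offset);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        termFrequency = in.readInt();
        documentId = in.readLong();
        // Allocate a fresh array so no state from a previous read survives.
        byteOffsets = new long[in.readInt()];
        for (int i = 0; i < byteOffsets.length; i++) {
            byteOffsets[i] = in.readLong();
        }
    }

    // Sketch-only helper: serialize to bytes, then read into a new instance.
    static IndividualPosting roundTrip(IndividualPosting posting) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            posting.write(new DataOutputStream(bytes));
            IndividualPosting copy = new IndividualPosting();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the real Hadoop Writable interface in place of the stand-in, the matching array type would presumably mirror LongArrayWritable: a subclass of ArrayWritable whose no-argument constructor calls super(IndividualPosting.class).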

And that should be all there is to it. This assumes you are using revision 0.20.2 or 0.21.0 of the Hadoop API.
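As for the symptom in the question (every element in the list ending up identical), one possible cause, which cannot be confirmed without seeing the code, is object reuse: Hadoop refills a single value instance while iterating over the values passed to reduce, so storing the reference instead of a copy leaves the list full of pointers to the same object. The sketch below simulates that behaviour in plain Java, without Hadoop; MutableValue, ReusingIterator, and ReuseDemo are hypothetical names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Mutable value object, standing in for a Writable such as IndividualPosting.
class MutableValue {
    long id;

    MutableValue() { }

    // Copy constructor used by the fixed variant below.
    MutableValue(MutableValue other) {
        this.id = other.id;
    }
}

// Simulates an iterator that refills ONE shared instance on every next() call.
class ReusingIterator implements Iterator<MutableValue> {
    private final long[] ids;
    private final MutableValue shared = new MutableValue();
    private int position = 0;

    ReusingIterator(long... ids) {
        this.ids = ids;
    }

    @Override
    public boolean hasNext() {
        return position < ids.length;
    }

    @Override
    public MutableValue next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        shared.id = ids[position++];
        return shared; // always the same object, with new contents
    }
}

class ReuseDemo {
    // Problematic: keeps the same reference each time, so every element
    // shows whatever value was written last.
    static List<MutableValue> collectReferences(Iterator<MutableValue> values) {
        List<MutableValue> result = new ArrayList<>();
        while (values.hasNext()) {
            result.add(values.next());
        }
        return result;
    }

    // Safer: copies each value before storing it.
    static List<MutableValue> collectCopies(Iterator<MutableValue> values) {
        List<MutableValue> result = new ArrayList<>();
        while (values.hasNext()) {
            result.add(new MutableValue(values.next()));
        }
        return result;
    }
}
```

If this is what is happening, copying each IndividualPosting inside the reduce loop before adding it to the Posting's list should make the entries distinct again.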