HBase Put from a Hadoop job, but the value is not visible in the HBase shell

I have a simple map/reduce job that scans one HBase table and modifies another HBase table. The Hadoop job appears to complete successfully, but when I check the HBase table, the entries do not show up there.

Here is the Hadoop program:

import java.io.IOException; 

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.conf.Configured; 
import org.apache.hadoop.hbase.HBaseConfiguration; 
import org.apache.hadoop.hbase.client.Put; 
import org.apache.hadoop.hbase.client.Result; 
import org.apache.hadoop.hbase.client.Scan; 
import org.apache.hadoop.hbase.io.ImmutableBytesWritable; 
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil; 
import org.apache.hadoop.hbase.mapreduce.TableMapper; 
import org.apache.hadoop.hbase.util.Bytes; 
import org.apache.hadoop.mapreduce.Job; 
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat; 
import org.apache.hadoop.util.Tool; 
import org.apache.hadoop.util.ToolRunner; 

public class HBaseInsertTest extends Configured implements Tool { 

    @Override 
    public int run(String[] args) throws Exception { 
     String table = "duplicates"; 

     Scan scan = new Scan(); 
     scan.setCaching(500); 
     scan.setCacheBlocks(false); 

     Job job = new Job(getConf(), "HBaseInsertTest"); 
     job.setJarByClass(HBaseInsertTest.class); 

     TableMapReduceUtil.initTableMapperJob(table, scan, Mapper.class, /* mapper output key = */ null,
      /* mapper output value = */ null, job);
     TableMapReduceUtil.initTableReducerJob(/* output table = */ "tablecopy", /* reducer class = */ null, job);

     job.setNumReduceTasks(0); 

     // Note that these are the default. 
     job.setOutputFormatClass(NullOutputFormat.class); 

     return job.waitForCompletion(true) ? 0 : 1; 
    } 

    private static class Mapper extends TableMapper<ImmutableBytesWritable, Put> { 
     @Override 
     protected void setup(Context context) throws IOException, InterruptedException { 
      super.setup(context); 
     } 

     @Override 
     public void map(ImmutableBytesWritable row, Result columns, Context context) throws IOException { 
      long id = 1260018L; 

      try { 
       Put put = new Put(Bytes.toBytes(id)); 
       put.add(Bytes.toBytes("mapping"), Bytes.toBytes("foo"), Bytes.toBytes("bar")); 
       context.write(row, put); 
      } catch (InterruptedException e) { 
       e.printStackTrace(); 
      } 
     } 
    } 

    public static void main(String[] args) throws Exception { 
     Configuration config = HBaseConfiguration.create(); 
     int res = ToolRunner.run(config, new HBaseInsertTest(), args); 
     System.exit(res); 
    } 
}

From the HBase shell:

hbase(main):008:0> get 'tablecopy', '1260018', 'mapping' 
COLUMN       CELL                      
0 row(s) in 0.0100 seconds

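I have simplified the program a lot to try to demonstrate and isolate the problem. I am also relatively new to Hadoop/HBase. I did verify that mapping is a column family that exists in the tablecopy table.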

2012-02-28 kfox

Maybe there is no output? Try printing 'row' and 'put' before 'context.write' – 2012-02-29 06:17:47

There is output. Switching to string keys solved the problem. – kfox 2012-03-13 05:34:09

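I think the problem is that you are querying with:

hbase(main):008:0> get 'tablecopy', '1260018', 'mapping'

when you should be querying with:

hbase(main):008:0> get 'tablecopy', 1260018, 'mapping'

Because of the quotes, HBase treats the row key in your query as a string. Also, if you run a simple client job on your side to retrieve this key from HBase, it will return the value correctly, provided the row is already there.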
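To make the mismatch concrete, here is a minimal sketch using only the JDK. It assumes that `Bytes.toBytes(long)` produces the 8-byte big-endian encoding of the value and that a quoted shell argument is sent as the bytes of its decimal text; the class and method names (`RowKeyEncodingDemo`, `longKey`, `stringKey`) are hypothetical and not part of the original post.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the names here are illustrative only.
class RowKeyEncodingDemo {

    // 8-byte big-endian encoding of a long (assumed to match Bytes.toBytes(long)).
    static byte[] longKey(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // UTF-8 bytes of the decimal text, as carried by a quoted argument like '1260018'.
    static byte[] stringKey(long id) {
        return Long.toString(id).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        long id = 1260018L;
        byte[] written = longKey(id);   // what the mapper's Put uses as its row key
        byte[] queried = stringKey(id); // what the quoted shell get looks up
        System.out.println("written by the job : " + Arrays.toString(written));
        System.out.println("queried from shell : " + Arrays.toString(queried));
        System.out.println("same row key?      : " + Arrays.equals(written, queried));
    }
}
```

The two arrays differ in both length and content, so a get using the string form cannot match a row written with the long form. Writing the key with `Bytes.toBytes(String.valueOf(id))` instead would make the quoted shell query line up with the stored row.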

2012-03-23 12:43:40

I thought I had tried it without the quotes, but I guess not. Thanks for your answer! – kfox 2012-03-23 23:13:57

-1

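Your problem is that you have no reducer. You need to create a class that extends TableReducer, takes Put as its input, and writes that Put to the target table using context.write(ImmutableBytesWritable key, Put put).

I imagine it would look something like this: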

public static class MyReducer extends TableReducer<ImmutableBytesWritable, Put, ImmutableBytesWritable> {

    // Pass every Put received from the mapper straight through to the output table.
    @Override
    public void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context)
            throws IOException, InterruptedException {
        for (Put record : values) {
            context.write(key, record);
        }
    }
}

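Then change the table reducer initialization to: TableMapReduceUtil.initTableReducerJob("tablecopy", MyReducer.class, job);

That should do it. Another option is to keep running without a reducer, open an HTable object in the mapper, and write the Put directly through it, like this: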

HTable table = new HTable(context.getConfiguration(), "output_table_name");
Put myPut = ...;
table.put(myPut);
table.close();

Hope this helps!

2012-03-01 22:23:08 fredugolon

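I actually used the second approach you suggested, but that did not work either. I think the crux of the problem is the long key versus the HBase shell looking up a string key. – kfox 2012-03-13 05:33:29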
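For anyone who wants to keep the long key and still inspect the row from the shell, one possible approach is to print the key as an escaped binary string. The sketch below is JDK-only and rests on two assumptions: that the row key is the 8-byte big-endian form of the long, and that the HBase shell accepts double-quoted strings with `\xNN` escapes. The class and method names (`ShellRowKeyFormatter`, `toShellLiteral`) are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; not part of HBase or of the original post.
class ShellRowKeyFormatter {

    // Renders each byte of the 8-byte big-endian long as a \xNN escape.
    static String toShellLiteral(long id) {
        byte[] key = ByteBuffer.allocate(Long.BYTES).putLong(id).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: get 'tablecopy', "\x00\x00\x00\x00\x00\x13\x39\xF2", 'mapping'
        System.out.println("get 'tablecopy', \"" + toShellLiteral(1260018L) + "\", 'mapping'");
    }
}
```

The double quotes matter in this sketch: the escapes are only interpreted inside a double-quoted string, while a single-quoted argument would be taken literally.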

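In general, I don't think you want to use a reducer to put data into HBase unless you have a more complex task than this one. By doing the HBase puts in a reducer, you introduce an entire shuffle/sort phase, which costs performance for no appreciable benefit: HBase is random access and does not need the keys to arrive in order. – David 2012-04-06 23:18:02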

Yes, good point. Looking back, you are absolutely right. – fredugolon 2012-04-07 22:43:38
