Scanning using the HBase shell

46

Try this. It's a bit ugly, but it works for me.

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
# Keep rows whose family:qualifier value contains the substring 'somevalue'
scan 't1', { COLUMNS => 'family:qualifier', FILTER =>
    SingleColumnValueFilter.new(
      Bytes.toBytes('family'),
      Bytes.toBytes('qualifier'),
      CompareFilter::CompareOp.valueOf('EQUAL'),
      SubstringComparator.new('somevalue'))
}

The HBase shell will include whatever you have in ~/.irbrc, so you can put something like this in there (I'm no Ruby expert, improvements are welcome):

# imports like above
def scan_substr(table, family, qualifier, substr, *cols)
    scan table, { COLUMNS => cols, FILTER =>
      SingleColumnValueFilter.new(
        Bytes.toBytes(family), Bytes.toBytes(qualifier),
        CompareFilter::CompareOp.valueOf('EQUAL'),
        SubstringComparator.new(substr)) }
end

Then you can just say, in the shell:

scan_substr 't1', 'family', 'qualifier', 'somevalue', 'family:qualifier'

Source

2011-09-16 16:07:29 havanki4j

+0

That is indeed super ugly. Thanks though, I couldn't find an example like this in the HBase docs/book/O'Reilly book. – mumrah

8

Use the FILTER parameter of scan, as shown in the usage help:

hbase(main):002:0> scan 

ERROR: wrong number of arguments (0 for 1) 

Here is some help for this command: 
Scan a table; pass table name and optionally a dictionary of scanner 
specifications. Scanner specifications may include one or more of: 
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, 
or COLUMNS. If no columns are specified, all columns will be scanned. 
To scan all members of a column family, leave the qualifier empty as in 
'col_family:'. 

Some examples: 

    hbase> scan '.META.' 
    hbase> scan '.META.', {COLUMNS => 'info:regioninfo'} 
    hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'} 
    hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)} 
    hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]} 

For experts, there is an additional option -- CACHE_BLOCKS -- which 
switches block caching for the scanner on (true) or off (false). By 
default it is enabled. Examples: 

    hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

Source

2011-08-31 21:03:58 Tony

28

scan 'test', {COLUMNS => ['F'],FILTER => \
"(SingleColumnValueFilter('F','u',=,'regexstring:http:.*pdf',true,true)) AND \
(SingleColumnValueFilter('F','s',=,'binary:2',true,true))"}

More information can be found here. Note that the attached Filter Language.docx file contains several examples.
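Typing these filter strings by hand gets fiddly, so here is a rough sketch of building them in plain Ruby. The helpers `scvf` and `all_of` are made-up names, not part of HBase; they only assemble the string. I'm assuming the last two arguments are the filter-if-missing and latest-version-only flags, and that values contain no single quotes.

```ruby
# Hypothetical helper, not part of HBase: builds one SingleColumnValueFilter
# clause in the filter-string syntax used above. Assumes the values contain
# no single quotes.
def scvf(family, qualifier, op, comparator,
         filter_if_missing: true, latest_version_only: true)
  "SingleColumnValueFilter('#{family}','#{qualifier}',#{op},'#{comparator}'," \
    "#{filter_if_missing},#{latest_version_only})"
end

# Hypothetical helper: wraps each clause in parentheses and joins with AND.
def all_of(*clauses)
  clauses.map { |c| "(#{c})" }.join(' AND ')
end

filter = all_of(
  scvf('F', 'u', '=', 'regexstring:http:.*pdf'),
  scvf('F', 's', '=', 'binary:2')
)
puts filter
```

With that in ~/.irbrc you should be able to pass the result straight to the shell, e.g. scan 'test', {COLUMNS => ['F'], FILTER => filter}.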

Source

2012-06-28 02:13:25 dape

+0

I think this filter parsing language only works in later versions of HBase - on 0.90.6 (cdh 3u6) I couldn't get any variation of it to work. – Mikeb

+0

I think looking at the javadoc is very useful; here is the javadoc for 0.94: http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html – mooreds

6

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

Scan scan = new Scan();
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);

// If you have multiple SingleColumnValueFilters, choose whether a row
// must pass all of them (MUST_PASS_ALL) or just one (MUST_PASS_ONE).

SingleColumnValueFilter filter_by_name = new SingleColumnValueFilter(
        Bytes.toBytes("SOME COLUMN FAMILY"),
        Bytes.toBytes("SOME COLUMN NAME"),
        CompareOp.EQUAL,
        Bytes.toBytes("SOME VALUE"));

// Skip rows that do not have the column at all. Adding the column filter
// alone does not keep such rows out of the result set; without this call
// they are included.
filter_by_name.setFilterIfMissing(true);

list.addFilter(filter_by_name);

scan.setFilter(list);

Source

2014-02-18 07:03:53 KannarKK

+0

This code is written in Java; the question asks about the HBase shell. – Tony

3

One of the available filters is ValueFilter, which can be used to filter on the values of all columns.

hbase(main):067:0> scan 'dummytable', {FILTER => "ValueFilter(=,'binary:2016-01-26')"}

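binary is one of the comparators that can be used inside a filter. Depending on what you want to do, you can use a different comparator in the filter.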
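To make the comparator choice concrete, here is a small plain-Ruby sketch that builds ValueFilter strings for each comparator prefix I know of in the filter language (binary, binaryprefix, regexstring, substring). `value_filter` is a made-up helper, not an HBase function, and the sample values are only illustrations. As far as I know, regexstring and substring are only valid with = and !=, so the helper rejects other operators for them.

```ruby
# Comparator prefixes and operators of the HBase filter language
# (listed here from memory of the reference guide; treat as an assumption).
COMPARATORS = %w[binary binaryprefix regexstring substring].freeze
OPERATORS = %w[< <= = != >= >].freeze

# Hypothetical helper, not part of HBase: builds a ValueFilter string.
def value_filter(op, comparator, value)
  raise ArgumentError, "unknown operator #{op}" unless OPERATORS.include?(op)
  unless COMPARATORS.include?(comparator)
    raise ArgumentError, "unknown comparator #{comparator}"
  end
  # regexstring and substring only make sense with = and !=
  if %w[regexstring substring].include?(comparator) && !%w[= !=].include?(op)
    raise ArgumentError, "#{comparator} only works with = or !="
  end
  "ValueFilter(#{op},'#{comparator}:#{value}')"
end

puts value_filter('=', 'binary', '2016-01-26')    # exact match
puts value_filter('=', 'binaryprefix', '2016-01') # value starts with 2016-01
puts value_filter('=', 'substring', '-01-')       # value contains -01-
puts value_filter('=', 'regexstring', '^2016-.*') # value matches the regex
```

Each printed string can then be used as the FILTER value in a scan, the same way as in the example above.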

You can refer to the following URL: http://www.hadooptpoint.com/filters-in-hbase-shell/. It has good examples of how to use the different filters in the HBase shell.

Source

2016-02-12 21:17:21

+0

Link-only answers are not good answers. Post some code and explain it in order to help. – KittMedia
