Loading CSV data into an HBase table with multiple columns using Flume
CSV file format in the spooling directory (sample.csv):
8600000US00601,00601,006015-DigitZCTA,0063-DigitZCTA,11102
8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869
8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423
8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548
8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603
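Each sample row has five comma-separated fields, and any capture-group regex for these rows has to account for all of them. The quick Java check below confirms the count. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

public class FieldCount {

    // Counts comma-separated fields; limit -1 keeps trailing empty fields.
    static int countFields(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        // The five sample rows from sample.csv.
        List<String> rows = Arrays.asList(
            "8600000US00601,00601,006015-DigitZCTA,0063-DigitZCTA,11102",
            "8600000US00602,00602,006025-DigitZCTA,0063-DigitZCTA,12869",
            "8600000US00603,00603,006035-DigitZCTA,0063-DigitZCTA,12423",
            "8600000US00604,00604,006045-DigitZCTA,0063-DigitZCTA,33548",
            "8600000US00606,00606,006065-DigitZCTA,0063-DigitZCTA,10603");
        for (String row : rows) {
            System.out.println(countFields(row) + " fields: " + row);
        }
    }
}
```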
My flume.conf configuration:
agent.sources = spool
agent.channels = fileChannel2
agent.sinks = sink2
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /home/cloudera/cloudera
agent.sources.spool.fileSuffix = .completed
agent.sources.spool.channels = fileChannel2
#agent.sources.spool.deletePolicy = immediate
agent.sinks.sink2.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.sink2.channel = fileChannel2
agent.sinks.sink2.table = sample
agent.sinks.sink2.columnFamily = s1
agent.sinks.sink2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.sink1.serializer.regex = ^([^,]+),([^,]+),([^,]+),([^,]+)$
#agent.sinks.sink2.serializer.regexIgnoreCase = true
agent.sinks.sink1.serializer.colNames =col1,col2,col3,col4
agent.sinks.sink2.batchSize = 100
agent.channels.fileChannel2.type=memory
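A possible cause, offered as an unverified reading of the config above: the serializer.regex and serializer.colNames lines use the prefix agent.sinks.sink1, but the only sink defined is sink2. If Flume ignores them, sink2 would fall back to the serializer defaults (as far as I can tell from the Flume source, regex (.*) and a single column named payload), and that would match the single-column result. Separately, the regex has four capture groups while each row has five fields, so it would not match even under the right prefix. The Java sketch below is a simplified re-implementation of the matching step that RegexHbaseEventSerializer appears to perform. It is not Flume's code. The whole line must match, and the number of groups must equal the number of column names, or nothing is written. The five-group regex and the col5 name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexColumnCheck {

    // Simplified model (an assumption, not Flume's code) of the serializer's check:
    // the whole line must match, and the group count must equal the number of
    // column names; otherwise an empty list is returned (no cells written).
    static List<String> extractColumns(String regex, List<String> colNames, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.matches() || m.groupCount() != colNames.size()) {
            return new ArrayList<>();
        }
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < colNames.size(); i++) {
            cells.add(colNames.get(i) + "=" + m.group(i + 1));
        }
        return cells;
    }

    public static void main(String[] args) {
        String line = "8600000US00601,00601,006015-DigitZCTA,0063-DigitZCTA,11102";

        // Regex and column names as written in the question: four groups, no match.
        System.out.println(extractColumns("^([^,]+),([^,]+),([^,]+),([^,]+)$",
            Arrays.asList("col1", "col2", "col3", "col4"), line));

        // Hypothetical five-group regex with a hypothetical fifth column name.
        System.out.println(extractColumns("^([^,]+),([^,]+),([^,]+),([^,]+),([^,]+)$",
            Arrays.asList("col1", "col2", "col3", "col4", "col5"), line));

        // Assumed serializer defaults: the whole line lands in one column.
        System.out.println(extractColumns("(.*)", Arrays.asList("payload"), line));
    }
}
```

If this reading is right, the fix would be to put both properties under agent.sinks.sink2.serializer and give the regex and colNames five entries each.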
I can use Flume to load the data into a single column, but I can't get the regex to work. Any help loading it into multiple columns in HBase would be appreciated. Thanks.
Did you ever get an answer to this? – 2015-07-13 06:46:03
If you have an answer, please share it. Thanks. – sayan 2015-08-26 06:35:34
I have the same problem :( Please share!!! – akaliza 2015-12-23 11:08:47