2014-09-06

Problem with the Flume HDFS sink for Twitter data. I currently have this configuration in Flume:

# Licensed to the Apache Software Foundation (ASF) under one 
# or more contributor license agreements. See the NOTICE file 
# distributed with this work for additional information 
# regarding copyright ownership. The ASF licenses this file 
# to you under the Apache License, Version 2.0 (the 
# "License"); you may not use this file except in compliance 
# with the License. You may obtain a copy of the License at 
# 
# http://www.apache.org/licenses/LICENSE-2.0 
# 
# Unless required by applicable law or agreed to in writing, 
# software distributed under the License is distributed on an 
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 
# KIND, either express or implied. See the License for the 
# specific language governing permissions and limitations 
# under the License. 
# The configuration file needs to define the sources, 
# the channels and the sinks. 
# Sources, channels and sinks are defined per agent, 
# in this case called 'TwitterAgent' 
TwitterAgent.sources = Twitter 
TwitterAgent.channels = MemChannel 
TwitterAgent.sinks = HDFS 

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource 
TwitterAgent.sources.Twitter.channels = MemChannel 
TwitterAgent.sources.Twitter.consumerKey = YPTxqtRamIZ1bnJXYwGW 
TwitterAgent.sources.Twitter.consumerSecret = Wjyw9714OBzao7dktH0csuTByk4iLG9Zu4ddtI6s0ho 
TwitterAgent.sources.Twitter.accessToken = 2340010790-KhWiNLt63GuZ6QZNYuPMJtaMVjLFpiMP4A2v 
TwitterAgent.sources.Twitter.accessTokenSecret = x1pVVuyxfvaTbPoKvXqh2r5xUA6tf9einoByLIL8rar 
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing 
TwitterAgent.sinks.HDFS.channel = MemChannel 
TwitterAgent.sinks.HDFS.type = hdfs 
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hadoop1:8020/user/flume/tweets/%Y/%m/%d/%H/ 
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text 
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000 
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0 
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000 
TwitterAgent.channels.MemChannel.type = memory 
TwitterAgent.channels.MemChannel.capacity = 10000 
TwitterAgent.channels.MemChannel.transactionCapacity = 100 

The Twitter application's authentication keys are correct. I keep getting this error in the Flume log file:

ERROR org.apache.flume.SinkRunner  

Unable to deliver event. Exception follows. 
org.apache.flume.EventDeliveryException: java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop1 
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:446) 
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68) 
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147) 
    at java.lang.Thread.run(Thread.java:662) 
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop1 
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414) 
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164) 
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129) 
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:448) 
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410) 
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128) 
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2310) 
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) 
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2344) 
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2326) 
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:353) 
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194) 
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:227) 
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:221) 
    at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:589) 
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:161) 
    at org.apache.flume.sink.hdfs.BucketWriter.access$800(BucketWriter.java:57) 
    at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:586) 
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
    ... 1 more 
Caused by: java.net.UnknownHostException: hadoop1 
    ... 23 more 

Does anyone here know why and could explain it to me? Thanks in advance.

Answer

According to the exception, the problem is that the host hadoop1 is unknown.
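A quick way to confirm this is to check name resolution on the machine where the Flume agent runs. This is only a sketch: it uses localhost as a stand-in so it runs anywhere; on your Flume machine, set `host` to hadoop1 instead.

```shell
#!/bin/sh
# Check whether a hostname resolves on this machine.
# "localhost" is a stand-in; on the Flume host, check hadoop1 instead.
host="localhost"
if getent hosts "$host" > /dev/null; then
  echo "$host resolves"
else
  echo "$host does NOT resolve - fix DNS or /etc/hosts"
fi
```

If the name does not resolve, the HDFS client cannot even build the NameNode address, which is exactly the `UnknownHostException` in the stack trace above.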

According to the Flume configuration file, the path you provided is

hdfs://hadoop1:8020/user/flume/tweets/%Y/%m/%d/%H/ 

This path must be reachable from the machine running the Flume agent. Since the machine name hadoop1 cannot be used to access HDFS unless it resolves from that machine (for example, via DNS in the same domain), you need to access HDFS using the IP address set in core-site.xml.

@AntariskshaYelkawar The file is **core-site.xml**. Thanks. – Xorsist 2014-09-08 12:00:22
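Concretely, the fix could look like one of the following. This is only a sketch: 192.168.1.10 is a placeholder, not your actual NameNode address, so substitute the real IP from your cluster's core-site.xml (fs.defaultFS).

```
# Option 1: make hadoop1 resolvable on the Flume machine (/etc/hosts)
# 192.168.1.10 is a placeholder for your NameNode's IP address
192.168.1.10   hadoop1

# Option 2: point the sink directly at the NameNode's IP address
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://192.168.1.10:8020/user/flume/tweets/%Y/%m/%d/%H/
```

Option 1 is usually preferable, since the hostname then also works for every other Hadoop client on that machine.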