2015-07-10

Hadoop MultipleOutputs in Reducer with FileAlreadyExistsException

I am using MultipleOutputs in a Reducer. The multiple output writes files to a folder named NewIdentities. The code is shown below:

private MultipleOutputs<Text, Text> mos;

@Override
public void reduce(Text inputKey, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    ......
        // output to change report
        if (ischangereport.equals("TRUE")) {
            // strip the last character of changereport, then write under the NewIdentities/ base path
            mos.write(new Text(e.getHID()), new Text(changereport.deleteCharAt(changereport.length() - 1).toString()), "NewIdentities/");
        }
    }
}

@Override
public void setup(Context context) {
    mos = new MultipleOutputs<Text, Text>(context);
}

@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();
}

It used to run fine. But when I ran it today, it threw the exception shown below. My Hadoop version is 2.4.0.

Error: org.apache.hadoop.fs.FileAlreadyExistsException: /CaptureOnlyMatchIndex9/TEMP/ChangeReport/NewIdentities/-r-00000 for client 192.168.71.128 already exists
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2297)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2225)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2178)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:520)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:354)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteEx
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1604)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1465)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1390)
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:394)
    at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:390)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:390)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:334)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784)
    at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:132)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475)
    at
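
The last path component in the exception follows from how the base output path passed to the write call is turned into a file name. Here is a minimal sketch in plain Java that mimics the naming scheme for reducer output (base path, then `-r-`, then the partition number zero-padded to five digits). The class and method names are hypothetical, and the sketch does not call Hadoop itself:

```java
public class OutputNameSketch {
    // Hypothetical helper mimicking the naming scheme for reducer output files:
    // <baseOutputPath>-r-<partition padded to 5 digits>.
    static String reducerFileName(String baseOutputPath, int partition) {
        return String.format("%s-r-%05d", baseOutputPath, partition);
    }

    public static void main(String[] args) {
        // A base path ending in "/" leaves the file-name prefix empty,
        // so the file inside the folder is named just "-r-00000".
        System.out.println(reducerFileName("NewIdentities/", 0));
        // A base path with a prefix after the slash yields a more readable name.
        System.out.println(reducerFileName("NewIdentities/part", 0));
    }
}
```

Because the name depends only on the base path and the partition number, any two writers for the same partition target the same file.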

Answer

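I found the cause. One of my reducers ran out of memory, so it implicitly threw an out-of-memory exception, and Hadoop stopped the current MultipleOutputs. Perhaps another thread of the reducer then tried to write output, so it created another MultipleOutputs object, and the collision occurred.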
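
The collision described above can be imitated outside Hadoop. The following is a minimal sketch using only `java.nio.file`, on the assumption that the output file is created with overwrite disabled and that a failed writer leaves its file behind. The class, method, and directory names are hypothetical, and the exception caught is `java.nio.file.FileAlreadyExistsException`, not the Hadoop class with the same simple name:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RetryCollisionSketch {
    // Hypothetical stand-in for one writer opening its output file.
    // Files.createFile fails if the file already exists, similar to a
    // create call with overwrite disabled.
    static boolean attemptCreates(Path outputFile) throws IOException {
        try {
            Files.createDirectories(outputFile.getParent());
            Files.createFile(outputFile);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ChangeReport");
        Path out = dir.resolve("NewIdentities").resolve("-r-00000");
        boolean first = attemptCreates(out);   // first writer creates the file, then fails later
        boolean second = attemptCreates(out);  // second writer finds the leftover file
        System.out.println("first writer created file: " + first);
        System.out.println("second writer created file: " + second);
    }
}
```

If this is what happened, the file collision is only a symptom, and the thing to address is the memory exhaustion in the reducer.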
