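Partially reading a tar.gz file from Amazon S3
I'm trying to extract specific files from a tar.gz archive stored in Amazon S3 without reading all of its bytes, because the archive may be large and I only need 2 or 3 files from it.
I'm using the AWS Java SDK. The code is below (exception handling omitted):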
AWSCredentials credentials = new BasicAWSCredentials("accessKey", "secretKey");
AWSCredentialsProvider credentialsProvider = new AWSStaticCredentialsProvider(credentials);
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.US_EAST_1)
        .withCredentials(credentialsProvider)
        .build();
S3Object object = s3Client.getObject("bucketname", "file.tar.gz");
S3ObjectInputStream objectContent = object.getObjectContent();
TarArchiveInputStream tarInputStream = new TarArchiveInputStream(new GZIPInputStream(objectContent));
TarArchiveEntry currentEntry;
while ((currentEntry = tarInputStream.getNextTarEntry()) != null) {
    if (currentEntry.getName().equals("1/foo.bar") && currentEntry.isFile()) {
        FileOutputStream entryOs = new FileOutputStream("foo.bar");
        IOUtils.copy(tarInputStream, entryOs);
        entryOs.close();
        break; // stop as soon as the wanted entry has been extracted
    }
}
objectContent.abort(); // Warning at this line
tarInputStream.close(); // warning at this line
When I use this approach, I get a warning that not all bytes were read from the stream, which is exactly what I intended.
WARNING: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
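Is it necessary to drain the stream, and what would be the downside of not doing so? Can I just ignore the warning?
Yes, I may sometimes end up reading the whole file, but in most cases I think I can save some reading; and no, I can't influence how the files are uploaded. My question is whether this warning can be ignored, and what the impact is if the stream is not fully consumed. – ares
Fair comment - the warning can be ignored. It is telling you that you will lose some in-flight content because the HTTP connection is being terminated. `close()` delegates to `abort()`, so it triggers this warning as well - now added to the answer – diginoise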
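For comparison, the "drain the input stream after use" alternative named in the warning could look roughly like the sketch below. It uses only `java.io` and a plain `InputStream` standing in for the `S3ObjectInputStream`; the class name `DrainSketch` and the helper `drain` are hypothetical names chosen for illustration, not part of the AWS SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class DrainSketch {

    // Hypothetical helper: reads and discards everything left in the stream,
    // returning how many bytes were thrown away.
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long discarded = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            discarded += n;
        }
        return discarded;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the S3 object body: 20,000 bytes, of which we "use" only 100.
        InputStream body = new ByteArrayInputStream(new byte[20_000]);
        body.read(new byte[100]);

        long discarded = drain(body);
        System.out.println("Drained " + discarded + " unused bytes");
        body.close();
    }
}
```

The trade-off is the one the comments describe: draining still downloads every remaining byte just to discard it, which defeats the purpose for a large archive, whereas aborting drops the HTTP connection and skips the rest of the transfer at the cost of the warning.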