2009-06-30
54

Fastest way to write huge data in text file Java

I have to write huge amounts of data into a text [csv] file. I used BufferedWriter to write the data, and it took around 40 seconds to write 174 MB of data. Is this the fastest speed Java can offer?

bufferedWriter = new BufferedWriter (new FileWriter ("fileName.csv")); 

Note: These 40 seconds include the time to iterate over the ResultSet and fetch the records as well. :). The 174 MB corresponds to 400,000 rows in the ResultSet.

+3

You wouldn't happen to have antivirus software active on the computer where you run this code? – 2011-09-18 19:24:42

Answers

87

You might try removing the BufferedWriter and just using the FileWriter directly. On a modern system there is a good chance you are just writing to the drive's cache memory anyway.

It takes me in the range of 4-5 seconds to write 175MB (4 million strings) -- this is on a dual-core 2.4GHz Dell running Windows XP with an 80GB, 7200-RPM Hitachi disk.

Can you isolate how much of the time is record retrieval and how much is file writing?

import java.io.BufferedWriter; 
import java.io.File; 
import java.io.FileWriter; 
import java.io.IOException; 
import java.io.Writer; 
import java.util.ArrayList; 
import java.util.List; 

public class FileWritingPerfTest { 


private static final int ITERATIONS = 5; 
private static final double MEG = (Math.pow(1024, 2)); 
private static final int RECORD_COUNT = 4000000; 
private static final String RECORD = "Help I am trapped in a fortune cookie factory\n"; 
private static final int RECSIZE = RECORD.getBytes().length; 

public static void main(String[] args) throws Exception { 
    List<String> records = new ArrayList<String>(RECORD_COUNT); 
    int size = 0; 
    for (int i = 0; i < RECORD_COUNT; i++) { 
     records.add(RECORD); 
     size += RECSIZE; 
    } 
    System.out.println(records.size() + " 'records'"); 
    System.out.println(size/MEG + " MB"); 

    for (int i = 0; i < ITERATIONS; i++) { 
     System.out.println("\nIteration " + i); 

     writeRaw(records); 
     writeBuffered(records, 8192); 
     writeBuffered(records, (int) MEG); 
     writeBuffered(records, 4 * (int) MEG); 
    } 
} 

private static void writeRaw(List<String> records) throws IOException { 
    File file = File.createTempFile("foo", ".txt"); 
    try { 
     FileWriter writer = new FileWriter(file); 
     System.out.print("Writing raw... "); 
     write(records, writer); 
    } finally { 
     // comment this out if you want to inspect the files afterward 
     file.delete(); 
    } 
} 

private static void writeBuffered(List<String> records, int bufSize) throws IOException { 
    File file = File.createTempFile("foo", ".txt"); 
    try { 
     FileWriter writer = new FileWriter(file); 
     BufferedWriter bufferedWriter = new BufferedWriter(writer, bufSize); 

     System.out.print("Writing buffered (buffer size: " + bufSize + ")... "); 
     write(records, bufferedWriter); 
    } finally { 
     // comment this out if you want to inspect the files afterward 
     file.delete(); 
    } 
} 

private static void write(List<String> records, Writer writer) throws IOException { 
    long start = System.currentTimeMillis(); 
    for (String record: records) { 
     writer.write(record); 
    } 
    writer.flush(); 
    writer.close(); 
    long end = System.currentTimeMillis(); 
    System.out.println((end - start)/1000f + " seconds"); 
} 
} 
+2

@rozario each write call should only produce about 175MB and then delete itself. If not, you'd end up with 175MB x 4 different write calls x 5 iterations = 3.5GB of data. You could check the return value of file.delete() and throw an exception if it is false. – 2011-04-14 16:42:14

+0

Note that `writer.flush()` is not necessary in this case, since `writer.close()` [flushes memory](http://docs.oracle.com/javase/7/docs/api/java/io/BufferedWriter.html) implicitly. By the way: best practices suggest using [try-with-resources](https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html) instead of explicitly calling close(). – 2015-04-25 16:53:35
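As the comment above suggests, on Java 7+ the write step of the benchmark can use try-with-resources so the writer is always closed (and therefore flushed) even if writing throws. A minimal sketch; the class and method names are mine, not from the answer:

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

public class TryWithResourcesWrite {

    // close() is called automatically at the end of the try block and
    // implies a flush, so no explicit flush() call is needed.
    static void write(File file, List<String> records) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(file))) {
            for (String record : records) {
                writer.write(record);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("foo", ".txt");
        write(file, java.util.Arrays.asList("a\n", "b\n"));
        System.out.println("wrote " + file.length() + " bytes"); // prints: wrote 4 bytes
        file.delete();
    }
}
```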

+2

FWIW, this was written for Java 5, which at the very least wasn't documented to flush on close, and which didn't have try-with-resources. It could probably use an update. – 2015-04-27 20:48:37

4

Your transfer speed is likely not to be limited by Java. Instead I would suspect (in no particular order):

  1. the speed of transfer from the database
  2. the speed of transfer to the disk

If you read the complete dataset and then write it out to disk, that will take longer, since the JVM will have to allocate memory, and the db read/disk write will happen sequentially. Instead I would write out to the buffered writer for every read you make from the db, so the operation will be closer to a concurrent one (I don't know if you're doing that or not).
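The interleaved read-and-write approach described above could be sketched roughly like this; the `export` signature and the unquoted CSV join are assumptions for illustration, not from the original post:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.SQLException;

public class StreamedCsvExport {

    // Joins one row's fields into a CSV line (no quoting/escaping, for brevity).
    static String toCsvLine(String[] fields) {
        return String.join(",", fields) + "\n";
    }

    // Writes each row as soon as it is fetched from the ResultSet, so the
    // db read and the disk write are interleaved instead of the whole result
    // set being collected in memory first.
    static void export(ResultSet rs, String csvPath) throws SQLException, IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(csvPath))) {
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                String[] fields = new String[cols];
                for (int c = 1; c <= cols; c++) {
                    fields[c - 1] = rs.getString(c);
                }
                out.write(toCsvLine(fields));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(new String[]{"id", "name"}).trim()); // prints: id,name
    }
}
```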

28

Try memory mapped files (takes 300 ms to write 174MB on my machine, Core 2 Duo, 2.5GB RAM):

byte[] buffer = "Help I am trapped in a fortune cookie factory\n".getBytes(); 
int number_of_lines = 400000; 

FileChannel rwChannel = new RandomAccessFile("textfile.txt", "rw").getChannel(); 
ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, buffer.length * number_of_lines); 
for (int i = 0; i < number_of_lines; i++) 
{ 
    wrBuf.put(buffer); 
} 
rwChannel.close(); 
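One caveat the answer doesn't mention: a single `FileChannel.map` call is limited to Integer.MAX_VALUE bytes, so outputs larger than about 2GB have to be mapped in windows. A sketch of that, with the ~1MB window size chosen arbitrarily:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ChunkedMappedWrite {

    // Writes `lines` copies of `record` by mapping the output file in
    // ~1MB windows, so the total size is not limited to Integer.MAX_VALUE.
    static void write(String path, byte[] record, int lines) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel ch = raf.getChannel()) {
            int perWindow = Math.max(1, (1 << 20) / record.length); // records per window
            long pos = 0;
            int written = 0;
            while (written < lines) {
                int batch = Math.min(perWindow, lines - written);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE,
                        pos, (long) batch * record.length);
                for (int i = 0; i < batch; i++) {
                    buf.put(record);
                }
                pos += (long) batch * record.length;
                written += batch;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("mapped", ".txt");
        write(f.getPath(), "Help I am trapped in a fortune cookie factory\n".getBytes(), 1000);
        System.out.println(f.length() + " bytes"); // prints: 46000 bytes
        f.delete();
    }
}
```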
+0

What does aMessage.length() mean when you instantiate the ByteBuffer? – Hotel 2012-09-27 19:39:30

14

Just for the sake of statistics:

The machine is an old Dell with a new SSD

CPU: Intel Pentium D 2.8 GHz

SSD: Patriot Inferno 120GB SSD

4000000 'records' 
175.47607421875 MB 

Iteration 0 
Writing raw... 3.547 seconds 
Writing buffered (buffer size: 8192)... 2.625 seconds 
Writing buffered (buffer size: 1048576)... 2.203 seconds 
Writing buffered (buffer size: 4194304)... 2.312 seconds 

Iteration 1 
Writing raw... 2.922 seconds 
Writing buffered (buffer size: 8192)... 2.406 seconds 
Writing buffered (buffer size: 1048576)... 2.015 seconds 
Writing buffered (buffer size: 4194304)... 2.282 seconds 

Iteration 2 
Writing raw... 2.828 seconds 
Writing buffered (buffer size: 8192)... 2.109 seconds 
Writing buffered (buffer size: 1048576)... 2.078 seconds 
Writing buffered (buffer size: 4194304)... 2.015 seconds 

Iteration 3 
Writing raw... 3.187 seconds 
Writing buffered (buffer size: 8192)... 2.109 seconds 
Writing buffered (buffer size: 1048576)... 2.094 seconds 
Writing buffered (buffer size: 4194304)... 2.031 seconds 

Iteration 4 
Writing raw... 3.093 seconds 
Writing buffered (buffer size: 8192)... 2.141 seconds 
Writing buffered (buffer size: 1048576)... 2.063 seconds 
Writing buffered (buffer size: 4194304)... 2.016 seconds 

As we can see, the raw method is slower than the buffered one.

1

package all.is.well;

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import junit.framework.TestCase;

/**
 * @author Naresh Bhabat
 *
 * The following implementation helps to deal with extra-large files in Java.
 * This program has been tested with a 2GB input file.
 * There are some points where extra logic can be added in the future.
 *
 * Please note: to deal with a binary input file, we need to read bytes from
 * the input file object instead of reading lines.
 *
 * It uses RandomAccessFile, which is almost like a streaming API.
 *
 * ****************************************
 * Notes regarding the executor framework and its timings.
 * Please note: ExecutorService executor = Executors.newFixedThreadPool(10);
 *
 *   for 10 threads   : total time required for reading and writing the text: 349.317 seconds
 *   for 100 threads  : total time required for reading and writing the text: 464.042 seconds
 *   for 1000 threads : total time required for reading and writing the text: 466.538 seconds
 *   for 10000 threads: total time required for reading and writing the text: 479.701 seconds
 */
public class DealWithHugeRecordsinFile extends TestCase {

    static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
    static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
    static volatile RandomAccessFile fileToWrite;
    static volatile RandomAccessFile file;
    static volatile String fileContentsIter;
    static volatile int position = 0;

    public static void main(String[] args) throws IOException, InterruptedException {
        long currentTimeMillis = System.currentTimeMillis();

        try {
            fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw"); // for random write, independent of thread obstacles
            file = new RandomAccessFile(FILEPATH, "r"); // for random read, independent of thread obstacles
            seriouslyReadProcessAndWriteAsynch();
        } catch (IOException e) {
            e.printStackTrace();
        }
        Thread currentThread = Thread.currentThread();
        System.out.println(currentThread.getName());
        long currentTimeMillis2 = System.currentTimeMillis();
        double time_seconds = (currentTimeMillis2 - currentTimeMillis) / 1000.0;
        System.out.println("Total time required for reading the text in seconds " + time_seconds);
    }

    /**
     * Something asynchronously serious.
     *
     * @throws IOException
     */
    public static void seriouslyReadProcessAndWriteAsynch() throws IOException {
        ExecutorService executor = Executors.newFixedThreadPool(10); // see the timing notes in the class comment
        while (true) {
            final String readLine = file.readLine();
            if (readLine == null) {
                break;
            }
            Runnable genuineWorker = new Runnable() {
                @Override
                public void run() {
                    // do the hard processing here in this thread; I have consumed
                    // some time and eaten an exception in the write method.
                    writeToFile(FILEPATH_WRITE, readLine);
                    // System.out.println(" :" + Thread.currentThread().getName());
                }
            };
            executor.execute(genuineWorker);
        }
        executor.shutdown();
        while (!executor.isTerminated()) {
            // busy-wait until every submitted task has finished
        }
        System.out.println("Finished all threads");
        file.close();
        fileToWrite.close();
    }

    /**
     * @param filePath
     * @param data
     */
    private static void writeToFile(String filePath, String data) {
        try {
            // fileToWrite.seek(position);
            data = "\n" + data;
            if (!data.contains("Randomization")) {
                return;
            }
            System.out.println("Let us do something time consuming to make this thread busy " + (position++) + " :" + data);
            System.out.println("Let's consume through this loop");
            int i = 1000;
            while (i > 0) {
                i--;
            }
            fileToWrite.write(data.getBytes());
            throw new Exception();
        } catch (Exception exception) {
            System.out.println("exception was thrown but still we are able to proceed further"
                    + "\n This can be used for marking failure of the records");
            // exception.printStackTrace();
        }
    }
}
