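I have a directory containing a list of very large (~300 MB) files that need to be filtered multiple times with an awk script, each time with different search parameters. I have written a program that uses a fixedThreadPool executor to spawn multiple threads; the task in each thread obtains the Runtime object and runs the awk script through a new Process executed by a bash shell.
Which is faster: reading from the console, or writing to a file and then reading it?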
Here is the sample code:
Class MultiThreadingImpl:
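import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MultiThreadingImpl {
    static List<File> filesList = new ArrayList<File>();

    public static void main(String[] args) throws InterruptedException {
        // One pool thread per available processor
        int numThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);
        File logsDir = new File("TestFilesDir");
        getLogFiles(logsDir);
        // Assumes one search key per file: searchKeys needs at least filesList.size() entries
        String[] searchKeys = {"123456", "PAT1"};
        for (int i = 0; i < filesList.size(); i++) {
            Runnable worker = new WorkerThread(filesList.get(i), searchKeys[i]);
            executor.execute(worker); // queue the task on the pool
        }
        executor.shutdown();
        // Block until all tasks finish instead of spinning on isTerminated()
        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        System.out.println("Finished all threads");
    }

    // Collects every file in logsDir except previously written result files
    private static void getLogFiles(File logsDir) {
        assert (logsDir.isDirectory());
        for (File f : logsDir.listFiles(
                new FilenameFilter() {
                    public boolean accept(File dir, String name) {
                        return !name.endsWith("_result.txt");
                    }
                }
        )) {
            filesList.add(f);
        }
    }
}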
Class WorkerThread:
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

class WorkerThread implements Runnable {
    private String outputFile;
    private String searchKey;
    private File logFile;

    public WorkerThread(File logFile, String searchKey) {
        this.logFile = logFile;
        this.searchKey = searchKey;
        this.outputFile = logFile.getName().replace(".txt", "") + "_result.txt";
    }

    public void run() {
        int res = 0;
        Runtime runtime = Runtime.getRuntime();
        // awk program: a line containing a date-like pattern resets n, a line matching
        // the search key sets n, and every line is printed while n is set
        String awkRegex = new StringBuilder("'/([0-9]{1}|[0-9]{2})[[:space:]][[:alpha:]]+[[:space:]][0-9]{4}/{n=0}")
                .append("/" + searchKey + "/").append("{n=1} n' ").toString();
        // "&>" sends both stdout and stderr of awk to the result file
        String awkCommand = new StringBuilder("/usr/bin/awk ").append(awkRegex)
                .append(logFile.getAbsolutePath()).append(" &> ").append("/TestFilesDir/").append(outputFile).toString();
        System.out.println(Thread.currentThread().getName() + ":: Command : " + awkCommand);
        String[] cmdList = { "/bin/bash", "-c", awkCommand };
        try {
            final Process process = runtime.exec(cmdList);
            BufferedReader stdInput = new BufferedReader(new InputStreamReader(process.getInputStream()));
            BufferedReader stdError = new BufferedReader(new InputStreamReader(process.getErrorStream()));
            // Drain the streams before waitFor() so the child can never block on a full pipe;
            // with the "&>" redirection both are normally empty
            while (stdInput.readLine() != null) {
                // Emptying stream
            }
            StringBuffer strerror = new StringBuffer();
            String serror = null;
            while ((serror = stdError.readLine()) != null) {
                strerror.append(serror + "\n");
            }
            res = process.waitFor();
            System.out.println(Thread.currentThread().getName() + ":: Process Exit value: " + res);
            if (strerror.length() > 0) {
                System.err.println(Thread.currentThread().getName() + ":: stderr: " + strerror);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
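Here I can either write a unique output file for each input file, then merge them with cat and finally read the merged file.
Or I can read the output of each Process's output stream into a string and merge all the strings.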
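To make the second option concrete, reading a process's output directly might look roughly like the sketch below. It is only an illustration under assumptions: `ProcessCapture`, `runAndCapture` and `awkCommand` are made-up names, and the awk program is passed as its own argument so that no bash shell or quoting is involved. stderr is merged into stdout and the output is read before `waitFor()`, so the child cannot stall on a full pipe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of the original program.
class ProcessCapture {

    // Runs the command and returns everything it printed (stdout and stderr merged).
    static String runAndCapture(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // one stream to drain instead of two
        Process process = builder.start();

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            // Read until the child closes its output, then collect the exit code
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with " + exitCode + ": " + output);
        }
        return output.toString();
    }

    // Builds the awk invocation from the question as an argument list (no shell involved).
    static List<String> awkCommand(String searchKey, String logPath) {
        String program = "/([0-9]{1}|[0-9]{2})[[:space:]][[:alpha:]]+[[:space:]][0-9]{4}/{n=0}"
                + "/" + searchKey + "/{n=1} n";
        return List.of("/usr/bin/awk", program, logPath);
    }
}
```

A worker could then call `ProcessCapture.runAndCapture(ProcessCapture.awkCommand(searchKey, logFile.getAbsolutePath()))`. Note that this keeps the whole filtered output in memory, which may matter if a large part of a 300 MB file matches.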
Which mechanism is faster?
Please also suggest whether there is a way to make the whole thing faster.
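One candidate worth measuring is doing the filtering inside the JVM, which avoids starting a bash and an awk process per file. The sketch below mirrors the state machine of the awk program (a date-like line resets the flag, a line matching the key sets it, lines are kept while it is set). `RecordFilter` and `filter` are hypothetical names, the search key is compiled as a Java regex (not identical to awk's POSIX ERE), and whether this is actually faster than awk is an assumption to verify, not a given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical in-process equivalent of the awk program; names are illustrative.
class RecordFilter {
    // Java spelling of the awk date pattern: 1-2 digits, whitespace, letters, whitespace, 4 digits
    private static final Pattern RECORD_START =
            Pattern.compile("[0-9]{1,2}\\s\\p{Alpha}+\\s[0-9]{4}");

    static List<String> filter(Iterable<String> lines, String searchKey) {
        Pattern key = Pattern.compile(searchKey);
        List<String> kept = new ArrayList<>();
        boolean printing = false; // plays the role of awk's n
        for (String line : lines) {
            if (RECORD_START.matcher(line).find()) {
                printing = false; // {n=0}
            }
            if (key.matcher(line).find()) {
                printing = true;  // {n=1}
            }
            if (printing) {
                kept.add(line);   // the bare pattern "n" prints the line
            }
        }
        return kept;
    }
}
```

For a real log file the lines would come from a streaming reader (for example a `BufferedReader`) rather than an in-memory list, so that the 300 MB input is never loaded at once.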
Why not try it yourself and see which is faster? – Cristina
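A minimal way to follow that suggestion is to wrap each variant in a `Runnable` and time it. The helper below is a rough sketch with hypothetical names (`ApproachTimer`, `bestMillis`); it is not a substitute for a proper benchmarking harness and only reports the fastest of a few wall-clock runs.

```java
// Hypothetical timing helper for comparing the two approaches.
class ApproachTimer {

    // Runs the task the given number of times and returns the fastest run in milliseconds.
    static long bestMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best / 1_000_000; // nanoseconds to milliseconds
    }
}
```

When comparing, keep in mind that the operating system's file cache usually makes the first pass over a large file slower than later ones, so each variant should be run more than once and in alternating order.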