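C#: scanning a text file for strings. How can I optimize this function?

I'm writing a program that scans a text file for blocks of strings (lines) and, when it finds one, writes the block out to a file. In my process class, the function proc() takes an unusually long time to process a 6 MB file. In a program I wrote earlier, I only scanned the text for one specific type of string, and it took 5 seconds to process the same file. Now I've rewritten it to scan for several different strings, and it takes 8 minutes, which is a huge difference. Does anyone have any ideas on how to optimize this function?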
Here is my regular expression:
System.Text.RegularExpressions.Regex RegExp
{
    get
    {
        return new System.Text.RegularExpressions.Regex(
            @"(?s)(?-m)MSH.+?(?=[\r\n]([^A-Z0-9]|.{1,2}[^A-Z0-9])|$)",
            System.Text.RegularExpressions.RegexOptions.Compiled);
    }
}
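Worth noting about the property above: every get builds a brand-new Regex, and with RegexOptions.Compiled each construction also compiles the pattern to IL, which is expensive. proc() reads types[i].RegExp twice for every line it appends, so that cost is paid over and over. A minimal sketch, assuming the pattern is fixed for each message type, creates the instance once in a static readonly field (MshMessageType is a hypothetical class name):

```csharp
using System.Text.RegularExpressions;

// Hypothetical message type; only the caching pattern matters here.
public class MshMessageType
{
    // Constructed and compiled once, when the class is first used.
    private static readonly Regex MessageRegex = new Regex(
        @"(?s)(?-m)MSH.+?(?=[\r\n]([^A-Z0-9]|.{1,2}[^A-Z0-9])|$)",
        RegexOptions.Compiled);

    // Every access returns the same cached instance instead of a new one.
    public Regex RegExp { get { return MessageRegex; } }
}
```

The pattern itself is unchanged; only the number of times it is built and compiled differs.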
public static class TypeFactory
{
    public static List<IMessageType> GetTypeList()
    {
        List<IMessageType> types = new List<IMessageType>();
        types.AddRange(from assembly in AppDomain.CurrentDomain.GetAssemblies()
                       from t in assembly.GetTypes()
                       where t.IsClass && t.GetInterfaces().Contains(typeof(IMessageType))
                       select Activator.CreateInstance(t) as IMessageType);
        return types;
    }
}
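One thing to watch in GetTypeList(): the where clause also accepts abstract classes that implement IMessageType, and Activator.CreateInstance throws on those (as it does for classes without a public parameterless constructor). A hedged sketch of a stricter filter follows. IMessageType here is a hypothetical stand-in for the poster's interface, and SafeTypeFactory, MessageTypeBase and SampleMshType are hypothetical names used only to demonstrate the filter:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for the poster's interface.
public interface IMessageType
{
    string BeginIndicator { get; }
}

// An abstract base like this would make the original query throw.
public abstract class MessageTypeBase : IMessageType
{
    public abstract string BeginIndicator { get; }
}

public class SampleMshType : MessageTypeBase
{
    public override string BeginIndicator { get { return "^MSH"; } }
}

public static class SafeTypeFactory
{
    // Instantiates only concrete classes that implement IMessageType
    // and have a public parameterless constructor.
    public static List<IMessageType> GetTypeList(IEnumerable<Assembly> assemblies)
    {
        return (from assembly in assemblies
                from t in assembly.GetTypes()
                where t.IsClass && !t.IsAbstract
                      && typeof(IMessageType).IsAssignableFrom(t)
                      && t.GetConstructor(Type.EmptyTypes) != null
                select (IMessageType)Activator.CreateInstance(t)).ToList();
    }
}
```

It would be called as SafeTypeFactory.GetTypeList(AppDomain.CurrentDomain.GetAssemblies()) to match the original behavior.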
public class process
{
    public void proc()
    {
        IOHandler.Read reader = new IOHandler.Read(new string[1] { @"C:\TEMP\DeIdentified\DId_RSLTXMIT.LOG" });
        List<IMessageType> types = MessageType.TypeFactory.GetTypeList();
        //TEST1
        IOHandler.Write.writeReport(System.DateTime.Now.ToString(), "TEST", "v3test.txt", true);
        foreach (string file in reader.FileList)
        {
            using (FileStream readStream = new FileStream(file, FileMode.Open, FileAccess.Read))
            {
                int charVal = 0;
                Int64 position = 0;
                StringBuilder fileFragment = new StringBuilder();
                string message = string.Empty;
                string current = string.Empty;
                string previous = string.Empty;
                int currentLength = 0;
                int previousLength = 0;
                bool found = false;
                do
                {
                    //string line = reader.ReturnLine(readStream, out charVal, ref position);
                    string line = reader.ReturnLine(readStream, out charVal);
                    for (int i = 0; i < types.Count; i++)
                    {
                        if (Regex.IsMatch(line, types[i].BeginIndicator)) //found first line of a message type
                        {
                            found = true;
                            message += line;
                            do
                            {
                                previousLength = types[i].RegExp.Match(message).Length;
                                //keep adding lines until match length stops growing
                                //message += reader.ReturnLine(readStream, out charVal, ref position);
                                message += reader.ReturnLine(readStream, out charVal);
                                currentLength = types[i].RegExp.Match(message).Length;
                                if (currentLength == previousLength)
                                {
                                    //stop - message complete
                                    IOHandler.Write.writeReport(message, "TEST", "v3test.txt", true);
                                    //reset
                                    message = string.Empty;
                                    currentLength = 0;
                                    previousLength = 0;
                                    break;
                                }
                            } while (charVal != -1);
                            break;
                        }
                    }
                } while (charVal != -1);
                //END OF FILE CONDITION
                if (charVal == -1)
                {
                }
            }
        }
        IOHandler.Write.writeReport(System.DateTime.Now.ToString(), "TEST", "v3test.txt", true);
    }
}
Edit: I ran the profiling wizard in VS2012 and found that most of the time is spent in the Regex.Match function.
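That profiler result fits the shape of the inner loop in proc(): for every line appended, Regex.Match runs twice over the whole accumulated message, so the total work grows roughly with the square of the message length. A possible alternative, sketched under several assumptions, decides where a block ends one line at a time instead of re-matching. IsSegmentLine mirrors the lookahead in the posted regex (a block ends before a line whose first three characters are not all A-Z or 0-9). The sketch assumes BeginIndicator is a literal prefix such as "MSH" rather than a regex, that lines can be read with TextReader.ReadLine, and that lines within a block may be joined with '\r'. Unlike the posted regex, it deliberately starts a new block when another begin line appears. MessageScanner, IsSegmentLine and ScanMessages are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class MessageScanner
{
    // True when the line starts with three characters in A-Z or 0-9,
    // i.e. it looks like a segment line (PID, OBX, ...).
    public static bool IsSegmentLine(string line)
    {
        if (line.Length < 3) return false;
        for (int i = 0; i < 3; i++)
        {
            char c = line[i];
            bool alnum = (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (!alnum) return false;
        }
        return true;
    }

    // Reads the input once and yields each block that begins with a line
    // starting with beginIndicator and continues while lines look like segments.
    public static IEnumerable<string> ScanMessages(TextReader reader, string beginIndicator)
    {
        StringBuilder current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            bool isBegin = line.StartsWith(beginIndicator, StringComparison.Ordinal);
            if (current != null && !isBegin && IsSegmentLine(line))
            {
                current.Append('\r').Append(line);   // still inside the current block
                continue;
            }
            if (current != null)
            {
                yield return current.ToString();     // the current block has ended
                current = null;
            }
            if (isBegin)
                current = new StringBuilder(line);   // start a new block
        }
        if (current != null)
            yield return current.ToString();         // block running to end of input
    }
}
```

Each yielded block could then be handed to IOHandler.Write.writeReport the same way proc() does. If the regex is still wanted as a final check, it can be run once per finished block instead of twice per line.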
Have you tried profiling your code to see where the bottleneck is? – 2013-03-07 20:50:43
If the profiler doesn't help, you could also implement a basic logger that records a timestamp at each point to find the bottleneck. – Greg 2013-03-07 20:52:47
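A minimal sketch of that kind of timing logger, using System.Diagnostics.Stopwatch to accumulate elapsed time per named section rather than printing raw timestamps (SectionTimer and its methods are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: sums elapsed time per named section of code.
public sealed class SectionTimer
{
    private readonly Dictionary<string, TimeSpan> totals = new Dictionary<string, TimeSpan>();

    // Runs the action and adds its elapsed time to the named section.
    public void Time(string section, Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        TimeSpan soFar;
        totals.TryGetValue(section, out soFar);   // soFar stays TimeSpan.Zero if absent
        totals[section] = soFar + sw.Elapsed;
    }

    // Total time recorded for a section, or zero if it was never timed.
    public TimeSpan Total(string section)
    {
        TimeSpan t;
        return totals.TryGetValue(section, out t) ? t : TimeSpan.Zero;
    }

    // Prints one "section: elapsed" line per section.
    public void Report()
    {
        foreach (KeyValuePair<string, TimeSpan> entry in totals)
            Console.WriteLine("{0}: {1}", entry.Key, entry.Value);
    }
}
```

For example, wrapping a call in proc() as timer.Time("Match", () => currentLength = types[i].RegExp.Match(message).Length); and calling timer.Report() at the end would show how much of the run goes to matching.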
When I saw 'StringBuilder' I was really hoping you'd use it to *build strings*, then I saw 'message += line;' :( – RobH 2013-03-07 20:57:42
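Along the lines of that comment, a small sketch of accumulating the message in a reused StringBuilder instead of message += line (which copies the whole accumulated string on every append) and message = string.Empty (replaced by Clear()). MessageAccumulator is a hypothetical name. One caveat: Regex.Match needs a string, so calling ToString() before every match would copy the text again; the StringBuilder helps most when combined with matching less often.

```csharp
using System.Text;

// Hypothetical accumulator standing in for the string 'message' in proc().
public sealed class MessageAccumulator
{
    private readonly StringBuilder buffer = new StringBuilder();

    // Replaces: message += line;
    public void Add(string line) { buffer.Append(line); }

    public int Length { get { return buffer.Length; } }

    // Returns the finished message and resets the buffer,
    // replacing: writeReport(message, ...); message = string.Empty;
    public string Flush()
    {
        string result = buffer.ToString();
        buffer.Clear();
        return result;
    }
}
```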