2009-09-10 138 views
8

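I am using string.Split() in my C# code to read a tab-separated file, and I am getting an OutOfMemoryException at the line marked in the code sample below.

What I want to know is why this problem occurs for a file that is only 16 MB in size.

Is this the right approach?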

using (StreamReader reader = new StreamReader(_path)) 
{ 
    //...........Load the first line of the file................ 
    string headerLine = reader.ReadLine(); 

    MeterDataIPValueList objMeterDataList = new MeterDataIPValueList(); 
    string[] seperator = new string[1]; // used to separate the lines of the file 

    seperator[0] = "\r\n"; 
    //.............Load Records of file into string array and remove all empty lines of file................. 
    string[] line = reader.ReadToEnd().Split(seperator, StringSplitOptions.RemoveEmptyEntries); 
    int noOfLines = line.Count(); 
    if (noOfLines == 0) 
    { 
    mFileValidationErrors.Append(ConstMsgStrings.headerOnly + Environment.NewLine); 
    } 
    //...............If file contains records also with header line.............. 
    else 
    { 
    string[] headers = headerLine.Split('\t'); 
    int noOfColumns = headers.Count(); 

    //.........Create table structure............. 
    objValidateRecordsTable.Columns.Add("SerialNo"); 
    objValidateRecordsTable.Columns.Add("SurveyDate"); 
    objValidateRecordsTable.Columns.Add("Interval"); 
    objValidateRecordsTable.Columns.Add("Status"); 
    objValidateRecordsTable.Columns.Add("Consumption"); 

    //........Fill objValidateRecordsTable table by string array contents ............ 

    int recordNumber; // used for log 
    #region ..............Fill objValidateRecordsTable..................... 
    seperator[0] = "\t"; 
    for (int lineNo = 0; lineNo < noOfLines; lineNo++) 
    { 
     recordNumber = lineNo + 1; 
     string[] recordFields = line[lineNo].Split(seperator, StringSplitOptions.RemoveEmptyEntries); // OutOfMemoryException is thrown here, when splitting columns 
     if (recordFields.Count() == noOfColumns) 
     { 
     //Do processing 
     } 
+4

Also, Eric Lippert has a great blog post on OutOfMemoryExceptions. http://blogs.msdn.com/ericlippert/archive/2009/06/08/out-of-memory-does-not-refer-to-physical-memory.aspx – 2009-09-10 10:17:03

+0

Is this on the Compact Framework (i.e. Windows Mobile)? – MusiGenesis 2009-09-10 12:06:42

Answers

12

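Split is poorly implemented and has serious performance problems when applied to huge strings. Please refer to this article for details on the memory requirements of the Split function:

What happens when you split a string containing 1355049 comma-separated strings of 16 characters each, with a total length of 25745930 characters?

  1. An array of pointers to string objects: contiguous virtual address space of 4 (address pointer) * 1355049 = 5420196 (array size) + 16 (for bookkeeping) = 5420212 bytes.

  2. Non-contiguous virtual address space for 1355049 strings, each of 54 bytes. This does not mean that all 1.3 million strings will be scattered across the whole heap, but they will not be allocated on the LOH. The GC will allocate them on the Gen0 heap.

  3. The Split function will create an internal System.Int32[] array of size 25745930, consuming 102983736 bytes (~98 MB) of the LOH, which is very expensive.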
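The arithmetic in the list above can be reproduced with a few lines of C#. This is only a sketch: the class and method names are made up for illustration, and the 4-byte reference size and 16-byte array overhead are the figures assumed in the quoted passage (a 32-bit process), not values queried from the runtime.

```csharp
public static class SplitMemoryEstimate
{
    // Assumption taken from the quoted passage: 16 bytes of bookkeeping per array.
    private const long ArrayOverheadBytes = 16;

    // Assumption taken from the quoted passage: 4-byte references and 4-byte Int32 elements.
    private const long ElementBytes = 4;

    // Size of the string[] result: one reference per substring (item 1).
    public static long ReferenceArrayBytes(long substringCount)
    {
        return ElementBytes * substringCount + ArrayOverheadBytes;
    }

    // Size of the internal Int32[] with one slot per input character (item 3).
    public static long SeparatorIndexArrayBytes(long totalChars)
    {
        return ElementBytes * totalChars + ArrayOverheadBytes;
    }
}
```

With the numbers from the quote, ReferenceArrayBytes(1355049) gives 5420212 and SeparatorIndexArrayBytes(25745930) gives 102983736, which is roughly 98 MB that has to be found as one contiguous block.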

1

Try reading the file line by line instead of splitting the whole content.

10

Try not to read the entire file into an array first with "reader.ReadToEnd()". Read the file line by line directly instead.

using (StreamReader sr = new StreamReader(this._path)) 
     { 
      string line = ""; 
      while((line= sr.ReadLine()) != null) 
      { 
       string[] cells = line.Split(new string[] { "\t" }, StringSplitOptions.None); 
       if (cells.Length > 0) 
       { 

       } 
      } 
     } 
+0

Does it make a difference when we read line by line? – 2009-09-10 11:26:50

+1

It does not work if all my data is on a single line. – Butzke 2015-11-25 15:15:10

4

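I would recommend reading line by line if you can, but sometimes splitting on new lines is not what is required.

So you can always write your own memory-efficient split. This solved my problem.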

private static IEnumerable<string> CustomSplit(string newtext, char splitChar) 
    { 
     var result = new List<string>(); 
     var sb = new StringBuilder(); 
     foreach (var c in newtext) 
     { 
      if (c == splitChar) 
      { 
       if (sb.Length > 0) 
       { 
        result.Add(sb.ToString()); 
        sb.Clear(); 
       } 
       continue; 
      } 
      sb.Append(c); 
     } 
     if (sb.Length > 0) 
     { 
      result.Add(sb.ToString()); 
     } 
     return result; 
    } 
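A variation on the same idea, in case even the input string is too large to hold comfortably: the sketch below reads characters from a TextReader and yields each field as soon as it is complete, so neither the whole input nor the whole result list has to be in memory at once. The names StreamingSplit and SplitLazily are hypothetical, and like CustomSplit above it skips empty fields.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamingSplit
{
    // Hypothetical helper: lazily yields the non-empty fields found in the reader.
    public static IEnumerable<string> SplitLazily(TextReader reader, char splitChar)
    {
        var sb = new StringBuilder();
        int next;

        // TextReader.Read() returns -1 once the end of the input is reached.
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (c == splitChar)
            {
                if (sb.Length > 0)
                {
                    yield return sb.ToString();
                    sb.Length = 0; // reset the buffer for the next field
                }
            }
            else
            {
                sb.Append(c);
            }
        }

        // Emit the trailing field if the input does not end with a separator.
        if (sb.Length > 0)
        {
            yield return sb.ToString();
        }
    }
}
```

It could be used as foreach (string field in StreamingSplit.SplitLazily(new StreamReader(path), '\t')) { ... }, where path stands for the file to read. This also covers the case where all the data sits on a single line, at the cost of per-character calls.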
2

I use my own. It has been tested with 10 unit tests.

public static class StringExtensions 
{ 

    // The string.Split() method from .NET tends to run out of memory on 80 MB strings. 
    // This has been reported in several places online. 
    // This version is fast and memory efficient and returns no empty entries. 
    public static List<string> LowMemSplit(this string s, string seperator) 
    { 
     List<string> list = new List<string>(); 
     int lastPos = 0; 
     int pos = s.IndexOf(seperator); 
     while (pos > -1) 
     { 
      // Skip empty segments (leading or adjacent separators) without returning early, 
      // so that text following a run of separators is not dropped. 
      if (pos > lastPos) 
      { 
       string tmp = s.Substring(lastPos, pos - lastPos); 
       if (tmp.Trim().Length > 0) 
        list.Add(tmp); 
      } 
      lastPos = pos + seperator.Length; 
      pos = s.IndexOf(seperator, lastPos); 
     } 

     if (lastPos < s.Length) 
     { 
      string tmp = s.Substring(lastPos, s.Length - lastPos); 
      if (tmp.Trim().Length > 0) 
       list.Add(tmp); 
     } 

     return list; 
    } 
}