2012-02-20 55 views
2

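I have the following code, and it loads very slowly the first time. The CSV file is about 4 MB with 16,000 rows. How can I improve the performance of building a DataTable in VB.NET?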

 If Session("tb") Is Nothing Then 
      Dim str As String() 
      If (IsNothing(Cache("csvdata"))) Then 
       str = File.ReadAllLines(Server.MapPath("~/test/feed.csv")) 
       Cache.Insert("csvdata", str, Nothing, DateTime.Now.AddHours(12), TimeSpan.Zero) 
      Else 
       str = CType(Cache("csvdata"), Array) 
      End If 
      Dim dt As New DataTable 
      dt.Columns.Add("Shape", GetType(System.String)) 
      dt.Columns.Add("Weight", GetType(System.Double)) 
      dt.Columns.Add("Color", GetType(System.String)) 
      dt.Columns.Add("Clarity", GetType(System.String)) 
      dt.Columns.Add("Price", GetType(System.Int32)) 
      dt.Columns.Add("CutGrade", GetType(System.String)) 

      For i As Integer = 1 To str.Length - 1 
       Dim pattern As String = ",(?=([^""]*""[^""]*"")*[^""]*$)" 
       Dim rgx As New Regex(pattern) 
       Dim t As String = rgx.Replace(str(i), "\") 
       Dim s As String() = t.Split("\"c) 
       Dim pr As Int32 = CType(s(5), Int32) 
       Dim fpr As Int32 
       Dim rate As Double 
       Select Case pr 
        Case Is < 300 
         rate = 2 
        Case 301 To 600 
         rate = 1.7 
        Case Is > 600 
         rate = 1.16 
       End Select 
       fpr = Math.Round(pr * rate) 
       Dim a As String() = {s(1), s(2), s(3), s(4), fpr, s(40)} 
       dt.Rows.Add(a) 
      Next 

      Session("tb") = dt 
      ListView1.DataSource = dt 
      ListView1.DataBind() 
     Else 
      Dim x As DataTable = CType(Session("tb"), DataTable) 
      ListView1.DataSource = x 
      ListView1.DataBind() 
     End If 

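The CSV file is cached, since I figure it can be shared by all users (one person pays the load cost once every 12 hours). Once the session has been created, the page loads quickly as well. So building the DataTable seems to be the slow part. This is my first time working with DataTables, and I'm sure someone can point out what I'm doing wrong.

Thanks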

UPDATE:

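I have changed the cache to hold the DataTable itself instead of the raw CSV lines. It now loads quickly, but I'm wondering whether this is a bad idea.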

Cache.Insert("csvdata", dt, Nothing, DateTime.Now.AddHours(12), TimeSpan.Zero) 

Once it is stored in the cache, I can run queries against it with LINQ.
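To make the LINQ remark concrete, here is a minimal sketch of querying a DataTable with the same column names. The rows are made-up stand-ins for whatever `Cache("csvdata")` would hold, and the module name is hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module CachedTableQueryDemo
    Sub Main()
        ' Stand-in for the table that would come out of Cache("csvdata").
        ' The rows below are invented for illustration only.
        Dim dt As New DataTable()
        dt.Columns.Add("Shape", GetType(String))
        dt.Columns.Add("Weight", GetType(Double))
        dt.Columns.Add("Price", GetType(Integer))
        dt.Rows.Add("Round", 1.74, 15834)
        dt.Rows.Add("Round", 1.0, 7028)
        dt.Rows.Add("Oval", 0.9, 2500)

        ' AsEnumerable and Field(Of T) are extension methods in the System.Data
        ' namespace (System.Data.DataSetExtensions assembly on .NET Framework).
        Dim rounds As IEnumerable(Of DataRow) =
            From row In dt.AsEnumerable()
            Where row.Field(Of String)("Shape") = "Round" AndAlso
                  row.Field(Of Double)("Weight") >= 1.0
            Order By row.Field(Of Integer)("Price")
            Select row

        For Each row As DataRow In rounds
            Console.WriteLine("{0} {1} {2}", row("Shape"), row("Weight"), row("Price"))
        Next
    End Sub
End Module
```

On the "bad idea" worry: DataTable is documented as safe for multithreaded read operations, so a cached table that is only read after it has been built is a reasonable pattern. Any code that modifies the shared table would need its own synchronization.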

Sample CSV, first 3 lines

Supplier ID,Shape,Weight,Color,Clarity,Price/Carat,Lot Number,Stock Number,Lab,Cert #,Certificate Image,2nd Image,Dimension,Depth %,Table %,Crown Angle,Crown %,Pavilion Angle,Pavilion %,Girdle Thinnest,Girdle Thickest,Girdle %,Culet Size,Culet Condition,Polish,Symmetry,Fluor Color,Fluor Intensity,Enhancements,Remarks,Availability,Is Active,FC-Main Body,FC- Intensity,FC- Overtone,Matched Pair,Separable,Matching Stock #,Pavilion,Syndication,Cut Grade,External Url 
9349,Round,1.74,F,VVS1,13650.00,,IM-95-188-243,ABC,11228,,,7.81|7.85|4.62,59.00,62.00,34.00,13.00,,,Medium,,0,None,,Excellent,Very Good,Blue,Medium,,"",Not Specified,Y,,,,False,True,,,,Very Good,http://www.test/teste. 
9949,Round,1.00,I,VVS1,6059.00,,IM-95-189-C021,ABC,212197,,,6.37|6.42|3.96,61.90,54.00,34.50,16.00,,,Thin,Slightly Thick,0,None,,Excellent,Good,,None,,"Additional pinpoints are not shown.",Guaranteed Available,Y,,,,False,True,,,,Very Good,http://www.test/test. 
+2

"Slow"? – 2012-02-20 22:37:46

+0

The first load takes about 7-8 seconds. My local test server has 10 GB of RAM and a quad Xeon at 1.86 GHz – shinya 2012-02-20 23:28:55

Answers

0

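Consider using a TextFieldParser to read the CSV instead of splitting the strings yourself. Also, if you use a List(Of CustomClass), where CustomClass has properties such as Shape, Weight, Color and so on, you avoid the unnecessary overhead of a DataTable and can still run LINQ queries against the list.

Please forgive my C#; I don't have VB.NET installed on this box.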

    using System;
    using System.Collections.Generic;
    using Microsoft.VisualBasic.FileIO; // requires a reference to Microsoft.VisualBasic

    public class Gemstone
    {
        public string Shape { get; set; }
        public double Weight { get; set; }
        public string Color { get; set; }
    }

    public static class Program
    {
        static void Main(string[] args)
        {
            // Allocate the list with your best calculated guess of its final size.
            List<Gemstone> list = new List<Gemstone>(16000);

            // The using block closes the file once parsing is done.
            using (TextFieldParser textFieldParser = new TextFieldParser("data.txt"))
            {
                textFieldParser.Delimiters = new string[] { "," };
                textFieldParser.ReadLine(); // skip header line
                while (!textFieldParser.EndOfData)
                {
                    string[] fields = textFieldParser.ReadFields();
                    Gemstone gemstone = new Gemstone();
                    gemstone.Shape = fields[1];
                    gemstone.Weight = Double.Parse(fields[2]);
                    gemstone.Color = fields[3];
                    list.Add(gemstone);
                }
            }
        }
    }
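Since the question is about VB.NET, here is a rough VB.NET rendering of the same idea. Treat it as a sketch rather than a drop-in: the `LoadGemstones` helper, the temp file, and the trimmed-down sample rows are assumptions made for illustration, not part of the original feed.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports Microsoft.VisualBasic.FileIO

Public Class Gemstone
    Public Property Shape As String
    Public Property Weight As Double
    Public Property Color As String
End Class

Module GemstoneLoader
    ' Hypothetical helper: reads the feed into a typed list instead of a DataTable.
    Function LoadGemstones(csvPath As String) As List(Of Gemstone)
        ' Pre-size the list with a rough guess of the row count.
        Dim list As New List(Of Gemstone)(16000)
        Using parser As New TextFieldParser(csvPath)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True ' keeps commas inside "..." intact
            If Not parser.EndOfData Then parser.ReadLine() ' skip the header row
            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                Dim g As New Gemstone()
                g.Shape = fields(1)
                g.Weight = Double.Parse(fields(2), CultureInfo.InvariantCulture)
                g.Color = fields(3)
                list.Add(g)
            End While
        End Using
        Return list
    End Function

    Sub Main()
        ' Made-up file holding only the first four columns of the sample rows.
        Dim csvPath As String = System.IO.Path.GetTempFileName()
        System.IO.File.WriteAllLines(csvPath, New String() {
            "Supplier ID,Shape,Weight,Color",
            "9349,Round,1.74,F",
            "9949,Round,1.00,I"})

        Dim stones As List(Of Gemstone) = LoadGemstones(csvPath)
        Console.WriteLine(stones.Count)
        System.IO.File.Delete(csvPath)
    End Sub
End Module
```

Calling `ReadLine()` once before the loop is what drops the field-name row. The resulting `List(Of Gemstone)` can be cached and queried with LINQ in the same way as a DataTable.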
+0

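I'm trying TextFieldParser now... How do I get rid of the field names? I can't seem to keep them out of the result. I tried Dim i = 0 While Not MyReader.EndOfData If i = 1 Then "process the line" Else i = 1 End If End While, but it still parses the first line... – shinya 2012-02-21 19:19:26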

+0

@shinya Can you post a line of the sample CSV data you are trying to use? – 2012-02-22 01:40:43

+0

I posted sample CSV data in the question – shinya 2012-02-23 19:25:35

0

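FYI, I just discovered this whole TextFieldParser thing. I do a lot of parsing of text files, so I tested it...

The test file is 11 MB, with about 5,200 rows and 300 columns.

TextFieldParser ran at about 25% of the speed of a plain Split when filling a DataTable, and at about 15% of the speed once I removed the DataTable code: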

 Dim DataTable As New DataTable() 
    Dim StartTime As Long = Now.Ticks 
    Dim Reader As New FileIO.TextFieldParser("file.txt") 
    Reader.TextFieldType = FileIO.FieldType.Delimited 
    Reader.SetDelimiters(vbTab) 
    Reader.HasFieldsEnclosedInQuotes = False 
    Dim Header As Boolean = True 
    While Not Reader.EndOfData 
     Dim Fields() As String = Reader.ReadFields 
     If Header Then 
      For I As Integer = 1 To 320 
       DataTable.Columns.Add("Col" & I) 
      Next 
      Header = False 
     Else 
      If Mid(Fields(0), 1, 1) <> "#" Then DataTable.Rows.Add(Fields) 
     End If 
    End While 
    Debug.Print((Now.Ticks - StartTime)/10000 & "ms") 

    Dim DataTable2 As New DataTable() 
    StartTime = Now.Ticks 
    For I As Integer = 1 To 320 
     DataTable2.Columns.Add("Col" & I) 
    Next 
    For Each line As String In System.IO.File.ReadAllLines("file.txt") 
     Dim NVP() As String = Split(line, vbTab) 
     If Mid(line, 1, 1) <> "#" Then DataTable2.Rows.Add(NVP) 
    Next 
    Debug.Print((Now.Ticks - StartTime)/10000 & "ms") 

The second figure was measured with the DataTable code removed.

That kind of surprises me, but I guess it has more features. Just another new thing I found that I'll never use :(