2013-02-25 74 views
0

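I'm currently working on a small C# exercise that deals with searching a text file for a given term/word: the program should write out every sentence in the text file that contains the searched word. For example, if I enter the word "example", the program goes through all the sentences in the text file and extracts those that contain the word "example".

Tags: string, linear-search, c#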

The text file is structured as follows: <sentenceDesignator>: <text>
sentence 1: bla bla bla bla example of a sentence //each line contains a sentence 
sentence 2: this is not a good example of grammar 
sentence 3: bla is not a real word, use better terms 

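What I want to do is use a linear search to go through all the lines in the text file and write out every sentence that contains the search term.

My code so far: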

 String filename = @"sentences.txt"; 

     if (!File.Exists(filename)) 
     { 
      // Since we just created the file, this shouldn't happen. 
      Console.WriteLine("{0} not found", filename); 
      return; 
     } 
     else 
     { 
      Console.WriteLine("Successfully found {0}.", filename); 
     } 
     //making a listof type "Sentence" to hold all the sentences 
     List<Sentence> sentences = new List<Sentence>(); 

     //the next lines of code... 
     StreamReader reader = File.OpenText(filename); 

     //first, write out all of the sentences in the text file 

     //read a line(sentence) from a line in the text file 
     string line = reader.ReadLine(); 

     while (line != null) 
     { 
      Sentence s = new Sentence(); 

      //we need something to split data... 
      string[] lineArray = line.Split(':'); 

      s.sentenceDesignator = lineArray[0]; 
      s.Text = lineArray[1]; 

      Console.Write("\n{0}", line); 

      line = reader.ReadLine(); 
     } 

     //so far, we can write out all of the sentences in the text file. 
     Console.Write("\n\nOK!, search a term to diplay all their occurences: "); 
     string searchTerm = Console.ReadLine(); 

     if(!line.Contains(searchterm)) 
     { 
      Console.Write("\nThat term does not exist in any sentence."); 
     } 
     else 
     { 
      foreach (Sentence ss in sentences) 
      { 
       if (ss.sentenceDesignator.Contains(queryName)) 
       { 
        //I need help here 
       } 
      } 
     } 
+3

So what seems to be the problem? – AgentFire 2013-02-25 06:34:49

Answers

1

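If you build an index of the file and then search the index, searching will be a lot faster. A linear search costs O(n) for every search operation, whereas with an index you pay O(n) once to build it and then O(log n) or near-O(1) per lookup (depending on how you build the index). The cost is the extra memory consumed by the index, but I'd do it like this: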

private Dictionary<String,List<Int32>> _index = new Dictionary<String,List<Int32>>(); 

/// <summary>Populates an index of words in a text file. Takes O(n) where n is the size of the input text file.</summary> 
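public void BuildIndex(String fileName) { 

    using(Stream inputTextFile = OpenFile(...)) { 

     // Positions are word indexes (0, 1, 2, ...) rather than stream offsets, 
     // so consecutive words get consecutive positions, as SearchIndex expects. 
     int currentPosition = 0; 
     foreach(String word in GetWords(inputTextFile)) { 

      // A foreach iteration variable cannot be reassigned, so normalize into a local. 
      String key = word.ToUpperInvariant(); 
      if(!_index.ContainsKey(key)) _index.Add(key, new List<Int32>()); 
      _index[key].Add(currentPosition); 

      currentPosition++; 
     } 
    } 
} 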

/// <summary>Searches the text file (via its index) if the specified string (in its entirety) exists in the document. If so, it returns the position in the document where the string starts. Otherwise it returns -1. Lookup time is O(1) on the size of the input text file, and O(n) for the length of the query string.</summary> 
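public Int32 SearchIndex(String query) { 

    String[] terms = query.Split(' '); 

    Int32 startingPosition = -1; 
    Int32 currentPosition = -1; 
    Boolean first = true; 
    foreach(String rawTerm in terms) { 
     // A foreach iteration variable cannot be reassigned, so normalize into a local. 
     String term = rawTerm.ToUpperInvariant(); 

     if(first) { 
      if(!_index.ContainsKey(term)) return -1; 
      // Only the first occurrence of the first term is tried as a starting point. 
      startingPosition = _index[term][0]; 
      currentPosition = startingPosition; 
     } else { 

      if(!ContainsTerm(term, ++currentPosition)) return -1; 
     } 

     first = false; 
    } 

    return startingPosition; 
} 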

/// <summary>Indicates if the specified term exists at the specified position.</summary> 
private Boolean ContainsTerm(String term, Int32 expectedPosition) { 

    if(!_index.ContainsKey(term)) return false; 
    List<Int32> positions = _index[term]; 
    foreach(Int32 pos in positions) { 

     if(pos == expectedPosition) return true; 
    } 
    return false; 
} 

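Implementing OpenFile and GetWords should be trivial. Note that GetWords uses yield return to build an IEnumerable<String> of the whitespace-separated words in the file, and handles your custom file format.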
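
For illustration, here is one way a GetWords helper along those lines might look. This is only a sketch: the class name WordReader, the choice to skip everything up to the first ':' on each line (treated as the sentence designator), and the split characters are assumptions, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class WordReader
{
    // Hypothetical helper: lazily yields the whitespace-separated words of a stream,
    // skipping the assumed "sentence N:" designator at the start of each line.
    public static IEnumerable<string> GetWords(Stream input)
    {
        // Disposing the reader also closes the underlying stream.
        using (var reader = new StreamReader(input))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Drop everything up to and including the first ':' if there is one.
                int colon = line.IndexOf(':');
                string text = colon >= 0 ? line.Substring(colon + 1) : line;

                foreach (string word in text.Split(
                    new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
                {
                    yield return word;
                }
            }
        }
    }
}
```

Because the method is an iterator, nothing is read until the caller starts enumerating, so a foreach over WordReader.GetWords(stream) pulls words one line at a time instead of loading the whole file.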

+0

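I don't want to know the position of the string. I want to find all instances of the searched term and then write out all the sentences that contain it. – 2013-02-25 07:20:56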

+0

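Modifying the algorithm to search for all instances is left as an exercise for the reader :) I deliberately did not provide an exact solution in my answer. – Dai 2013-02-25 07:23:38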
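
One possible direction for that exercise, sketched under assumptions: instead of word positions, the index maps each word to the numbers of the sentences it occurs in, so a lookup returns every matching sentence. The class name SentenceIndex, its Add and Find methods, and the separator characters are hypothetical. Note that this matches whole words only, which is not the same as a substring search with Contains.

```csharp
using System;
using System.Collections.Generic;

public class SentenceIndex
{
    private readonly List<string> _sentences = new List<string>();

    // Case-insensitive map from a word to the numbers of the sentences containing it.
    private readonly Dictionary<string, List<int>> _index =
        new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);

    // Stores the sentence and indexes each of its words. O(length of the sentence).
    public void Add(string sentence)
    {
        int sentenceNumber = _sentences.Count;
        _sentences.Add(sentence);

        foreach (string word in sentence.Split(
            new[] { ' ', '\t', ',', '.', ':' }, StringSplitOptions.RemoveEmptyEntries))
        {
            List<int> hits;
            if (!_index.TryGetValue(word, out hits))
            {
                hits = new List<int>();
                _index.Add(word, hits);
            }

            // Record each sentence once per word, even if the word repeats in it.
            if (hits.Count == 0 || hits[hits.Count - 1] != sentenceNumber)
                hits.Add(sentenceNumber);
        }
    }

    // Returns every stored sentence that contains the word (empty list if none).
    public List<string> Find(string word)
    {
        var result = new List<string>();
        List<int> hits;
        if (_index.TryGetValue(word, out hits))
        {
            foreach (int n in hits) result.Add(_sentences[n]);
        }
        return result;
    }
}
```

Building the index is still a single pass over the file; after that, each lookup costs a dictionary access plus the number of matching sentences.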

0

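I'm a bit confused by the last if/else. It looks like you are only comparing the last line of the file against searchterm. Also, where does "queryName" come from? Do you want to print the whole sentence ("bla bla bla bla example of a sentence") or just "sentence 1"? In addition, you check whether the sentenceDesignator contains queryName; I think you want to check whether the actual text contains searchterm.

Maybe this will help you: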

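var lines = File.ReadAllLines(fileName);  
var sentences = new List<Sentence>(lines.Length); 

foreach (var line in lines) 
{ 
    // Split on the first ':' only, so colons inside the sentence text are preserved. 
    var lineArray = line.Split(new[] { ':' }, 2); 
    if (lineArray.Length < 2) continue; // skip lines without a designator 
    sentences.Add(new Sentence { sentenceDesignator = lineArray[0], Text = lineArray[1]}); 
} 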

foreach (var sentence in sentences) 
{ 
    if (sentence.Text.Contains(searchTerm)) 
    { 
     Console.WriteLine(sentence.sentenceDesignator); 
     //Console.WriteLine(sentence.Text); 
    } 
}
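
As a variation on the loop above, the linear search can also be written as a small function that returns the matching lines, which makes the "no match" case easy to report. This is a sketch only: the names LinearSearch and FindSentences are made up for illustration, and matching case-insensitively is an assumption about what the asker wants.

```csharp
using System;
using System.Collections.Generic;

public static class LinearSearch
{
    // Returns every line whose text (the part after the first ':') contains the term.
    // Visits each line once, so the cost is O(n) in the number of lines.
    public static List<string> FindSentences(IEnumerable<string> lines, string searchTerm)
    {
        var matches = new List<string>();
        foreach (string line in lines)
        {
            // Only the first ':' separates designator and text;
            // lines without a ':' are searched whole.
            int colon = line.IndexOf(':');
            string text = colon >= 0 ? line.Substring(colon + 1) : line;

            // Case-insensitive substring match (an assumption, see above).
            if (text.IndexOf(searchTerm, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                matches.Add(line);
            }
        }
        return matches;
    }
}
```

It could be called as LinearSearch.FindSentences(File.ReadAllLines(fileName), searchTerm); if the returned list is empty, print "That term does not exist in any sentence.", otherwise write out each returned line.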