Extracting text from a text file by parsing it with C#

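Within this data I have telephone numbers that I want to extract and put into a new text file.

The numbers in the file are all I care about.

I was wondering whether there is a way to do this in C# or VB?

I know IBM has a software package called Omnifind for data analysis, but I want to write an application that does only the task described above.

P.S. An example of the data: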

John Smith London 123456 
Hayley Smith Manchester 234567 
Mike Smith Birmingham 345678

So I want to create a new file containing just:

123456 
234567 
345678


2011-04-08 Ebikeneser

Could you provide a portion of the unstructured data file? – Gabriel 2011-04-08 10:21:58

I have now edited the question to include sample data. – Ebikeneser 2011-04-08 10:30:40

Try this:

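using System.Collections.Generic; 
using System.IO; 
using System.Linq; 
using System.Text.RegularExpressions; 

public List<string> NaiveExtractor(string path) 
{ 
    // Strip every non-digit character from each line and keep the non-empty results. 
    return File.ReadAllLines(path) 
        .Select(l => Regex.Replace(l, @"[^\d]", "")) 
        .Where(s => s.Length > 0) 
        .ToList(); 
}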

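As the name suggests, it is naive: it will also pick up any digits that appear in names, and if a line contains two phone numbers they get run together into one string.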
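One way to soften both limitations is to match runs of digits instead of deleting everything else. The following is only a sketch: the class name `DigitRunExtractor` and the `minDigits` threshold are made up for illustration, and it assumes a phone number is any run of at least six consecutive digits, which fits the sample data but may not fit real numbers containing spaces or dashes.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitRunExtractor
{
    // Assumption: a "phone number" is any run of at least minDigits consecutive digits.
    public static List<string> Extract(IEnumerable<string> lines, int minDigits = 6)
    {
        // Builds a pattern such as \d{6,}
        var pattern = new Regex(@"\d{" + minDigits + @",}");

        return lines
            // Each match is returned separately, so two numbers on one line stay apart.
            .SelectMany(line => pattern.Matches(line).Cast<Match>())
            .Select(m => m.Value)
            .ToList();
    }
}
```

It could be fed with `File.ReadLines("infile.txt")` and the result passed to `File.WriteAllLines("outfile.txt", ...)`. Short digit runs (a house number inside a name field, say) are skipped, at the cost of also skipping any genuine number shorter than the threshold.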


2011-04-08 10:54:55 Benjol

No luck, there is no built-in method for this. I would suggest something like:

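// Requires: using System; using System.Collections.Generic; using System.IO; 
List<string> result = new List<string>(); 
using (StreamReader content = File.OpenText("text")) 
{ 
    while (!content.EndOfStream) 
    { 
        string line = content.ReadLine(); 
        // Drop empty entries so blank lines and trailing spaces do not yield an empty "number". 
        var substrings = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries); 
        if (substrings.Length > 0) 
        { 
            result.Add(substrings[substrings.Length - 1]); 
        } 
    } 
}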


2011-04-08 10:47:04 Unknown

Thanks for your input. – Ebikeneser 2011-04-08 11:29:21

Well, you could use something like regular expressions, or in this case you could probably just get away with some basic string manipulation:

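// Requires: using System.IO; 
using (StreamReader reader = new StreamReader("infile.txt")) 
using (StreamWriter writer = new StreamWriter("outfile.txt")) 
{ 
    string line; 
    while ((line = reader.ReadLine()) != null) 
    { 
        // Trim first so a trailing space is not mistaken for the last separator. 
        line = line.TrimEnd(); 
        int index = line.LastIndexOf(' '); 
        if (index > 0 && index + 1 < line.Length) 
        { 
            // Everything after the last space is taken to be the number. 
            writer.WriteLine(line.Substring(index + 1)); 
        } 
    } 
}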
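For comparison, the regular-expression route mentioned above might look roughly like this. It is a sketch rather than a drop-in replacement: `TrailingNumberExtractor` and its method names are hypothetical, and it assumes, as in the sample data, that the number is the last thing on each line.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class TrailingNumberExtractor
{
    // A run of digits at the very end of the line, allowing trailing whitespace.
    private static readonly Regex TrailingDigits = new Regex(@"(\d+)\s*$");

    public static IEnumerable<string> Extract(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            Match m = TrailingDigits.Match(line);
            if (m.Success)
            {
                // Group 1 holds the digits without the trailing whitespace.
                yield return m.Groups[1].Value;
            }
        }
    }

    // Reads the input lazily and writes one number per line to the output file.
    public static void ExtractToFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, Extract(File.ReadLines(inputPath)));
    }
}
```

Calling `TrailingNumberExtractor.ExtractToFile("infile.txt", "outfile.txt")` would do the same job as the loop above, except that lines whose last token is not numeric are skipped instead of being copied to the output.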


2011-04-08 10:51:18 Justin
