How do I iterate over a string word by word in C#?

15

foreach (string word in "incidentno and fintype or unitno".Split(' ')) { 
    ... 
}

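Source

2009-09-18 07:04:41 Guffa

+0

I'm not sure about this, but I think it would perform the split on every iteration. You'd be better off splitting the string into a local array first and then using the "in" operator on that. – synhershko 2009-09-18 07:27:07

+3

@synhershko: No, it will only split once. – Guffa 2009-09-18 07:34:54

+0

The only problem is punctuation: 'foreach (string word in "Now, the end is near".Split(' '))' – 2009-09-18 08:30:08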

3

Assuming the words are always separated by spaces, you can use String.Split() to get an array of your words.

Source

2009-09-18 07:05:53 bbohac

3

Use the String class:

string[] words = "incidentno and fintype or unitno".Split(' ');

The Split method splits on spaces here, so "words" will contain [incidentno, and, fintype, or, unitno].

Source

2009-09-18 07:06:49

12

var regex = new Regex(@"\b[\s,\.-:;]*"); 
var phrase = "incidentno and fintype or unitno"; 
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));

This works even if you have ".,; tabs and new lines" between your words.

Source

2009-09-18 07:09:30

+1

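If you use the overload of Split that accepts StringSplitOptions.RemoveEmptyEntries, you don't need the ".Where". – 2009-09-18 08:01:22

+1

There is no such method. I'm using Regex.Split, not String.Split. – 2009-09-18 10:23:37

+0

In my opinion this is the best answer, but there is one error. Among the punctuation characters you need to escape the hyphen, otherwise it defines a range. So the first line should be 'var regex = new Regex(@"\b[\s,\.\-:;]*");' – Anduril 2017-04-27 09:26:39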
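
To illustrate the escaping point: inside a character class, an unescaped `.-:` is read as the range from '.' to ':', which also covers '/' and the digits 0-9, so numbers would be treated as separators. The following is only a minimal sketch with the hyphen escaped; the class name `RegexSplitDemo` and the method `Words` are made up for this example and are not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // The hyphen is escaped (\-) so it is a literal character, not a range operator.
    private static readonly Regex Separators = new Regex(@"\b[\s,\.\-:;]*");

    // Splits on word boundaries plus any run of the listed separators,
    // then drops the empty strings that Regex.Split produces for empty matches.
    public static string[] Words(string phrase)
    {
        return Separators.Split(phrase)
                         .Where(x => !string.IsNullOrEmpty(x))
                         .ToArray();
    }
}
```

With this pattern, a phrase such as "unit 42, type-7" should keep its numbers as words instead of swallowing them.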

11

A bit twisted, I know, but you could define an iterator block as an extension method on string. For example:

/// <summary>
/// Sweep over text, lazily yielding each space-separated word.
/// Must be declared inside a static class to work as an extension method.
/// </summary>
/// <param name="Text">The text to split.</param>
/// <returns>The words, produced one at a time.</returns>
public static IEnumerable<string> WordList(this string Text)
{
    int cIndex = 0; // start of the current word
    int nIndex;     // position of the next space
    while ((nIndex = Text.IndexOf(' ', cIndex)) != -1)
    {
        yield return Text.Substring(cIndex, nIndex - cIndex);
        cIndex = nIndex + 1;
    }
    // Last word (or the whole string when it contains no spaces).
    yield return Text.Substring(cIndex);
}

foreach (string word in "incidentno and fintype or unitno".WordList())
    System.Console.WriteLine("'" + word + "'");

The advantage is that no large array is created for long strings.
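
A rough way to see the lazy behaviour: an iterator block only does work when the caller asks for the next item, so taking just the first word should not scan the rest of the string. The sketch below assumes nothing beyond the standard library; `LazyWordsDemo`, `Words`, `FirstWord` and the `Yielded` counter are hypothetical names added only to make the effect observable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyWordsDemo
{
    // Counts how many words the iterator has actually produced so far.
    public static int Yielded;

    // Same idea as the iterator above: yield words one at a time.
    public static IEnumerable<string> Words(string text)
    {
        int start = 0;
        int next;
        while ((next = text.IndexOf(' ', start)) != -1)
        {
            Yielded++;
            yield return text.Substring(start, next - start);
            start = next + 1;
        }
        Yielded++;
        yield return text.Substring(start);
    }

    // First() pulls a single item and then disposes the iterator.
    public static string FirstWord(string text)
    {
        return Words(text).First();
    }
}
```

Calling `FirstWord` on a long sentence leaves `Yielded` at 1, whereas `Split` would have built the full array before the loop even started.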

Source

2009-09-18 07:20:25 JDunkerley

+2

I like this option, it's very useful for large amounts of data. You really deserve the +1! – jdehaan 2009-09-18 07:22:49

+0

Yes, +1 from me too! – Wayne 2009-09-18 08:17:51

1

What about checking for empty entries when using Split?

string sentence = "incidentno and fintype or unitno";
string[] words = sentence.Split(new char[] { ' ', ',', ';', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
    // Process
}

Edit:

I can't comment, so I'm posting here, but this (posted above) works:

foreach (string word in "incidentno and fintype or unitno".Split(' ')) 
{ 
    ... 
}

My understanding of foreach is that it first calls GetEnumerator() and then calls .MoveNext() until it returns false. So .Split is not re-evaluated on each iteration.
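
One way to check this for yourself is to wrap the split in a method that counts its own calls. This is only an illustrative sketch; `SplitOnceDemo`, `CountingSplit`, `CountWords` and `Calls` are hypothetical names, not anything from the framework.

```csharp
using System;

public static class SplitOnceDemo
{
    // Number of times the split has actually been performed.
    public static int Calls;

    // Thin wrapper around Split that records each invocation.
    public static string[] CountingSplit(string text)
    {
        Calls++;
        return text.Split(' ');
    }

    // The collection expression of foreach is evaluated once, before the loop body runs.
    public static int CountWords(string text)
    {
        int count = 0;
        foreach (string word in CountingSplit(text))
        {
            count++;
        }
        return count;
    }
}
```

After `CountWords` runs over a five-word sentence, `Calls` is 1, not 5.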

Source

2009-09-18 07:49:34 ParmesanCodice

2

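There are several ways to do this. The two most convenient (in my opinion) are:

Use string.Split() to create an array. I would probably use this approach, because it is the most obvious.

For example: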

string startingSentence = "incidentno and fintype or unitno"; 
string[] seperatedWords = startingSentence.Split(' ');

Alternatively, you can use the following (this is what I would use):

string[] seperatedWords = startingSentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);

StringSplitOptions.RemoveEmptyEntries removes any empty entries from your array, which can appear because of extra whitespace and other minor issues.
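
As a small sketch of the difference, compare the two overloads on input with a doubled space. The class and method names (`SplitOptionsDemo`, `Plain`, `NoEmpties`) are invented for this illustration.

```csharp
using System;

public static class SplitOptionsDemo
{
    // Plain split: two consecutive spaces produce an empty entry between them.
    public static string[] Plain(string s)
    {
        return s.Split(' ');
    }

    // Same split, but empty entries are discarded.
    public static string[] NoEmpties(string s)
    {
        return s.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For "incidentno  and" (two spaces), `Plain` returns three entries, one of them empty, while `NoEmpties` returns just the two words.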

Next, to process the words, you can use:

foreach (string word in seperatedWords) 
{ 
//Do something 
}

Alternatively, you can use a regular expression to solve this, as Darin demonstrated (copied below).

For example:

var regex = new Regex(@"\b[\s,\.-:;]*"); 
var phrase = "incidentno and fintype or unitno"; 
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));

To process the words, you can use code similar to the first option.

foreach (string word in words) 
{ 
//Do something 
}

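Of course, there are many ways to solve this problem, but I think these two are the simplest to implement and maintain. I would go with the first option (using string.Split()), because regular expressions can sometimes get quite confusing, whereas a split works correctly in most cases.

Source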

2009-09-18 08:16:59

-1

I wrote a string processor class that you can use.

Example:

metaKeywords = bodyText.Process(prepositions).OrderByDescending().TakeTop().GetWords().AsString();

Class:

public static class StringProcessor 
{ 
    private static List<String> PrepositionList; 

    public static string ToNormalString(this string strText) 
    { 
     if (String.IsNullOrEmpty(strText)) return String.Empty; 
     char chNormalKaf = (char)1603; 
     char chNormalYah = (char)1610; 
     char chNonNormalKaf = (char)1705; 
     char chNonNormalYah = (char)1740; 
     string result = strText.Replace(chNonNormalKaf, chNormalKaf); 
     result = result.Replace(chNonNormalYah, chNormalYah); 
     return result; 
    } 

    public static List<KeyValuePair<String, Int32>> Process(this String bodyText, 
     List<String> blackListWords = null, 
     int minimumWordLength = 3, 
     char splitor = ' ', 
     bool perWordIsLowerCase = true) 
    { 
     string[] btArray = bodyText.ToNormalString().Split(splitor); 
     long numberOfWords = btArray.LongLength; 
     Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>(1); 
     foreach (string word in btArray) 
     { 
      if (word != null) 
      { 
       string lowerWord = word; 
       if (perWordIsLowerCase) 
        lowerWord = word.ToLower(); 
       var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "") 
        .Replace("?", "").Replace("!", "").Replace(",", "") 
        .Replace("<br>", "").Replace(":", "").Replace(";", "") 
        .Replace("،", "").Replace("-", "").Replace("\n", "").Trim(); 
       if ((normalWord.Length > minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords))) 
       { 
        if (wordsDic.ContainsKey(normalWord)) 
        { 
         var cnt = wordsDic[normalWord]; 
         wordsDic[normalWord] = ++cnt; 
        } 
        else 
        { 
         wordsDic.Add(normalWord, 1); 
        } 
       } 
      } 
     } 
     List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList(); 
     return keywords; 
    } 

    public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true) 
    { 
     List<KeyValuePair<String, Int32>> result = null; 
     if (isBasedOnFrequency) 
      result = list.OrderByDescending(q => q.Value).ToList(); 
     else 
      result = list.OrderByDescending(q => q.Key).ToList(); 
     return result; 
    } 

    public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10) 
    { 
     List<KeyValuePair<String, Int32>> result = list.Take(n).ToList(); 
     return result; 
    } 

    public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list) 
    { 
     List<String> result = new List<String>(); 
     foreach (var item in list) 
     { 
      result.Add(item.Key); 
     } 
     return result; 
    } 

    public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list) 
    { 
     List<Int32> result = new List<Int32>(); 
     foreach (var item in list) 
     { 
      result.Add(item.Value); 
     } 
     return result; 
    } 

    public static String AsString<T>(this List<T> list, string seprator = ", ") 
    { 
     String result = string.Empty; 
     foreach (var item in list) 
     { 
      result += string.Format("{0}{1}", item, seprator); 
     } 
     return result; 
    } 

    private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords) 
    { 
     bool result = false; 
     if (blackListWords == null) return false; 
     foreach (var w in blackListWords) 
     { 
      if (w.ToNormalString().Equals(word)) 
      { 
       result = true; 
       break; 
      } 
     } 
     return result; 
    } 
}

Source

2013-03-12 12:02:20 Jahan

0

public static string[] MyTest(string inword, string regstr)
{
    var regex = new Regex(regstr);
    // Split the input that was passed in (not a hard-coded phrase) and drop
    // the empty entries produced by consecutive delimiters. Requires System.Linq.
    return regex.Split(inword).Where(w => w.Length > 0).ToArray();
}

? MyTest("incidentno and .fintype-;or :unitno", @"[^\w+]")

[0]: "incidentno" 
[1]: "and" 
[2]: "fintype" 
[3]: "or" 
[4]: "unitno"

Source

2013-11-15 08:20:02

+0

Um... is this an answer? – kleopatra 2013-11-15 08:40:49

0

I'd like to add some information to JDunkerley's answer.
You can easily make this method more flexible by passing in the string or char to search for.

// Both overloads must live in a static class; the string overload needs "using System;".
public static IEnumerable<string> WordList(this string Text, string Word)
{
    int cIndex = 0; // start of the current word
    int nIndex;     // position of the next separator
    // Word must be non-empty, otherwise IndexOf never advances.
    while ((nIndex = Text.IndexOf(Word, cIndex, StringComparison.Ordinal)) != -1)
    {
        yield return Text.Substring(cIndex, nIndex - cIndex);
        cIndex = nIndex + Word.Length; // skip the whole separator
    }
    yield return Text.Substring(cIndex);
}

public static IEnumerable<string> WordList(this string Text, char c)
{
    int cIndex = 0; // start of the current word
    int nIndex;     // position of the next separator
    while ((nIndex = Text.IndexOf(c, cIndex)) != -1)
    {
        yield return Text.Substring(cIndex, nIndex - cIndex);
        cIndex = nIndex + 1;
    }
    yield return Text.Substring(cIndex);
}

Source

2013-12-10 13:04:14
