How to find a whole word in a String in Java

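I have a String that I have to parse for different keywords. For example, I have the String:

"I will come and meet you at the 123woods"

And my keywords are

'123woods' 'woods'

I should report whenever I have a match, and where. Multiple occurrences should also be accounted for. However, for this one I should get a match only on 123woods, not on woods. This rules out String.contains(). Also, I should be able to have a list/set of keywords and check for all of them at the same time. In this example, if I have '123woods' and 'come', I should get two occurrences. The method should be reasonably fast on large texts.

My idea is to use StringTokenizer, but I am unsure whether it will perform well. Any suggestions?

Source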

2011-02-23 Nikola Yovchev

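Are you sure the logic isn't flawed? What if you have the keywords words123 and 123words? Which keyword would a word in the text then be a match for? – 2011-02-23 12:48:10

Neither. I only need exact word matches. – 2011-02-23 13:18:15

The example below is based on your comments. It takes a list of keywords and searches for them in a given string using word boundaries. It uses StringUtils from Apache Commons Lang to build the regular expression and prints the matched groups.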

String text = "I will come and meet you at the woods 123woods and all the woods"; 

List<String> tokens = new ArrayList<String>(); 
tokens.add("123woods"); 
tokens.add("woods"); 

String patternString = "\\b(" + StringUtils.join(tokens, "|") + ")\\b"; 
Pattern pattern = Pattern.compile(patternString); 
Matcher matcher = pattern.matcher(text); 

while (matcher.find()) { 
    System.out.println(matcher.group(1)); 
}
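Since the question also asks where each match occurs, here is a minimal sketch of the same idea that needs only the JDK and records the start offset of every hit. The class name WholeWordSearch, the method findWholeWords and the "keyword@index" output format are made up for illustration; String.join (Java 8+) stands in for StringUtils.join, and Pattern.quote is an extra precaution that is not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordSearch {

    // Returns one "keyword@index" entry per whole-word occurrence, in text order.
    // Assumes the keyword list is not empty.
    static List<String> findWholeWords(String text, List<String> keywords) {
        List<String> quoted = new ArrayList<>();
        for (String keyword : keywords) {
            // Pattern.quote keeps regex metacharacters in a keyword literal.
            quoted.add(Pattern.quote(keyword));
        }
        // \b on both sides restricts every alternative to whole words.
        Pattern pattern = Pattern.compile("\\b(?:" + String.join("|", quoted) + ")\\b");
        Matcher matcher = pattern.matcher(text);

        List<String> hits = new ArrayList<>();
        while (matcher.find()) {
            // start() is the zero-based offset of the match in the text.
            hits.add(matcher.group() + "@" + matcher.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "I will come and meet you at the woods 123woods and all the woods";
        // prints [woods@32, 123woods@38, woods@59]
        System.out.println(findWholeWords(text, Arrays.asList("123woods", "woods")));
    }
}
```

Note that the "woods" inside "123woods" is not reported, because there is no word boundary between the "3" and the "w".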

If you need more performance, you could have a look at StringSearch: high-performance pattern matching algorithms in Java.

Source

2011-02-23 12:50:43 Chris

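What if I have an ArrayList<String> and I want to build the pattern from it? It seems I have to use the trusty old StringBuilder? – 2011-02-23 13:06:12

@baba - You could do that, or you could iterate over the List<>. I'm not sure which is more efficient; if performance is a concern, you may want to try both approaches. – 2011-02-23 13:12:55

I personally prefer iterating over the list. I've added that option to my answer. – Chris 2011-02-23 13:30:25

You can use regular expressions. Use the Matcher and Pattern classes to get the desired output.

Source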

2011-02-23 12:49:09 Deepak

How about something like Arrays.asList(String.split(" ")).contains("xx")?

See String.split() and "How can I test if an array contains a certain value".

Source

2011-02-23 12:50:35

You can also use regular expression matching with the \b anchor (whole-word boundary).

Source

2011-02-23 12:51:21

Try matching with a regular expression. Match for "\b123woods\b", where \b is a word boundary.

Source

2011-02-23 12:51:38 Axel

As others have answered, use a regular expression plus word boundaries.

"I will come and meet you at the 123woods".matches(".*\\b123woods\\b.*");

will be true.

"I will come and meet you at the 123woods".matches(".*\\bwoods\\b.*");

will be false.

Source

2011-02-23 12:56:34 morja

Hope this works for you:

String string = "I will come and meet you at the 123woods"; 
String keyword = "123woods"; 

Boolean found = Arrays.asList(string.split(" ")).contains(keyword); 
if(found){ 
     System.out.println("Keyword matched the string"); 
}

http://codigounico.blogspot.com/

Source

2011-02-23 14:02:15 LeonardoPolitec

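A simpler way to do this is to use split():

String match = "123woods"; 
String text = "I will come and meet you at the 123woods"; 

// split() needs a delimiter regex; "\\s+" splits on runs of whitespace. 
// Fragment: assumes it sits inside a method that returns boolean. 
String[] sentence = text.split("\\s+"); 
for (String word : sentence) 
{ 
    if (word.equals(match)) 
        return true; 
} 
return false;

This is a simpler, more elegant way to do the same thing without using tokens, etc.

Source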

2012-10-11 00:12:48 ulu5

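While this is easier to understand and write, it is not an answer to the question I asked. I have two or three, or possibly an unlimited number of "match" keywords, and I need to get the ones that are found in the "text". Of course, you could loop over my "match" keywords for each "word" of the split text, but I find that less elegant than the already accepted solution. – 2012-10-11 07:55:18

To match "123woods" rather than "woods", use atomic grouping in the regular expression. One thing to note: in a string containing "123woods", it matches the first "123woods" and stops there instead of searching the same string further.

\b(?>123woods|woods)\b

It tries 123woods as the primary alternative and, once that matches, stops searching.

Source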

2013-08-31 13:00:55 SasiRSK

Here is a way I found on Android to match an exact word in a string:

String full = "Hello World. How are you ?"; 

String one = "Hell"; 
String two = "Hello"; 
String three = "are"; 
String four = "ar"; 


boolean is1 = isContainExactWord(full, one); 
boolean is2 = isContainExactWord(full, two); 
boolean is3 = isContainExactWord(full, three); 
boolean is4 = isContainExactWord(full, four); 

Log.i("Contains Result", is1+"-"+is2+"-"+is3+"-"+is4); 

Result: false-true-true-false

The word-matching function:

private boolean isContainExactWord(String fullString, String partWord){ 
    String pattern = "\\b"+partWord+"\\b"; 
    Pattern p=Pattern.compile(pattern); 
    Matcher m=p.matcher(fullString); 
    return m.find(); 
}
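One caveat with building the pattern by plain concatenation: if the keyword contains regex metacharacters (a dot, a parenthesis and so on), they are interpreted as regex syntax. A hedged variant, assuming keywords should always be taken literally, wraps the keyword in Pattern.quote. The class name ExactWord and the method name containsExactWord are illustrative only.

```java
import java.util.regex.Pattern;

class ExactWord {

    // Same idea as isContainExactWord, but the keyword is quoted so that
    // characters such as '.' or '(' are matched literally.
    static boolean containsExactWord(String fullString, String partWord) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(partWord) + "\\b");
        return p.matcher(fullString).find();
    }

    public static void main(String[] args) {
        // Unquoted, "a.c" would be a regex in which '.' matches any character,
        // so it would also match "abc".
        System.out.println(containsExactWord("abc def", "a.c"));       // false
        System.out.println(containsExactWord("see a.c here", "a.c"));  // true
        System.out.println(containsExactWord("Hello World. How are you ?", "Hell")); // false
    }
}
```

Keep in mind that \b only marks a transition between a word character and a non-word character, so keywords that begin or end with punctuation need a different boundary test.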

Done

Source

2015-07-07 10:51:42

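Looking back at the original question, we need to find some given keywords in a given sentence, count the number of occurrences and know where they occur. I don't quite understand what "where" means (is it an index in the sentence?), so I'll pass on that one... I'm still learning Java, one step at a time, so I'll get to it in due time :-)

It must be noted that ordinary sentences (like the one in the original question) can contain repeated keywords, so the search cannot just ask whether a given keyword exists and count it as 1 if it does. There can be more than one occurrence of the same keyword. For example: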

// Base sentence (added punctuation, to make it more interesting): 
String sentence = "Say that 123 of us will come by and meet you, " 
       + "say, at the woods of 123woods."; 

// Split it (punctuation taken in consideration, as well): 
java.util.List<String> strings = 
         java.util.Arrays.asList(sentence.split(" |,|\\.")); 

// My keywords: 
java.util.ArrayList<String> keywords = new java.util.ArrayList<>(); 
keywords.add("123woods"); 
keywords.add("come"); 
keywords.add("you"); 
keywords.add("say");

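Looking at it, the expected result is 5, for "Say" + "come" + "you" + "say" + "123woods", counting "say" twice if we lowercase everything. If we don't, the count should be 4, with "Say" excluded and "say" included. Fine. My suggestion is: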

// Set... ready...? 
int counter = 0; 

// Go! 
for(String s : strings) 
{ 
    // Asking if the word from the sentence exists in the keywords, not the 
    // other way around, to find repeated keywords in the sentence. 
    Boolean found = keywords.contains(s.toLowerCase()); 
    if(found) 
    { 
     counter ++; 
     System.out.println("Found: " + s); 
    } 
} 

// Statistics: 
if (counter > 0) 
{ 
    System.out.println("In sentence: " + sentence + "\n" 
        + "Count: " + counter); 
}

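And the results are:

Found: Say
Found: come
Found: you
Found: say
Found: 123woods
In sentence: Say that 123 of us will come by and meet you, say, at the woods of 123woods.
Count: 5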
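If "where" is read as the character offset at which each keyword starts (one possible interpretation, not confirmed by the question), the same loop can be written as a single scan that keeps track of positions. This is only a sketch: the class KeywordPositions and the method locate are hypothetical names, and a "word" is assumed to be a run of letters and digits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class KeywordPositions {

    // Maps each keyword found (lowercased) to the offsets where it starts.
    static Map<String, List<Integer>> locate(String sentence, Set<String> keywords) {
        Map<String, List<Integer>> found = new LinkedHashMap<>();
        int i = 0;
        int n = sentence.length();
        while (i < n) {
            // Skip separators: anything that is not a letter or digit.
            while (i < n && !Character.isLetterOrDigit(sentence.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one word.
            while (i < n && Character.isLetterOrDigit(sentence.charAt(i))) {
                i++;
            }
            if (start < i) {
                String word = sentence.substring(start, i).toLowerCase(Locale.ROOT);
                if (keywords.contains(word)) {
                    found.computeIfAbsent(word, k -> new ArrayList<>()).add(start);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String sentence = "Say that 123 of us will come by and meet you, "
                + "say, at the woods of 123woods.";
        Set<String> keywords = new HashSet<>(Arrays.asList("123woods", "come", "you", "say"));
        // prints {say=[0, 46], come=[24], you=[41], 123woods=[67]}
        System.out.println(locate(sentence, keywords));
    }
}
```

The size of each list gives the per-keyword count, and the sum of the sizes gives the total of 5 reported above.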

Source

2015-07-13 23:54:14

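A solution seems to have been accepted long ago, but it can be improved, so in case somebody has a similar problem:

This is a classic application for multi-pattern search algorithms.

Java's pattern search (with Matcher.find) is not well suited to this. Searching for exactly one keyword is optimized in Java, but searching for an or-expression uses the regex non-deterministic automaton, which backtracks on mismatches. In the worst case each character of the text is processed l times (where l is the sum of the pattern lengths).

Single-pattern search is better, but not well suited either. One has to start a search for every keyword pattern. In the worst case each character of the text is processed p times, where p is the number of patterns.

Multi-pattern search processes each character of the text exactly once. Algorithms suitable for such a search are Aho-Corasick, Wu-Manber, or Set Backwards Oracle Matching. These can be found in libraries such as StringSearchAlgorithms or byteseek.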

// example with StringSearchAlgorithms 

AhoCorasick stringSearch = new AhoCorasick(asList("123woods", "woods")); 

CharProvider text = new StringCharProvider("I will come and meet you at the woods 123woods and all the woods", 0); 

StringFinder finder = stringSearch.createFinder(text); 

List<StringMatch> all = finder.findAll();
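For readers who want to see what such an algorithm does internally, here is a small self-contained sketch of Aho-Corasick in plain Java. It is not the StringSearchAlgorithms API: the class AhoCorasickSketch, its methods and the "keyword@index" output are invented for illustration. The whole-word filter is an added assumption on top of the plain algorithm, and it treats any letter or digit as a word character (slightly different from regex \b, which also counts the underscore).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class AhoCorasickSketch {

    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;                                       // longest proper suffix that is a trie prefix
        final List<String> outputs = new ArrayList<>();  // keywords ending at this node
    }

    private final Node root = new Node();

    AhoCorasickSketch(List<String> keywords) {
        // Step 1: build a trie of all keywords.
        for (String keyword : keywords) {
            Node node = root;
            for (char c : keyword.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(keyword);
        }
        // Step 2: breadth-first pass that sets the failure links.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                Node child = e.getValue();
                Node f = node.fail;
                while (f != null && !f.next.containsKey(e.getKey())) {
                    f = f.fail;
                }
                child.fail = (f == null) ? root : f.next.get(e.getKey());
                // Inherit keywords that end at the suffix state.
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Scans the text once; returns "keyword@index" for whole-word occurrences only.
    List<String> findWholeWords(String text) {
        List<String> hits = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            node = node.next.getOrDefault(c, root);
            for (String keyword : node.outputs) {
                int start = i - keyword.length() + 1;
                if (isBoundary(text, start - 1) && isBoundary(text, i + 1)) {
                    hits.add(keyword + "@" + start);
                }
            }
        }
        return hits;
    }

    // True if the index is outside the text or holds a non-word character.
    private static boolean isBoundary(String text, int index) {
        return index < 0 || index >= text.length()
                || !Character.isLetterOrDigit(text.charAt(index));
    }

    public static void main(String[] args) {
        AhoCorasickSketch search = new AhoCorasickSketch(Arrays.asList("123woods", "woods"));
        // prints [woods@32, 123woods@38, woods@59]
        System.out.println(search.findWholeWords(
                "I will come and meet you at the woods 123woods and all the woods"));
    }
}
```

The automaton itself also reports the "woods" inside "123woods"; it is the boundary check that drops that hit, which is why the filter sits in the reporting loop and not in the automaton.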

Source

2016-08-13 10:22:39 CoronA
