Returning a specified number of words before and after a given position in a text

I have a big problem with the following code. I expect it to return n words before and after the keyword (the needle) it finds, but it never does.

Suppose I have a text such as

"There is a lot of interesting stuff going on, when someone tries to find the needle in the haystack. Especially if there is anything to see blah blah blah".

and a regular expression like this:

"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}\b)needle(\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"

Shouldn't this match the needle in the given string and return the text

someone tries to find the needle in the haystack. Especially if

It never does :-( When I run it, my method always returns an empty string, even though I know for sure that the keyword is in the given text.

private String trimStringAtWordBoundary(String haystack, int wordsBefore, int wordsAfter, String needle) {
    if (haystack == null || haystack.trim().isEmpty()) {
        return haystack;
    }

    String textsegments = "";

    String patternString = "((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,"+wordsBefore+"}\b)" + needle + "(\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,"+wordsAfter+"})";

    Pattern pattern = Pattern.compile(patternString);
    Matcher matcher = pattern.matcher(haystack);

    logger.trace(">>> using regular expression: " + matcher.toString());

    while (matcher.find()) {
        logger.trace(">>> found you between " + matcher.regionStart() + " and " + matcher.regionEnd());
        String segText = matcher.group(0); // also tried it with group(1)
        textsegments += segText + "...";
    }

    return textsegments;
}
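A minimal, self-contained reproduction can help narrow the symptom down. The sketch below assumes the pattern is written in Java source exactly as in the question (a single backslash before each 'b') and uses the example sentence with 5 words on either side; the class, constant, and method names are made up for illustration. It checks that the needle really is in the text, whether the pattern finds anything, and what the source escape \b turns into inside a Java string literal.

```java
import java.util.regex.Pattern;

public class NeedleRepro {
    static final String HAYSTACK = "There is a lot of interesting stuff going on, when someone tries "
            + "to find the needle in the haystack. Especially if there is anything to see blah blah blah";

    // The pattern exactly as written in the question: single backslash before each b.
    static final String AS_WRITTEN =
            "((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}\b)needle(\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})";

    static boolean patternFinds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println("contains needle: " + HAYSTACK.contains("needle"));            // true
        System.out.println("pattern finds:   " + patternFinds(AS_WRITTEN, HAYSTACK));     // false
        // In a Java string literal "\b" is a single character, U+0008 BACKSPACE,
        // so the compiled regex requires a literal backspace next to "needle".
        System.out.println("code of \"\\b\":  " + (int) "\b".charAt(0));                  // 8
    }
}
```

If the needle is present but the pattern never finds a match, the pattern string itself is the first thing to inspect, which is consistent with the empty result described above.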

Obviously the problem lies in my regular expression, but I can't figure out what is wrong with it.


2014-09-30 siliconchris

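It doesn't look like you account for whitespace characters in your expression. Usually you would use '\s' where you have '\b', and also in the character classes before/after it... something like '"((?:[\w'\.-]+\s){0,"+wordsBefore+"})"' and similarly for the part after... – abiessu 2014-09-30 20:28:44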
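One possible reading of this suggestion, sketched under assumptions: a word is a run of [\w'.-], and \s+ consumes the whitespace between words instead of relying on \b. The helper name firstMatch is hypothetical, and wrapping the needle in Pattern.quote (so it is matched literally) is an addition, not part of the comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceVariant {
    // Hypothetical helper: words are runs of [\w'.-]; \s+ eats the whitespace between them.
    static String firstMatch(String haystack, String needle, int wordsBefore, int wordsAfter) {
        String regex = "((?:[\\w'.-]+\\s+){0," + wordsBefore + "})"   // up to N words, each followed by whitespace
                + Pattern.quote(needle)                                // the needle, matched literally
                + "((?:\\s+[\\w'.-]+){0," + wordsAfter + "})";        // up to M words, each preceded by whitespace
        Matcher m = Pattern.compile(regex).matcher(haystack);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String text = "There is a lot of interesting stuff going on, when someone tries "
                + "to find the needle in the haystack. Especially if there is anything to see blah blah blah";
        System.out.println(firstMatch(text, "needle", 5, 5));
        // someone tries to find the needle in the haystack. Especially if
    }
}
```

On the example sentence this yields the same snippet the question expects. Unlike the question's pattern, it cannot count a word followed directly by a comma (such as "on,") as a word before the needle, because the comma is neither in the word class nor whitespace.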

Your regex is basically fine, but in Java you need to escape the \b:

"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}\\b)needle(\\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5})"


2014-09-30 20:28:59 wvdz
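Applied to the question's method, the fix could look roughly like the sketch below. Besides doubling the backslashes, it makes changes that are assumptions rather than part of the answer: the needle goes through Pattern.quote so characters such as '.' or '(' are matched literally, the segments are collected in a StringBuilder, the method is static, and the logger calls are dropped. If match positions are logged, matcher.start() and matcher.end() give the bounds of the match, whereas regionStart() and regionEnd() report the region being searched (the whole input here).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrimAtWordBoundary {

    static String trimStringAtWordBoundary(String haystack, int wordsBefore, int wordsAfter, String needle) {
        if (haystack == null || haystack.trim().isEmpty()) {
            return haystack;
        }
        // "\\b" in Java source is the regex word boundary; "\b" would be a backspace character.
        // Pattern.quote(needle) is an added assumption: it makes the needle match literally.
        String patternString = "((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0," + wordsBefore + "}\\b)"
                + Pattern.quote(needle)
                + "(\\b(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0," + wordsAfter + "})";
        Matcher matcher = Pattern.compile(patternString).matcher(haystack);

        StringBuilder segments = new StringBuilder();
        while (matcher.find()) {
            // group(0) is the whole match: words before, the needle, words after.
            // For positions, use matcher.start()/matcher.end(), not regionStart()/regionEnd().
            segments.append(matcher.group(0)).append("...");
        }
        return segments.toString();
    }

    public static void main(String[] args) {
        String text = "There is a lot of interesting stuff going on, when someone tries "
                + "to find the needle in the haystack. Especially if there is anything to see blah blah blah";
        System.out.println(trimStringAtWordBoundary(text, 5, 5, "needle"));
    }
}
```

With the example sentence and 5 words on either side, this returns "someone tries to find the needle in the haystack. Especially if...".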

Maybe I'm missing something, but does '\\b' actually account for whitespace? I would have thought a '\\s' also needs to be present... – abiessu 2014-09-30 20:34:54

'\b' is the word-boundary metacharacter, so it is a bit more than just whitespace. – wvdz 2014-09-30 20:36:33

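OK, but wouldn't there be two boundaries at every separation between words? Doesn't '\\b' fail to match any of the whitespace between two words, since it is specified as a "zero-width match"? – abiessu 2014-09-30 20:41:07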
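A quick way to see the zero-width behavior is to list every position where \b matches. The sketch below (class and method names are illustrative) does this for "the needle": the boundaries fall at 0, 3, 4 and 10, so there are indeed two of them around the single space, and every match is empty. That is why the question's pattern pairs \b with [^a-zA-Z'-]+, which is what actually consumes the separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryPositions {
    // Collects the index of every position where the word boundary \b matches.
    static List<Integer> boundaryPositions(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(text);
        while (m.find()) {
            // Each match is empty (start() == end()): \b consumes no characters.
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(boundaryPositions("the needle")); // [0, 3, 4, 10]
    }
}
```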
