Overlapping rules with named groups

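I'm having a problem parsing phone numbers with a custom regular expression: overlapping rules with named groups.

The value matching the "wtvCode" group is optional;
The value matching the "countryCode" group is optional;
For some values, the countryCode rule overlaps the areaCityCode rule. In that case, when the countryCode is missing, its expression captures the areaCityCode value instead.

A code example follows.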

Regex regex = new Regex(string.Concat(
    "^(", 
    "(?<wtvCode>[A-Z]{3}|)", 
    "([-|/|#| |]|)", 
    "(?<countryCode>[2-9+]{2,5}|)", 
    "([-|/|#| |]|)", 
    "(?<areaCityCode>[0-9]{2,3}|)", 
    "([-|/|#| |]|))", 
    "(?<phoneNumber>(([0-9]{8,18})|([0-9]{3,4}([-|/|#| |]|)[0-9]{4})|([0-9]{4}([-|/|#| |]|)[0-9]{4})|([0-9]{4}([-|/|#| |]|)[0-9]{4}([-|/|#| |]|)[0-9]{1,5})))", 
    "([-|/|#| |]|)", 
    "(?<foo>((A)|(B)))", 
    "([-|/|#| |]|)", 
    "(?<bar>(([1-9]{1,2})|)", 
    ")$" 
)); 

string[] validNumbers = new[] { 
    "11-1234-5678-27-A-2", // missing wtvCode and countryCode 
    "48-1234-5678-27-A-2", // missing wtvCode and countryCode 
    "55-48-1234-5678-27-A-2" // missing wtvCode 
}; 

foreach (string number in validNumbers) { 
    Console.WriteLine("countryCode: {0}", regex.Match(number).Groups["countryCode"].Value); 
    Console.WriteLine("areaCityCode: {0}", regex.Match(number).Groups["areaCityCode"].Value); 
    Console.WriteLine("phoneNumber: {0}", regex.Match(number).Groups["phoneNumber"].Value); 
}

The output is:

// First number 
// countryCode:    <- correct 
// areaCityCode: 11   <- correct, but that's because "11" is never a countryCode 
// phoneNumber: 1234-5678-27 <- correct 

// Second number 
// countryCode: 48   <- wrong, should be "" 
// areaCityCode:    <- wrong, should be "48" 
// phoneNumber: 1234-5678-27 <- correct 

// Third number 
// countryCode: 55   <- correct 
// areaCityCode: 48   <- correct 
// phoneNumber: 1234-5678-27 <- correct

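So far I haven't been able to fix the regular expression so that it covers all my constraints and doesn't mix up countryCode and areaCityCode when a value matches both rules. Any ideas?

Thanks in advance.

Update

A correct regular expression for phone country codes can be found here: https://stackoverflow.com/a/6967885/136381

Source

2012-03-30 Hilton Perantunes

"55-48-1234-5678-27-A-2" // missing countryCode -> missing wtvCode? – zishe 2012-03-30 04:14:32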

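First, I recommend using the ? quantifier to make things optional, instead of the empty alternatives you're using now. In the case of the country code, add another ? to make the group non-greedy (lazy). That way the regex first tries to capture the leading digits in the areaCityCode group. Only if the overall match fails does it backtrack and use the countryCode group.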

Regex regex = new Regex(
    @"^ 
    ((?<wtvCode>[A-Z]{3}) [-/# ])? 
    ((?<countryCode>[2-9+]{2,5}) [-/# ])?? 
    ((?<areaCityCode>[0-9]{2,3}) [-/# ])? 
    (?<phoneNumber> [0-9]{8,18} | [0-9]{3,4}[-/# ][0-9]{4}([-/# ][0-9]{1,5})?) 
    ([-/# ] (?<foo>A|B)) 
    ([-/# ] (?<bar>[1-9]{1,2}))? 
    $", 
    RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

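As you can see, I made a few other changes to your code, the most important being the switch from ([-|/|#| |]|) to [-/# ]. Inside the brackets, the pipes just match a literal |, which I'm pretty sure you don't want. The last pipe made the separator optional; I hope the separators don't really have to be optional, because that would make this job much harder.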
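To make the effect of the lazy group and the separator change easier to check, here is a minimal sketch that runs the pattern above against the three sample numbers from the question. It assumes a .NET version that supports top-level statements; the variable names (samples, oldClassMatchesPipe, newClassMatchesPipe) are illustrative only and not from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as in the answer, repeated so this sketch runs on its own.
var regex = new Regex(
    @"^
    ((?<wtvCode>[A-Z]{3}) [-/# ])?
    ((?<countryCode>[2-9+]{2,5}) [-/# ])??
    ((?<areaCityCode>[0-9]{2,3}) [-/# ])?
    (?<phoneNumber> [0-9]{8,18} | [0-9]{3,4}[-/# ][0-9]{4}([-/# ][0-9]{1,5})?)
    ([-/# ] (?<foo>A|B))
    ([-/# ] (?<bar>[1-9]{1,2}))?
    $",
    RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);

// Sample numbers taken from the question.
string[] samples =
{
    "11-1234-5678-27-A-2",
    "48-1234-5678-27-A-2",
    "55-48-1234-5678-27-A-2"
};

foreach (string number in samples)
{
    Match m = regex.Match(number);
    Console.WriteLine("{0} -> countryCode: '{1}', areaCityCode: '{2}', phoneNumber: '{3}'",
        number,
        m.Groups["countryCode"].Value,
        m.Groups["areaCityCode"].Value,
        m.Groups["phoneNumber"].Value);
}

// Inside a character class, | is an ordinary character, so the original
// separator class also accepts a literal pipe; the simplified one does not.
bool oldClassMatchesPipe = Regex.IsMatch("|", @"^[-|/|#| |]$");
bool newClassMatchesPipe = Regex.IsMatch("|", @"^[-/# ]$");
Console.WriteLine("old class matches '|': {0}, new class matches '|': {1}",
    oldClassMatchesPipe, newClassMatchesPipe);
```

With the lazy quantifier, the second sample should leave countryCode empty and put 48 in areaCityCode, while the third sample still backtracks into countryCode to pick up 55.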

Source

2012-03-30 07:52:49

Your expression looks good, and so do your suggestions. I'm testing it. – 2012-03-30 13:39:53

You are right about "([-|/|#| |]|)". I'm using your separator pattern. – 2012-03-30 15:16:33

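You and the other responders overlooked two things.

The first is that it makes more sense to work in reverse, in other words from right to left, because there are more required fields at the end of the text than at the beginning. By removing the doubt about the WTV and country codes, the regex parser's job becomes much easier (though it is intellectually harder for the person writing the pattern).

The second is the use of the if conditional in regular expressions, (?()|()). It lets us test for a scenario and apply one match pattern or another. I describe the if conditional on my blog in Regular Expressions and the If Conditional. The pattern below tests whether there is a WTV and a country code; if so, it matches both, and if not, it checks for an optional country code.

Also, why not use IgnorePatternWhitespace so that the pattern can be commented, as I do in the pattern below: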

string pattern = @" 
^ 
(?([A-Z][^\d]?\d{2,5}(?:[^\d])) # If WTV & Country Code (CC) 
    (?<wtvCode>[A-Z]{3})   # Get WTV & CC 
    (?:[^\d]?) 
    (?<countryCode>\d{2,5}) 
    (?:[^\d])     # Required Break 
    |       # else maybe a CC 
    (?<countryCode>\d{2,5})?  # Optional CC 
    (?:[^\d]?)     # Optional Break 
) 
(?<areaCityCode>\d\d\d?)  # Required area city 
(?:[^\d]?)      # Optional break (OB) 
(?<PhoneStart>\d{4})   # Default Phone # begins 
(?:[^\d]?)      # OB 
(?<PhoneMiddle>\d{4})   # Middle 
(?:[^\d]?)      # OB 
(?<PhoneEnd>\d\d)    # End 
(?:[^\d]?)      # OB 
(?<foo>[AB])     # Foo? 
(?:[^AB]+) 
(?<bar>\d) 
$ 
"; 

var validNumbers = new List<string>() {
    "11-1234-5678-27-A-2", // missing wtvCode and countryCode
    "48-1234-5678-27-A-2", // missing wtvCode and countryCode
    "55-48-1234-5678-27-A-2", // missing wtvCode
    "ABC-501-48-1234-5678-27-A-2" // Calling Belize (501)
};

validNumbers.ForEach(nm =>
{
    // IgnorePatternWhitespace only allows us to comment the pattern; does not affect processing
    var result = Regex.Match(nm, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.RightToLeft).Groups;

    Console.WriteLine(Environment.NewLine + nm);
    Console.WriteLine("\tWTV code : {0}", result["wtvCode"].Value);
    Console.WriteLine("\tcountryCode : {0}", result["countryCode"].Value);
    Console.WriteLine("\tareaCityCode: {0}", result["areaCityCode"].Value);
    Console.WriteLine("\tphoneNumber : {0}{1}{2}", result["PhoneStart"].Value, result["PhoneMiddle"].Value, result["PhoneEnd"].Value);
});

Result:

11-1234-5678-27-A-2 
    WTV code : 
    countryCode : 
    areaCityCode: 11 
    phoneNumber : 1234567827 

48-1234-5678-27-A-2 
    WTV code : 
    countryCode : 
    areaCityCode: 48 
    phoneNumber : 1234567827 

55-48-1234-5678-27-A-2 
    WTV code : 
    countryCode : 55 
    areaCityCode: 48 
    phoneNumber : 1234567827 

ABC-501-48-1234-5678-27-A-2 
    WTV code : ABC 
    countryCode : 501 
    areaCityCode: 48 
    phoneNumber : 1234567827
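The two ideas used in this answer, right-to-left matching and the conditional construct, can also be looked at in isolation. The following is a deliberately small sketch, not the full phone pattern: the inputs and the variable names (lastDigits, conditional, withWtv, withoutWtv) are made up for illustration, and the condition is written as an explicit lookahead, (?(?=...)yes|no), which .NET accepts as an unambiguous form of the conditional. It assumes a .NET version that supports top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// RightToLeft: the search starts at the end of the input, so the first
// match reported is the last run of digits in the string.
string lastDigits = Regex.Match("55-48-1234", @"\d+", RegexOptions.RightToLeft).Value;
Console.WriteLine("last run of digits: {0}", lastDigits);

// Conditional: if three capital letters and a dash lie ahead, match a WTV
// code followed by a country code; otherwise match a country code only.
var conditional = new Regex(
    @"^(?(?=[A-Z]{3}-)(?<wtvCode>[A-Z]{3})-(?<countryCode>\d{2,5})|(?<countryCode>\d{2,5}))$");

Match withWtv = conditional.Match("ABC-501");
Match withoutWtv = conditional.Match("501");

Console.WriteLine("wtvCode: '{0}', countryCode: '{1}'",
    withWtv.Groups["wtvCode"].Value, withWtv.Groups["countryCode"].Value);
Console.WriteLine("wtvCode: '{0}', countryCode: '{1}'",
    withoutWtv.Groups["wtvCode"].Value, withoutWtv.Groups["countryCode"].Value);
```

Note that .NET allows the same group name (countryCode here) to appear in both branches, which is what lets either branch fill the same named group.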

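Notes:

If there is no separator between the country code and the city code, there is no way a parser can determine which part is the city and which is the country.
Your original country code pattern, [2-9], fails for any country whose code contains a 0. So I changed it to [2-90].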
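As a quick check of what the character classes in the note above accept, here is a small sketch. The sample values 90, 31 and 501 are chosen for illustration, and the {2,5} length bound is borrowed from the patterns earlier in the thread. Since [2-90] is the range 2 to 9 plus the digit 0, any code containing a 1 is still rejected, whereas \d accepts every digit. It assumes a .NET version that supports top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// [2-90] means "2 through 9, or 0"; the digit 1 is not in the class.
bool accepts90 = Regex.IsMatch("90", @"^[2-90]{2,5}$");
bool accepts31 = Regex.IsMatch("31", @"^[2-90]{2,5}$");
bool accepts501 = Regex.IsMatch("501", @"^[2-90]{2,5}$");

// \d accepts any digit, so codes containing a 1 pass as well.
bool digitsAccept501 = Regex.IsMatch("501", @"^\d{2,5}$");

Console.WriteLine("[2-90] accepts 90: {0}", accepts90);
Console.WriteLine("[2-90] accepts 31: {0}", accepts31);
Console.WriteLine("[2-90] accepts 501: {0}", accepts501);
Console.WriteLine("\\d accepts 501: {0}", digitsAccept501);
```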

Source

2012-03-30 14:27:26 OmegaMan

Very clear, thanks. However, [2-90] doesn't work for phone country codes either. I ended up using the pattern described here: http://stackoverflow.com/a/6967885/136381 – 2012-03-30 18:17:44
