2010-09-14 105 views

Answers

4

Why not just use urlparse?

+0

Agreed; regular expressions are a poor fit for URIs, email addresses, or markup. – 2010-09-14 07:32:26

+0

@Delan: I'm fairly sure that using a regular expression on URIs is perfectly fine. RFC 3986 even gives you one for parsing a URI. – 2010-09-14 07:36:13

+0

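Although most URIs are simple, they have quirks and complexities, just like email addresses, and those lead to false positives and false negatives. I don't remember who it was, but someone wrote a regular expression that validates email addresses against the spec as a proof of concept, and it fills a page. – 2010-09-14 07:38:18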
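For reference, the regular expression mentioned in the comments appears in Appendix B of RFC 3986. The following is a minimal sketch of how it could be used from Python; the helper name `split_uri` and the sample URL are made up for illustration. Note that this expression only splits a URI reference into components; it does not validate it.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Split a URI reference into its generic components (hypothetical helper)."""
    # Every part of the pattern is optional, so match() always succeeds.
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

parts = split_uri("http://xyz.com//abc?x=1#frag")
print(parts)
```

Because the path group accepts any characters other than `?` and `#`, a doubled slash simply ends up inside the path component.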

0

The answer depends on whether you want to parse the URL or just want to know how to handle the optional slash.

In the first case, I agree with Amber: you should use urlparse.
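As a rough sketch of the urlparse route, assuming Python 3 (where the function lives in `urllib.parse`; in Python 2 it was in the `urlparse` module): the helper `normalized_path` is a hypothetical name, and collapsing repeated slashes is only one possible way to treat the two URL forms as equivalent.

```python
import re
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

def normalized_path(url):
    """Return the URL's path with runs of slashes collapsed (hypothetical helper)."""
    path = urlparse(url).path
    return re.sub(r"/{2,}", "/", path)

single = urlparse("http://xyz.com/abc")
double = urlparse("http://xyz.com//abc")

# Scheme and host are identical; only the path differs.
print(single.scheme, single.netloc, single.path)
print(double.scheme, double.netloc, double.path)

# After normalization both URLs yield the same path.
print(normalized_path("http://xyz.com/abc") == normalized_path("http://xyz.com//abc"))
```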

In the second case, put a ? after the slash in the expression:

http://xyz.com//?abc 

In a regular expression, ? means that the preceding element is optional (i.e. it may occur zero times or once).
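A small sketch of the optional-slash pattern in Python, assuming the whole string should match. One deliberate difference from the expression above: the dot is escaped here so that it matches only a literal ".". The sample URLs are illustrative.

```python
import re

# "/" followed by "/?" means: one slash, then an optional second slash.
# The dot is escaped (an addition to the expression in the answer).
pattern = re.compile(r"http://xyz\.com//?abc")

results = {
    url: bool(pattern.fullmatch(url))
    for url in (
        "http://xyz.com/abc",    # one slash
        "http://xyz.com//abc",   # two slashes
        "http://xyz.com///abc",  # three slashes: not covered by "//?"
    )
}
print(results)
```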

0

You can use this expression:

\w{4}\:\/{2}\w+\.\w+\/{1,2}\w+ 

Explanation:

\w{4} matches any word character [a-zA-Z0-9_]
    Quantifier: Exactly 4 times
\: matches the character : literally
\/{2} matches the character / literally
    Quantifier: Exactly 2 times
\w+ matches any word character [a-zA-Z0-9_]
    Quantifier: Between one and unlimited times, as many times as possible, giving back as needed
\. matches the character . literally
\w+ matches any word character [a-zA-Z0-9_]
    Quantifier: Between one and unlimited times, as many times as possible, giving back as needed
\/{1,2} matches the character / literally
    Quantifier: Between 1 and 2 times, as many times as possible, giving back as needed
\w+ matches any word character [a-zA-Z0-9_]
    Quantifier: Between one and unlimited times, as many times as possible, giving back as needed
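To see what this expression accepts in practice, here is a minimal Python check. It assumes the pattern is applied to the whole string with `fullmatch`; the sample URLs are made up for illustration. It also shows two limits that follow from the breakdown above: `\w{4}` requires a scheme of exactly four word characters, and `\w+\.\w+` allows only a single dot in the host.

```python
import re

# The expression from the answer, unchanged.
pattern = re.compile(r"\w{4}\:\/{2}\w+\.\w+\/{1,2}\w+")

def accepts(url):
    """True if the whole string matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(url) is not None

checks = {
    "http://xyz.com/abc": accepts("http://xyz.com/abc"),          # one slash
    "http://xyz.com//abc": accepts("http://xyz.com//abc"),        # two slashes
    "https://xyz.com/abc": accepts("https://xyz.com/abc"),        # five-letter scheme
    "http://www.xyz.com/abc": accepts("http://www.xyz.com/abc"),  # two dots in host
}
print(checks)
```

With `re.search` instead of `fullmatch`, the pattern would find a match inside an `https` URL starting at the second character, so anchoring matters if the goal is validation.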

Hope this helps.