2011-02-11

What is the correct Perl regular expression for extracting email addresses from a text file when the addresses are written in this form:

someone at something.domainextension OR someone.someone at something.domainextension

Is it possible to write a regular expression that converts these into normal email addresses?

Thanks in advance.


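This is a duplicate question; you will find plenty of answers on Stack Overflow. Keep in mind that proper email validation cannot, and should not, be done with a regular expression. See http://www.regular-expressions.info/email.html – 2011-02-11 17:04:09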

Answers


I use Ruby, but it would be the same in Perl:

>> "someone.someone at something.domainextension".sub(/\bat\b/,"@").gsub(/\s+/,"") 
=> "someone.someone@something.domainextension"

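Basically, just replace "at" with "@" and remove all whitespace. This assumes the string contains nothing but the obfuscated address; on running text, removing all whitespace would glue the surrounding words together.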
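For addresses embedded in running text, a minimal Ruby sketch can rewrite only the `name at domain.tld` part and leave everything else alone. The helper name `deobfuscate` and its pattern are my own assumptions, not part of the answer above; it assumes the local part and domain contain only letters, digits and dots.

```ruby
# Hypothetical helper: rewrite "user at domain.tld" as "user@domain.tld"
# inside arbitrary text, leaving surrounding words and spaces untouched.
# Assumption: local part and domain consist only of letters, digits and dots,
# and the top-level domain is at least two letters.
def deobfuscate(text)
  text.gsub(/\b([A-Za-z0-9.]+)\s+at\s+([A-Za-z0-9.]+\.[A-Za-z]{2,})\b/) do
    "#{$1}@#{$2}" # $1 = local part, $2 = domain, set by the current match
  end
end

puts deobfuscate("mail someone.someone at something.com today")
# prints: mail someone.someone@something.com today
```

Like any pattern of this kind, it will also turn an ordinary phrase such as "look at example.com" into "look@example.com". The pattern itself uses only constructs that Perl regular expressions also support.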


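I believe the code below will do what you need. However, it won't work if an email address is split across lines, and it will give a false positive on ordinary text: any word followed by "at something.com" (e.g. "look at something.com") is reported as an address. If you post some sample data from your data set, I can make the code more specific to your case.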

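As pointed out in the comment above, this will certainly not find every email address that is valid according to the RFC, but I think it should solve your problem.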

use strict;
use warnings;

my @lines_from_file;    # holds our test data

# load the test data
$lines_from_file[0] = 'this is some text. We like to type to someone at something.com but sometimes';
$lines_from_file[1] = 'they go by someone.someone at something.com just to confuse us and hey you never';
$lines_from_file[2] = 'know, maybe they use parens like (someone at something.com).';
$lines_from_file[3] = 'make sure we do not find someone at .com. or someone something.com or someone at somethingcom';

my @all_email_addresses;    # holds all email addresses found

# for each line in the file
foreach my $line (@lines_from_file) {
    while ($line =~ /([0-9a-zA-Z.]+)   # capture one or more letters, digits or dots
                     \sat\s            # " at "
                     ([0-9a-zA-Z.]+    # capture one or more letters, digits or dots
                     \.                # a literal dot
                     \w{2,4})          # com, net, us, tv, info, etc.
                     /xg) {
        # every time the line matches, save the match in normal email form
        push @all_email_addresses, "$1\@$2";
    }
}

print "@all_email_addresses\n";
/^(?:(\w+)\.)?(\w+)\s+at\s+(\w+)\.(\w+)$/ 

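This won't catch every email address, only those in the form you provided. Because it is anchored with ^ and $, it also only matches a string that consists entirely of one such address.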
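The regex only matches; the address still has to be rebuilt from its capture groups, and group 1 (the part before the first dot) is optional. A minimal sketch in Ruby, the language of the first answer; the helper name `to_address` is hypothetical, and the same logic applies to `$1`..`$4` in Perl:

```ruby
# The anchored regex from the answer above.
PATTERN = /^(?:(\w+)\.)?(\w+)\s+at\s+(\w+)\.(\w+)$/

# Hypothetical helper: returns the normal address, or nil when the string
# is not exactly one address in the "[name.]name at domain.tld" form.
def to_address(str)
  m = PATTERN.match(str)
  return nil unless m
  # m[1] is nil when there is no "name." prefix before the main name
  local = m[1] ? "#{m[1]}.#{m[2]}" : m[2]
  "#{local}@#{m[3]}.#{m[4]}"
end

p to_address("someone at something.domainextension")          # "someone@something.domainextension"
p to_address("someone.someone at something.domainextension")  # "someone.someone@something.domainextension"
p to_address("write to someone at something.com")             # nil: extra words before the address
```

Note that `\w` does not match a dot, so a multi-label domain such as "mail.something.com" is also rejected by this pattern.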