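Splitting email strings by a delimiter

I have an array of email addresses (roughly over 50,000) and I'm interested in counting the frequency of particular email domains. For example, if I had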

emails = [ 
    '[email protected]', 
    '[email protected]', 
    '[email protected]', 
    '[email protected]', 
    '[email protected]' 
]

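and I was interested in which email domain appears the most, I would want to return 'gmail' with frequency 2.

To do this, I thought it would be a good idea to go through the array, discard everything that occurs before the @, and keep only the domains as a new array that I can then iterate over. How would I do this?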

Source

2016-06-13 Chumbawoo

Assuming your emails are strings, you can do something like this:

emails = ["[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]"] 
counts = Hash.new(0) 
emails.each { |t| counts[t.partition("@").last] += 1} 
counts #=> {"gmail.com"=>2, "yahoo.com"=>1, "aol.com"=>1, "someuni.xyz.com"=>1}

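Source

2016-06-13 06:19:24 pyfl88

You can combine it into `counts = emails.each_with_object(Hash.new(0)) { |t, h| ... }` – Stefan

Thanks! If I then want to sort the information by the integer, how would I do that? For example, if I use `counts.sort` (on the same counts you used), I get output like `[gmail.com, 2], [yahoo.com, 1] ...`, which is sorted alphabetically. I want to sort numerically, in other words with the most frequent email domain first – Chumbawoo

Never mind, it seems to work using `counts.sort_by { |a, b| b.to_i }` – Chumbawoo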
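
The two suggestions in the comments above can be combined roughly as follows. This is only a sketch: the addresses are made-up placeholders (the originals are obscured in the post), and negating the count to get the highest frequency first is one option among several.

```ruby
# Placeholder addresses, invented for illustration only.
emails = %w[
  alice@gmail.com
  bob@yahoo.com
  carol@aol.com
  dave@someuni.xyz.com
  erin@gmail.com
]

# Build the domain => count hash in a single expression.
counts = emails.each_with_object(Hash.new(0)) do |email, h|
  h[email.partition("@").last] += 1
end

# Sort by count, highest first; ties fall back to alphabetical order.
sorted = counts.sort_by { |domain, count| [-count, domain] }

p sorted.first # prints ["gmail.com", 2]
```

Note that `counts.sort_by { |a, b| b.to_i }` on its own sorts in ascending order, so the most frequent domain ends up last rather than first.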

emails.map { |e| e.split('@').last }  # keep only the domains
      .group_by { |s| s }             # group identical domains
      .map { |k, v| [k, v.count] }    # count each group
      .sort_by(&:last)                # sort by count, ascending
      .last                           # take the most frequent
#⇒ ["gmail.com", 2]

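Source

2016-06-13 06:24:04 mudasobwa

Funny that the only answer that provides the expected result ("I would want to return 'gmail' with frequency 2") got downvoted :) – mudasobwa

I thought it would be a good idea to [...] keep only the domains as a new array that I can then iterate over. How would I do this?

You should use a proper library to parse email addresses, e.g. the Mail gem. It comes with a utility class, Mail::Address, that provides easy access to the address fields: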

require 'mail' 

emails = %w[ 
    [email protected] 
    [email protected] 
    [email protected] 
    [email protected] 
    [email protected] 
] 

domains = emails.map { |email| Mail::Address.new(email).domain } 
#=> ["gmail.com", "yahoo.com", "aol.com", "someuni.xyz.com", "gmail.com"]

It can also handle more complex address formats. From the documentation:

a = Address.new('Mikel Lindsaar (My email address) <[email protected]>') 
a.format  #=> 'Mikel Lindsaar <[email protected]> (My email address)' 
a.address  #=> '[email protected]' 
a.display_name #=> 'Mikel Lindsaar' 
a.local  #=> 'mikel' 
a.domain  #=> 'test.lindsaar.net' 
a.comments  #=> ['My email address'] 
a.to_s   #=> 'Mikel Lindsaar <[email protected]> (My email address)'

Source

2016-06-13 06:37:12 Stefan

Similar to mudasobwa's answer.

emails 
.group_by{|s| s.partition("@").last} 
.map{|k, v| [k, v.length]} 
.max_by(&:last) 
# => ["gmail.com", 2]
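
On Ruby 2.7 or later (newer than this thread), `Enumerable#tally` can replace the `group_by` and `map` steps. A minimal sketch, again with made-up placeholder addresses rather than the ones from the question:

```ruby
# Placeholder addresses, invented for illustration only.
emails = %w[
  alice@gmail.com
  bob@yahoo.com
  carol@aol.com
  dave@someuni.xyz.com
  erin@gmail.com
]

# Extract each domain, count occurrences with tally (Ruby 2.7+),
# then pick the [domain, count] pair with the largest count.
top = emails.map { |e| e.partition("@").last }
            .tally
            .max_by { |_domain, count| count }

p top # prints ["gmail.com", 2]
```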

Source

2016-06-13 07:05:13 sawa
