Summing word occurrences in a table

I'm working on a logic challenge as part of a project and have been trying to solve it for several hours. I have:

data = [ 
    ["this is a list of words", "2"], 
    ["another list of words", "2"] 
]

I'd like to end up with this:

data = [ 
    ["this", "2"], 
    ["is", "2"], 
    ["a", "2"], 
    ["list", "4"], 
    ["of", "4"], 
    ["another", "2"], 
    ["words", "4"] 
]

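Essentially, the string of words at index [0] gets split and duplicate words are merged, while the values at index [1] are added together whenever a word repeats.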

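I've tried many things, including splitting, propagation and countless iterations, but everything seems to end in a dead end. I'm sure there's a fairly simple solution.

Here's my latest attempt: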

# Loop through each data item
data.each do |obj|
  # create each obj to an array and save to var
  newObj = obj.permutation(1).to_a
  # loop through array of words and split storing the count
  split_query = newObj[0].each do |e|
    query_count = e.split(' ').count
    print e.split(' ')
  end
end


2015-07-10 user3927582

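It's worth adding one of your attempts. Even if you got stuck, it has probably solved part of the problem. Then you can get help with the specific Ruby you need to learn, rather than just a solution. –

Sure, I've edited it! – user3927582

You can use a hash: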

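hash = Hash.new(0)      # words not yet seen default to a count of 0
data.each do |v|
  x = v[1].to_i         # this row's count as an Integer
  v[0].split.each do |word|
    hash[word] += x     # add the row's count to every word in it
  end
end
result = hash.map { |k, v| [k, v.to_s] }  # back to [word, "count"] pairs

which yields: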

result 
=> [["this", "2"], 
    ["is", "2"], 
    ["a", "2"], 
    ["list", "4"], 
    ["of", "4"], 
    ["words", "4"], 
    ["another", "2"]]


2015-07-10 19:24:26

Using .each rather than a for ... in loop is more idiomatic. –

Agreed. I'll change the sample code. –

Perfect, works like a charm. Thanks! – user3927582

You could do it as follows.
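To illustrate the comment about idiom: besides style, one concrete difference is that a for ... in loop does not open a new scope, so its loop variable is still defined after the loop, whereas the parameter of an each block is local to the block. A minimal sketch using a small hypothetical rows array:

```ruby
rows = [["this is a list of words", "2"], ["another list of words", "2"]]

# for ... in does not open a new scope: `row` stays defined after the loop
for row in rows
  row[0].split  # (work with each row here)
end
after_for = row
p after_for   #=> ["another list of words", "2"]

# each takes a block, and block parameters are local to that block
rows.each do |r|
  r[0].split    # (work with each row here)
end
after_each = defined?(r)
p after_each  #=> nil
```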

Code

def tally(data)
  data.flat_map { |str,val| str.split.product([val.to_i]) }.
       group_by(&:first).
       map { |_,arr| [arr.first.first, arr.reduce(0) { |t,(_,val)| t+val }.to_s] }
end

Example

data = [ 
    ["this is a list of words", "2"], 
    ["another list of words", "2"], 
    ["yet one more list", "3"], 
    ["and a final one", "4"]] 
tally data 
    #=> [["this", "2"], ["is", "2"], ["a", "6"], ["list", "7"], 
    # ["of", "4"], ["words", "4"], ["another", "2"], ["yet", "3"], 
    # ["one", "7"], ["more", "3"], ["and", "4"], ["final", "4"]]

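It may be more useful to return pairs whose counts are integers rather than strings.

Explanation

For the example, these are the step-by-step calculations: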
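A sketch of that integer-valued variant, assuming Ruby 2.4 or later (for Array#sum with a block); the method name tally_ints is hypothetical. It reuses the group_by key as the word instead of arr.first.first:

```ruby
# Same pipeline as tally, but the summed counts stay Integers.
def tally_ints(data)
  data.flat_map { |str, val| str.split.product([val.to_i]) }
      .group_by(&:first)
      .map { |word, pairs| [word, pairs.sum { |_, val| val }] }
end

data = [
  ["this is a list of words", "2"],
  ["another list of words", "2"]
]
p tally_ints(data)
#=> [["this", 2], ["is", 2], ["a", 2], ["list", 4], ["of", 4], ["words", 4], ["another", 2]]
```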

a = data.flat_map { |str,val| str.split.product([val.to_i]) } 
    #=> [["this", 2], ["is", 2], ["a", 2], ["list", 2], ["of", 2], 
    # ["words", 2], ["another", 2], ["list", 2], ["of", 2], 
    # ["words", 2], ["yet", 3], ["one", 3], ["more", 3], ["list", 3],  
    # ["and", 4], ["a", 4], ["final", 4], ["one", 4]] 
b = a.group_by(&:first) 
    #=> {"this"=>[["this", 2]], 
    # "is"=>[["is", 2]], 
    # "a"=>[["a", 2], ["a", 4]], 
    # "list"=>[["list", 2], ["list", 2], ["list", 3]], 
    # "of"=>[["of", 2], ["of", 2]], 
    # "words"=>[["words", 2], ["words", 2]], 
    # "another"=>[["another", 2]], 
    # "yet"=>[["yet", 3]], 
    # "one"=>[["one", 3], ["one", 4]], 
    # "more"=>[["more", 3]], 
    # "and"=>[["and", 4]], 
    # "final"=>[["final", 4]]} 
b.map { |_,arr| [arr.first.first, arr.reduce(0) { |t,(_,val)| t+val }.to_s] } 
    #=> (the result for the example shown above)
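The block { |t,(_,val)| t+val } in the last step relies on Ruby's nested block-parameter destructuring: each element such as ["list", 2] is unpacked into _ (the word, ignored) and val. A minimal check on the "list" group from b:

```ruby
group = [["list", 2], ["list", 2], ["list", 3]]

# t is the running total; (_, val) unpacks each [word, count] pair
total = group.reduce(0) { |t, (_, val)| t + val }
p total          #=> 7

# The same sum with explicit indexing, for comparison
total_indexed = group.reduce(0) { |t, pair| t + pair[1] }
p total_indexed  #=> 7
```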

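Hash alternative

It's more natural to use a hash here, with integer values. To do that, we use Hash::new to define a hash whose default value is zero: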

def tally(data)
  data.each_with_object(Hash.new(0)) do |(str,val),h|
    str.split.each { |word| h[word] += val.to_i }
  end
end

h = tally(data) 
    #=> {"this"=>2, "is"=>2, "a"=>6, "list"=>7, "of"=>4, "words"=>4, 
    # "another"=>2, "yet"=>3, "one"=>7, "more"=>3, "and"=>4, "final"=>4}
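If the question's original format (an array of [word, "count"] pairs) is still needed, the hash can be converted back with Hash#map. A short self-contained sketch, repeating the each_with_object version of tally and using the question's data:

```ruby
def tally(data)
  data.each_with_object(Hash.new(0)) do |(str, val), h|
    str.split.each { |word| h[word] += val.to_i }
  end
end

data = [
  ["this is a list of words", "2"],
  ["another list of words", "2"]
]
h = tally(data)

# Convert each word => Integer entry back to [word, String]
pairs = h.map { |word, count| [word, count.to_s] }
p pairs
#=> [["this", "2"], ["is", "2"], ["a", "2"], ["list", "4"], ["of", "4"], ["words", "4"], ["another", "2"]]
```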

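If you'd like the keys to be in decreasing order of value (sort_by is not stable, so keys with equal values may come out in any order):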

sorted_keys = h.keys.sort_by { |k| -h[k] } 
    #=> ["one", "list", "a", "of", "and", "words", "final", "yet", 
    # "more", "another", "is", "this"] 
sorted_keys.zip(h.values_at(*sorted_keys)).to_h 
    #=> {"one"=>7, "list"=>7, "a"=>6, "of"=>4, "and"=>4, "words"=>4, 
    # "final"=>4, "yet"=>3, "more"=>3, "another"=>2, "is"=>2, "this"=>2}
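A shorter way to get a similar ordering is to sort the hash's pairs directly. Because sort_by is not stable, this sketch adds the word as a secondary key so tied entries come out in a deterministic (alphabetical) order; that tie-breaking rule is an assumption, not something the question asks for:

```ruby
h = {"this"=>2, "is"=>2, "a"=>6, "list"=>7, "of"=>4, "words"=>4,
     "another"=>2, "yet"=>3, "one"=>7, "more"=>3, "and"=>4, "final"=>4}

# Sort by descending count, then alphabetically to break ties
sorted = h.sort_by { |word, count| [-count, word] }.to_h
p sorted.keys
#=> ["list", "one", "a", "and", "final", "of", "words", "more", "yet", "another", "is", "this"]
```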

Hash.new(0) is often called a "counting hash". If:

h = Hash.new(0)

then:

h[:a] += 1

is equivalent to:

h[:a] = h[:a] + 1

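If h does not have the key :a (which is the case when h is empty), the h[:a] on the right side of the equation returns the hash's default value, given by the argument passed to Hash::new, which here is zero. Therefore: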

h[:a] = h[:a] + 1 
    # = 0 + 1 
    # = 1 
h #=> { :a => 1 }

The next time we encounter the key :a:

h[:a] += 1
    # h[:a] = h[:a] + 1
    #       = 1 + 1
    #       = 2
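One consequence of how the default works: reading a missing key returns the default but does not add that key to the hash. Only the assignment hidden in h[:a] += 1 creates the entry. A minimal sketch:

```ruby
h = Hash.new(0)

missing = h[:b]        # reading a missing key returns the default, 0
stored  = h.key?(:b)   # false: the read did not add :b to the hash

h[:a] += 1             # h[:a] = h[:a] + 1 assigns, so :a is stored
h[:a] += 1

p missing   #=> 0
p stored    #=> false
p h.keys    #=> [:a]
p h[:a]     #=> 2
```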


2015-07-10 22:04:06
