Check the word count in a string and remove the words with lower counts - Hive
Suppose I have the following table:
date_part string_word id
2017-08-08 India America Advance Apartments 1
2017-08-08 Apartments Planner Headlines 1
2017-08-08 India America Headlines Gucci 1
2017-08-08 Images Same Thing Africa 2
2017-08-08 Images 2
2017-08-07 India America Advance Apartments 2
2017-08-07 Apartments Planner Headlines 3
2017-08-07 India America Headlines Gucci 3
2017-08-07 Images Same Thing Africa 3
2017-08-07 Images 4
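Now I want to find the count of each word per day and remove the words with a low count. To find the word counts, I wrote the following query: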
SELECT date_part, word, COUNT(*) as total_word_count
FROM table_name LATERAL VIEW explode(split(string_word, ' ')) lTable as word
where date_part > '2017-08-05'
GROUP BY date_part, word
This gives the following:
date_part word total_word_count
2017-08-08 India 2
2017-08-08 America 2
2017-08-08 Advance 1
2017-08-08 Apartments 2
2017-08-08 Planner 1
2017-08-08 Headlines 2
2017-08-08 Gucci 1
2017-08-08 Images 2
2017-08-08 Same 1
2017-08-08 Thing 1
2017-08-08 Africa 1
2017-08-07 India 2
2017-08-07 America 2
2017-08-07 Advance 1
2017-08-07 Apartments 2
2017-08-07 Planner 1
2017-08-07 Headlines 2
2017-08-07 Gucci 1
2017-08-07 Images 2
2017-08-07 Same 1
2017-08-07 Thing 1
2017-08-07 Africa 1
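Now I want to remove the words whose count is less than 2, i.e. any word with a count of 1 should be removed for each date. The output should be the following: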
date_part string_word id
2017-08-08 India America Apartments 1
2017-08-08 Apartments Headlines 1
2017-08-08 India America Headlines 1
2017-08-08 Images 2
2017-08-08 Images 2
2017-08-07 India America Apartments 2
2017-08-07 Apartments Headlines 3
2017-08-07 India America Headlines 3
2017-08-07 Images 3
2017-08-07 Images 4
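Here the words with a count of 1 have been removed. This is the output I expect, and it has to be produced for every day.
Can someone help me with this?
Thanks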
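Add 'HAVING total_word_count > 1' to the query... –
@usagi Filtering is fine, but I want to remove the words from the original table. Only words with a count greater than one should remain; the rest should be removed. That is the problem I am trying to solve – haimen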
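One possible way to get the output described in the question, offered as an untested sketch rather than a definitive answer: explode each string into words, attach the per-day count of every word with a window function, keep only the words whose count is greater than 1, and glue the survivors back together per original row. `table_name` and the column names come from the question; the CTE names (`exploded`, `counted`) and the `original_string` alias are made up for illustration. It assumes a Hive version that supports CTEs, window functions and `collect_list`.

```sql
-- Sketch only: CTE names and original_string are illustrative, not from the question.
WITH exploded AS (
  -- One row per word, keeping the original row's columns alongside it.
  SELECT date_part, id, string_word AS original_string, word
  FROM table_name
  LATERAL VIEW explode(split(string_word, ' ')) lTable AS word
  WHERE date_part > '2017-08-05'
),
counted AS (
  -- Attach the per-day occurrence count of each word to every exploded row.
  SELECT date_part, id, original_string, word,
         COUNT(*) OVER (PARTITION BY date_part, word) AS total_word_count
  FROM exploded
)
-- Keep only words seen more than once that day and rebuild each string.
SELECT date_part,
       concat_ws(' ', collect_list(word)) AS string_word,
       id
FROM counted
WHERE total_word_count > 1
GROUP BY date_part, id, original_string;
```

A few caveats. The table has no unique row key, so rows are told apart by (date_part, id, original_string); two fully identical input rows would be merged into one. A row whose words all have a count of 1 would disappear from the result instead of coming back with an empty string. `collect_list` does not guarantee the order of the collected words, so if the original word order matters, `posexplode` could be used in place of `explode` to carry each word's position along.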