Data aggregation and averages over 20 billion records

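The records originate from AVRO files that are created daily with the following schema. There are 20 different attribute types stored in the "attribute_key" and "attribute_value" fields, and each measurement also includes a timestamp and a device_id.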

"fields" : [ 
{"type":"string", "name":"device_id"}, 
{"type":"string", "name":"record_date"}, 
{"type":"string", "name":"attribute_key"}, 
{"type":"string", "name":"attribute_value"}]

I have been able to take the daily files and load them into BigQuery tables split by month.

device_attributes201501 
device_attributes201502 
device_attributes201503 
device_attributes201504 
device_attributes201505 
device_attributes201506 
device_attributes201507 
device_attributes201508 
device_attributes201509 
device_attributes201510 
device_attributes201511 
device_attributes201512
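Because the data is split across twelve monthly tables, any "across all time" query has to read all of them. A minimal sketch of two ways to address them from Python: listing the table names explicitly, or building a BigQuery standard SQL wildcard query filtered on `_TABLE_SUFFIX`. The project and dataset names (`my-project`, `my_dataset`) and both helper functions are placeholders for illustration, not names from the question.

```python
from typing import List


def monthly_table_names(year: int, prefix: str = "device_attributes") -> List[str]:
    """Return the twelve monthly table names for a year, e.g. device_attributes201501."""
    return [f"{prefix}{year}{month:02d}" for month in range(1, 13)]


def wildcard_query(project: str, dataset: str, year: int,
                   prefix: str = "device_attributes") -> str:
    """Build a standard SQL query that reads every monthly table of one year.

    The project and dataset arguments are hypothetical placeholders; the
    wildcard form matches all tables whose name starts with the prefix.
    """
    return (
        "SELECT device_id, record_date, attribute_key, attribute_value\n"
        f"FROM `{project}.{dataset}.{prefix}*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{year}01' AND '{year}12'"
    )


tables_2015 = monthly_table_names(2015)
query_2015 = wildcard_query("my-project", "my_dataset", 2015)
```

The wildcard form keeps the query short as more months are added; the explicit list is handy when only a few specific months are needed.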

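My problem is two-fold.

I need to create a table that contains every unique device_id collected across all time, together with the latest attribute value for each attribute type.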

device_id, record_date, attribute_key, attribute_value 
    abc123  2015-10-11 attribute_1 5 
    abc123  2015-11-11 attribute_1 5 
    abc123  2015-12-11 attribute_1 10 
    abc123  2015-10-11 attribute_1 0 
    abc456  2015-10-11 attribute_1 0 
    abc789  2015-10-11 attribute_1 0 
    abc123  2015-11-11 attribute_1 0 
    abc456  2015-11-11 attribute_1 0 
    abc789  2015-11-11 attribute_1 6 
    abc123  2015-10-11 attribute_2 blue 
    abc123  2015-11-11 attribute_2 red 
    abc123  2015-12-11 attribute_2 red 
    abc456  2015-12-11 attribute_2 blue 
    abc789  2015-12-11 attribute_2 green
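The "latest value per device and attribute" requirement can be sketched as a plain GROUP BY plus self-join. The example below runs that SQL against the sample rows using Python's built-in `sqlite3` as a local stand-in for BigQuery; the single table name `device_attributes` and the in-memory database are assumptions for illustration, and the query would need adapting to the real monthly tables and to BigQuery's dialect.

```python
import sqlite3

# Sample rows copied from the question:
# (device_id, record_date, attribute_key, attribute_value)
ROWS = [
    ("abc123", "2015-10-11", "attribute_1", "5"),
    ("abc123", "2015-11-11", "attribute_1", "5"),
    ("abc123", "2015-12-11", "attribute_1", "10"),
    ("abc123", "2015-10-11", "attribute_1", "0"),
    ("abc456", "2015-10-11", "attribute_1", "0"),
    ("abc789", "2015-10-11", "attribute_1", "0"),
    ("abc123", "2015-11-11", "attribute_1", "0"),
    ("abc456", "2015-11-11", "attribute_1", "0"),
    ("abc789", "2015-11-11", "attribute_1", "6"),
    ("abc123", "2015-10-11", "attribute_2", "blue"),
    ("abc123", "2015-11-11", "attribute_2", "red"),
    ("abc123", "2015-12-11", "attribute_2", "red"),
    ("abc456", "2015-12-11", "attribute_2", "blue"),
    ("abc789", "2015-12-11", "attribute_2", "green"),
]

# In-memory SQLite database used only as a stand-in for BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE device_attributes ("
    "device_id TEXT, record_date TEXT, attribute_key TEXT, attribute_value TEXT)"
)
conn.executemany("INSERT INTO device_attributes VALUES (?, ?, ?, ?)", ROWS)

# Find the newest record_date per (device_id, attribute_key), then join back
# to pick up the value stored on that date. ISO dates sort correctly as text.
# Note: if several rows share the newest date, all of them are returned.
LATEST_SQL = """
SELECT d.device_id, d.attribute_key, d.record_date, d.attribute_value
FROM device_attributes AS d
JOIN (
    SELECT device_id, attribute_key, MAX(record_date) AS last_date
    FROM device_attributes
    GROUP BY device_id, attribute_key
) AS m
  ON d.device_id = m.device_id
 AND d.attribute_key = m.attribute_key
 AND d.record_date = m.last_date
ORDER BY d.device_id, d.attribute_key
"""

latest = conn.execute(LATEST_SQL).fetchall()
for row in latest:
    print(row)
```

In BigQuery itself, a window function such as `ROW_NUMBER() OVER (PARTITION BY device_id, attribute_key ORDER BY record_date DESC)` is a common alternative that also gives control over how ties on the same date are broken.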

For some attributes, weekly, monthly, and daily averages also need to be calculated. (attribute_3 is the average of the samples collected.)

device_id, last_update, attribute_1, attribute_2 
    abc123  2015-12-11 6   red 
    abc456  2015-12-11 0   blue 
    abc789  2015-12-11 3   green
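The daily, weekly, and monthly averages all follow the same pattern: map each record date to a period key, group by device and period, then average. A minimal pure-Python sketch of that grouping is shown below, using the numeric attribute_1 rows from the sample. The function names and the choice of ISO weeks for the weekly bucket are assumptions for illustration; at 20 billion rows the same grouping would be expressed as a GROUP BY in BigQuery rather than run in a single Python process.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# attribute_1 rows from the sample data: (device_id, record_date, value)
SAMPLES = [
    ("abc123", "2015-10-11", 5.0),
    ("abc123", "2015-11-11", 5.0),
    ("abc123", "2015-12-11", 10.0),
    ("abc123", "2015-10-11", 0.0),
    ("abc456", "2015-10-11", 0.0),
    ("abc789", "2015-10-11", 0.0),
    ("abc123", "2015-11-11", 0.0),
    ("abc456", "2015-11-11", 0.0),
    ("abc789", "2015-11-11", 6.0),
]


def period_key(record_date: str, granularity: str) -> str:
    """Map an ISO date string to a day, ISO-week, or month bucket label."""
    d = date.fromisoformat(record_date)
    if granularity == "day":
        return d.isoformat()
    if granularity == "week":
        iso_year, iso_week, _ = d.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if granularity == "month":
        return f"{d.year}-{d.month:02d}"
    raise ValueError(f"unknown granularity: {granularity}")


def average_by_period(samples, granularity):
    """Average the values per (device_id, period) pair."""
    buckets = defaultdict(list)
    for device_id, record_date, value in samples:
        buckets[(device_id, period_key(record_date, granularity))].append(value)
    return {key: mean(values) for key, values in buckets.items()}


daily = average_by_period(SAMPLES, "day")
weekly = average_by_period(SAMPLES, "week")
monthly = average_by_period(SAMPLES, "month")

for (device_id, period), avg in sorted(monthly.items()):
    print(device_id, period, avg)
```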

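I'm curious how best to tackle this, and I don't know where to go from here. The data is in BigQuery now, and I have access to the full suite of Google Cloud tools, such as Dataflow or anything else.

The data was originally in an S3 bucket, so I could also process it with any solution on AWS.

I just don't know what the smartest way to do it is.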

Source

2017-02-21 chews

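A BigQuery SQL query should work for what you want to do. Are you having a problem with that approach?

+1 for crushing it with SQL in BigQuery.

BigQuery, because you don't have to write a lot of code to do basic aggregations – softwarenewbie7331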

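Hopefully some of these links can help you. Creating a table: https://cloud.google.com/bigquery/docs/tables#creating-a-table

The BigQuery web UI: https://cloud.google.com/bigquery/bigquery-web-ui

How to create a table from a query (from a user's blog post). It shows that you can use the BQ web UI and specify a destination table. I couldn't find this in the official documentation, so I'm not sure whether it works. If it doesn't, you will need to set up the API and write some code, as shown in the example above. https://chartio.com/resources/tutorials/how-to-create-a-table-from-a-query-in-google-bigquery/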

Source

2017-02-22 01:47:48
