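Treating the free-form input file as a TSV file, and not knowing all the column semantics, here is one way to write the query. Note that I made some assumptions, which are stated in the comments.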
@d =
EXTRACT path string,
user string,
num1 int,
num2 int,
start_date string,
end_date string,
flag string,
year int,
s string,
another_date string
FROM @"\users\temp\citypaths.txt"
USING Extractors.Tsv(encoding: Encoding.Unicode);
// I assume that you have only one DateTime format culture in your file.
// If it becomes dependent on the region or city as expressed in the path, you need to add a lookup.
@d =
SELECT new SqlArray<string>(path.Split('\\')) AS steps,
DateTime.Parse(end_date, new CultureInfo("fr-FR", false)).Date.ToString("yyyy-MM-dd") AS end_date
FROM @d;
// This assumes your paths have a fixed formatting/mapping into the city
@d =
SELECT steps[4].ToLowerInvariant() AS city,
end_date
FROM @d;
@res =
SELECT city,
end_date,
COUNT(*) AS count
FROM @d
GROUP BY city,
end_date;
OUTPUT @res
TO "/output/result.csv"
USING Outputters.Csv();
// Now let's pivot the date and count.
@res2 =
SELECT city, MAP_AGG(end_date, count) AS date_count
FROM @res
GROUP BY city;
// This assumes you know exactly which dates you are looking for. Otherwise keep it in the first file representation.
@res2 =
SELECT city,
date_count["2016-11-21"] AS [2016-11-21],
date_count["2016-11-22"] AS [2016-11-22]
FROM @res2;
// Write the pivoted rowset only after @res2 has been defined.
OUTPUT @res2
TO "/output/res2.csv"
USING Outputters.Csv();
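To make the extraction and counting steps easier to check locally, here is a minimal Python sketch of the same logic (it is not U-SQL and not part of the job). The sample rows, the UNC-style path layout, and the `dd/MM/yyyy HH:mm:ss` date format are assumptions made only for illustration; adjust the index and the format string to your actual data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows (path, end_date); the real file has more columns.
rows = [
    (r"\\server\share\Istanbul\a.rvt", "21/11/2016 10:15:00"),
    (r"\\server\share\istanbul\b.rvt", "21/11/2016 17:40:00"),
    (r"\\server\share\Belfast\c.rvt", "22/11/2016 09:00:00"),
]

def city_and_day(path, end_date):
    """Mirror the two SELECTs: split the path, take step 4, normalize the date."""
    steps = path.split("\\")
    city = steps[4].lower()  # assumes the city is always the fifth path step
    # Assumed French-style day/month/year layout with a time part.
    day = datetime.strptime(end_date, "%d/%m/%Y %H:%M:%S").strftime("%Y-%m-%d")
    return city, day

# Equivalent of COUNT(*) ... GROUP BY city, end_date.
counts = Counter(city_and_day(p, d) for p, d in rows)

for (city, day), n in sorted(counts.items()):
    print(city, day, n)
```

Running it on a handful of real lines is a quick way to confirm that index 4 really is the city before submitting the job.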
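UPDATE after getting some example data in private email: Based on the data you sent me (after the extraction and counting of the cities, which you can do either with a join as outlined in Bob's answer, where you need to know your cities in advance, or by taking the string from the city's position in the path as in my example, where you do not need to know the cities in advance), you want to pivot the rowset city, count, date into the rowset date, city1, city2, ... where each row contains the date and the counts for each city.
You can easily adjust my example above by changing the calculation of @res2 in the following way: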
// Now let's pivot the city and count.
@res2 = SELECT end_date, MAP_AGG(city, count) AS city_count
FROM @res
GROUP BY end_date;
// This assumes you know exactly which cities you are looking for. Otherwise keep it in the first file representation or use a script generation (see below).
@res2 =
SELECT end_date,
city_count["istanbul"] AS istanbul,
city_count["midlands"] AS midlands,
city_count["belfast"] AS belfast,
city_count["acoustics"] AS acoustics,
city_count["amsterdam"] AS amsterdam
FROM @res2;
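The shape of this pivot can be sketched in plain Python, purely as an illustration of what the two statements compute. The sample rowset and the function name `pivot_by_date` are hypothetical; in this sketch a city that has no row for a given date comes out as `None`.

```python
from collections import defaultdict

# Hypothetical (city, end_date, count) rowset, standing in for @res.
res = [
    ("istanbul", "2016-11-21", 2),
    ("belfast", "2016-11-21", 1),
    ("belfast", "2016-11-22", 3),
]

def pivot_by_date(rows, cities):
    """Group by date into a city -> count map, then look up each known city."""
    city_count = defaultdict(dict)  # plays the role of the MAP_AGG result
    for city, end_date, count in rows:
        city_count[end_date][city] = count
    # One output row per date, one column per city, in the order given.
    return [
        (end_date, *(per_city.get(c) for c in cities))
        for end_date, per_city in sorted(city_count.items())
    ]

pivoted = pivot_by_date(res, ["istanbul", "belfast"])
for row in pivoted:
    print(row)
```

As in the script, the list of cities has to be supplied up front, which is what motivates generating the statement when the cities are not known.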
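Note that, as in my example, you need to enumerate all the cities in the pivot statement by looking each one up in the SQL.MAP column. If the cities are not known a priori, you will first have to submit a script that creates the script for you. For example, assuming your city, count, date rowset is in a file (or you could duplicate the statements that produce the rowset in both the generating script and the generated script), you could write it as the following script. Then take the result and submit it as the actual processing script.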
// Get the rowset (could also be the actual calculation from the original file)
@in = EXTRACT city string, count int?, date string
FROM "/users/temp/Revit_Last2Months_Results.tsv"
USING Extractors.Tsv();
// Generate the statements for the preparation of the data before the pivot
@stmts = SELECT * FROM (VALUES
("@s1", "EXTRACT city string, count int?, date string FROM \"/users/temp/Revit_Last2Months_Results.tsv\" USING Extractors.Tsv();"),
("@s2", "SELECT date, MAP_AGG(city, count) AS city_count FROM @s1 GROUP BY date;")
) AS T(stmt_name, stmt);
// Now generate the statement doing the pivot
@cities = SELECT DISTINCT city FROM @in;
@pivots =
SELECT "@s3" AS stmt_name, "SELECT date, "+String.Join(", ", ARRAY_AGG("city_count[\""+city+"\"] AS ["+city+"]"))+ " FROM @s2;" AS stmt
FROM @cities;
// Now generate the OUTPUT statement after the pivot. Note that the OUTPUT does not have a statement name.
@output =
SELECT "OUTPUT @s3 TO \"/output/pivot_gen.tsv\" USING Outputters.Tsv();" AS stmt
FROM (VALUES(1)) AS T(x);
// Now put the statements into one rowset. Note that nulls sort high in U-SQL, so the OUTPUT statement ends up last.
@result =
SELECT stmt_name, "=" AS assign, stmt FROM @stmts
UNION ALL SELECT stmt_name, "=" AS assign, stmt FROM @pivots
UNION ALL SELECT (string) null AS stmt_name, (string) null AS assign, stmt FROM @output;
// Now output the statements in order of the stmt_name
OUTPUT @result
TO "/pivot.usql"
ORDER BY stmt_name
USING Outputters.Text(delimiter:' ', quoting:false);
Now download the generated script and submit it.
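If you want to preview what the generated pivot statement will look like before running the generator job, the string concatenation can be mimicked locally. This is only a Python sketch of the same string building; the function name `pivot_statement` and the sample city list are hypothetical.

```python
def pivot_statement(cities):
    """Build the text of the generated @s3 statement for the given cities."""
    # One map lookup per city, aliased with a bracket-quoted column name.
    columns = ", ".join(f'city_count["{c}"] AS [{c}]' for c in cities)
    return f"@s3 = SELECT date, {columns} FROM @s2;"

stmt = pivot_statement(["belfast", "istanbul"])
print(stmt)
```

The bracket quoting around each alias matters when a city name contains spaces or other characters that are not valid in a plain identifier.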
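Thank you so much, wBob. You really made my job easy; I was just googling to find some way to do this. Bob, one more thing: if you looked at my output link, you will have seen two fields, "Location" and "Date", meaning the number of files by date and location. How can that also be added to the solution you provided above? Please advise. Once again, thank you so much for replying to my post so quickly :-)
Great, you should consider marking it as the answer! – wBob
Where is the date? It is not clear from your sample data. Is it in the file name, or do you need to collect it from the file itself? – wBob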