How do I write a nested query?

I have the following table:

+----+---+----+
|type|  t|code|
+----+---+----+
|   A| 25|  11|
|   A| 55|  42|
|   B| 88|  11|
|   A|114|  11|
|   B|220|  58|
|   B|520|  11|
+----+---+----+

And this is what I want:

+---+---+----+
| t1| t2|code|
+---+---+----+
| 25| 88|  11|
|114|520|  11|
+---+---+----+

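There are two types of events, A and B. Event A is a start and event B is an end. I want to join each start with the next end that has the same code.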
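To pin down the intended pairing before involving Spark, here is a minimal plain-Scala sketch over an in-memory list. The `Event` case class, its `kind` field (named so because `type` is a Scala keyword) and the `PairingSketch` object are hypothetical names for illustration only. It assumes "next end" means the B event with the same code and the smallest `t` strictly greater than the start's `t`, and that starts without such an end are dropped, which matches the desired output above.

```scala
// Hypothetical event type; fields mirror the table columns (kind = type).
final case class Event(kind: String, t: Int, code: Int)

object PairingSketch {
  // For each start (kind "A"), find the earliest end (kind "B") with the same
  // code and a strictly larger t. Starts with no such end are dropped.
  def pair(events: Seq[Event]): Seq[(Int, Int, Int)] = {
    val ends = events.filter(_.kind == "B")
    events
      .filter(_.kind == "A")
      .flatMap { a =>
        val laterEnds = ends.filter(b => b.code == a.code && b.t > a.t)
        if (laterEnds.isEmpty) None
        else Some((a.t, laterEnds.map(_.t).min, a.code))
      }
      .sortBy(_._1)
  }

  def main(args: Array[String]): Unit = {
    // Sample rows copied from the table above.
    val events = Seq(
      Event("A", 25, 11), Event("A", 55, 42), Event("B", 88, 11),
      Event("A", 114, 11), Event("B", 220, 58), Event("B", 520, 11)
    )
    pair(events).foreach(println)
  }
}
```

Running `main` prints `(25,88,11)` and `(114,520,11)`; the start at `t = 55` has no later end with code 42, so it is dropped.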

This is easy to do in SQL:

SELECT a.t AS t1, 
    (SELECT b.t FROM events AS b
     WHERE b.type = 'B' AND a.code = b.code AND a.t < b.t
     ORDER BY b.t LIMIT 1) AS t2,
    a.code AS code 
FROM events AS a
WHERE a.type = 'A'

But I'm having trouble implementing this in Spark, because it looks like this kind of nested query is not supported...

I tried this:

df.createOrReplaceTempView("events") 
val sqlDF = spark.sql(/* SQL-query above */)

The error I get:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Accessing outer query column is not allowed in:

Do you have any other ideas for solving this?


2017-10-18 Boendal

_"because it looks like this kind of nested query is not supported"_ Did you try running it? What exception did you get? If so, include it in your question.

"This is easy to do in SQL"

Luckily, so is Spark SQL.
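Since the question already registers an `events` temp view, one way to stay in SQL is to replace the correlated subquery with a self-join and take `MIN(b.t)` per start. This is a sketch under that assumption, not necessarily what the answer below does; the `EventsSqlSketch` object and `solve` method are hypothetical names, and the sample rows are copied from the question's table.

```scala
import org.apache.spark.sql.SparkSession

object EventsSqlSketch {
  // Builds the sample data, registers it as the "events" view and runs a
  // join-based query instead of a correlated subquery.
  def solve(spark: SparkSession): Array[(Int, Int, Int)] = {
    import spark.implicits._

    Seq(
      ("A", 25, 11), ("A", 55, 42), ("B", 88, 11),
      ("A", 114, 11), ("B", 220, 58), ("B", 520, 11)
    ).toDF("type", "t", "code").createOrReplaceTempView("events")

    // Pair every start (A) with every later end (B) of the same code,
    // then keep only the earliest end per start via MIN.
    val sqlDF = spark.sql(
      """SELECT a.t AS t1, MIN(b.t) AS t2, a.code AS code
        |FROM events AS a
        |JOIN events AS b
        |  ON a.code = b.code AND a.t < b.t
        |WHERE a.`type` = 'A' AND b.`type` = 'B'
        |GROUP BY a.t, a.code
        |ORDER BY t1""".stripMargin)

    // Tuple encoders map columns by position: (t1, t2, code).
    sqlDF.as[(Int, Int, Int)].collect()
  }

  def main(args: Array[String]): Unit = {
    // Local session for trying this out; in spark-shell `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("events-sql").getOrCreate()
    solve(spark).foreach(println)
    spark.stop()
  }
}
```

Here `MIN` plays the role of `ORDER BY b.t LIMIT 1`, and the inner join drops starts that have no later end (such as `t = 55`).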

val events = ... 
scala> events.show 
+----+---+----+
|type|  t|code|
+----+---+----+
|   A| 25|  11|
|   A| 55|  42|
|   B| 88|  11|
|   A|114|  11|
|   B|220|  58|
|   B|520|  11|
+----+---+----+

// assuming t is an integer
scala> events.printSchema 
root 
|-- type: string (nullable = true) 
|-- t: integer (nullable = true) 
|-- code: integer (nullable = true) 

val eventsA = events. 
    where($"type" === "A"). 
    as("a") 
val eventsB = events. 
    where($"type" === "B"). 
    as("b") 
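import org.apache.spark.sql.functions.min

// For each start, keep the earliest later end with the same code.
// groupBy + min is used instead of orderBy + dropDuplicates, because
// dropDuplicates does not guarantee which of the duplicate rows it keeps.
val solution = eventsA. 
    join(eventsB, "code"). 
    where($"a.t" < $"b.t"). 
    groupBy($"a.t" as "t1", $"code"). 
    agg(min($"b.t") as "t2"). 
    select($"t1", $"t2", $"code"). 
    orderBy($"t1".asc)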

This should give you the requested output.

scala> solution.show 
+---+---+----+
| t1| t2|code|
+---+---+----+
| 25| 88|  11|
|114|520|  11|
+---+---+----+
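A different technique, not the one used in this answer, is a window function: partition by `code`, order by `t`, and take the minimum `t` among later `B` rows. The sketch below assumes `t` is an integer (as in the schema above), so that a range frame starting at `1` means "strictly later"; `EventsWindowSketch` and `solve` are hypothetical names. It avoids the self-join, though whether that is faster depends on the data, so it is worth measuring.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, min, when}

object EventsWindowSketch {
  // For each A row, finds the smallest t among B rows of the same code whose
  // t is strictly greater; A rows without such a B row are dropped.
  def solve(events: DataFrame): DataFrame = {
    // RANGE frame over the integer column t: from t + 1 to the end of the partition.
    val later = Window
      .partitionBy("code")
      .orderBy("t")
      .rangeBetween(1, Window.unboundedFollowing)

    events
      // when(...) is null for non-B rows, and min ignores nulls.
      .withColumn("t2", min(when(col("type") === "B", col("t"))).over(later))
      .where(col("type") === "A" && col("t2").isNotNull)
      .select(col("t").as("t1"), col("t2"), col("code"))
      .orderBy("t1")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("events-window").getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question's table.
    val events = Seq(
      ("A", 25, 11), ("A", 55, 42), ("B", 88, 11),
      ("A", 114, 11), ("B", 220, 58), ("B", 520, 11)
    ).toDF("type", "t", "code")

    solve(events).show()
    spark.stop()
  }
}
```

On the sample data this should produce the same two rows as `solution.show` above.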


2017-10-18 09:26:00

Fixed. Thanks for reviewing and accepting :)
