2017-08-25 298 views
1

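Select data from Hive where a column value is in a list

I want to fetch data from Hive, for example: select rows from Hive only if a column's value is in a list.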

Sample data in the Hive table:

Col1 | Col2 | Col3 
-------+--------------- 
Joe | 32 | Place-1 
Nancy | 28 | Place-2 
Shalyn | 35 | Place-1 
Andy | 20 | Place-3 

My query against the Hive table:

val name = List("Sherley","Joe","Shalyan","Dan") 
var dataFromHive = sqlCon.sql("select Col1,Col2,Col3 from default.NameInfo where Col1 in (${name})") 

I know my query is wrong because it throws an error, but I can't work out the correct replacement for where Col1 in (${name}).

+0

What? See https://stackoverflow.com/questions/40218473/spark-sql-in-clause/40218776#40218776

Answers

0

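A better idea is to convert name to a DataFrame and join it with the Hive table. An inner join is equivalent to keeping only the rows whose Col1 value appears on both sides.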

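import sqlCon.implicits._ // needed for toDF; assumes sqlCon is a val holding a SQLContext/HiveContext (or SparkSession)
val nameDf = List("Sherley","Joe","Shalyan","Dan").toDF("Col1")
// the inner join keeps only the rows whose Col1 appears in nameDf
val dataFromHive = sqlCon.table("default.NameInfo").join(nameDf, "Col1").select("Col1", "Col2", "Col3")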

Try using the DataFrame API. It makes the code easier to read.
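For a short list, the DataFrame API also allows a plain filter instead of a join, using `Column.isin`. The following is only a sketch under assumptions: it uses a local `SparkSession` (Spark 2.x or later), and the in-memory `people` DataFrame is a hypothetical stand-in for `default.NameInfo` so that it runs without Hive. The names `IsinExample` and `filterByNames` are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object IsinExample {
  // Keep only the rows whose Col1 is one of the given names.
  def filterByNames(df: DataFrame, names: Seq[String]): DataFrame =
    df.filter(col("Col1").isin(names: _*)).select("Col1", "Col2", "Col3")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("isin-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the Hive table default.NameInfo.
    val people = Seq(
      ("Joe", 32, "Place-1"),
      ("Nancy", 28, "Place-2"),
      ("Shalyn", 35, "Place-1"),
      ("Andy", 20, "Place-3")
    ).toDF("Col1", "Col2", "Col3")

    val name = List("Sherley", "Joe", "Shalyan", "Dan")
    filterByNames(people, name).show()

    spark.stop()
  }
}
```

With the sample rows from the question, only Joe matches, because the list contains "Shalyan" while the table contains "Shalyn". Against the real table, `sqlCon.table("default.NameInfo")` would take the place of `people`.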

0

As an alternative to the DataFrame API, convert your list to a string in a format suitable for the Hive query:

val name = List("Sherley","Joe","Shalyan","Dan") 
val name_string = name.mkString("('","','", "')") 
//name_string: String = ('Sherley','Joe','Shalyan','Dan') 

var dataFromHive = sqlCon.sql("select Col1,Col2,Col3 from default.NameInfo where Col1 in " + name_string)
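To see why the query in the question fails and what the `mkString` version changes, here is a small Spark-free sketch. The string in the question has no `s` prefix, so `${name}` is sent to Hive literally; and even with the prefix, a `List` is interpolated via its `toString`, which is not SQL. The helper `quotedList` is a hypothetical name, and its two guards are assumptions of mine: an empty `in ()` is typically rejected by the SQL parser, and values containing quotes or backslashes would need escaping that this sketch does not attempt.

```scala
object InClause {
  // Hypothetical helper: renders the values as a parenthesised list of quoted SQL string literals.
  def quotedList(values: Seq[String]): String = {
    require(values.nonEmpty, "empty list: handle this case before building the query")
    require(
      values.forall(v => !v.contains("'") && !v.contains("\\")),
      "values containing quotes or backslashes must be escaped first"
    )
    values.mkString("('", "','", "')")
  }

  def main(args: Array[String]): Unit = {
    val name = List("Sherley", "Joe", "Shalyan", "Dan")

    // Interpolating the List itself inserts its toString:
    println(s"where Col1 in (${name})")
    // prints: where Col1 in (List(Sherley, Joe, Shalyan, Dan))

    // Building the literal list explicitly gives valid SQL:
    println(s"where Col1 in ${quotedList(name)}")
    // prints: where Col1 in ('Sherley','Joe','Shalyan','Dan')
  }
}
```

If the names come from user input rather than a fixed list, the join or `isin` style of filtering avoids building SQL text by hand altogether.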