How to filter records in Hive using Spark

My input is:

+-------+ 
|      y|
+-------+ 
| ""no""| 
| ""no""| 
| ""no""| 
|""yes""| 
| ""no""| 
| ""no""| 
| ""no""| 
| ""no""| 
|""yes""| 
| ""no""| 
| ""no""| 
| ""no""| 
| ""no""| 
|""yes""| 
| ""no""| 
| ""no""| 
+-------+

And I am querying:

sqlContext.sql("select count(y) from dummy where y='yes'").show()

and the output is:

+---+ 
|_c0| 
+---+ 
|  0|
+---+

y is declared as a string type in the DDL.


2017-06-29 Ninja

If only you had used '.replaceAll("\"", "")' earlier :D – philantrovert

You should try this:

sqlContext.sql("select count(y) from dummy where y='\"\"yes\"\"'").show()

Note that your data contains ""yes"" (quote characters included), not just yes.
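To see why the original query returns 0, here is a minimal sketch that reproduces the mismatch outside Spark. It assumes SQLite (Python's built-in sqlite3 module) as a stand-in for the Hive table, purely so it runs without a cluster; the helper name count_where is hypothetical. String equality should behave the same way in Spark SQL, but this is an illustration, not the Spark code itself.

```python
import sqlite3

# SQLite stands in for Hive here (an assumption, for a dependency-free demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dummy (y TEXT)")

# Same 16 rows as the question: every value carries literal double quotes.
values = ['""no""'] * 13 + ['""yes""'] * 3
conn.executemany("INSERT INTO dummy VALUES (?)", [(v,) for v in values])


def count_where(literal):
    """Count rows whose y is exactly equal to the given literal."""
    row = conn.execute(
        "SELECT count(y) FROM dummy WHERE y = ?", (literal,)
    ).fetchone()
    return row[0]


bare_count = count_where("yes")        # literal lacks the quote characters
quoted_count = count_where('""yes""')  # literal matches the stored value
print(bare_count, quoted_count)        # prints: 0 3
```

The comparison is exact, character for character, so the quote characters stored in the column have to appear in the literal as well.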

You still need to clean your data :) or do it this way:

sqlContext.sql("select count(y) from dummy where y like '%yes%'").show()
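One caveat with the LIKE approach is that '%yes%' matches any value containing "yes", so cleaning the quotes and comparing exactly can be safer. The sketch below illustrates the difference, again assuming SQLite (Python's built-in sqlite3 module) as a stand-in for Hive; the 'yesterday' row is a hypothetical addition that is not in the question's data. In Spark SQL the likely counterpart of SQLite's replace would be the built-in regexp_replace, e.g. regexp_replace(y, '"', '') = 'yes', but treat that as an untested suggestion.

```python
import sqlite3

# SQLite stands in for Hive here (an assumption, for a dependency-free demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dummy (y TEXT)")

# 'yesterday' is a hypothetical extra row, added only to show LIKE over-matching.
rows = ['""yes""', '""no""', '""yes""', 'yesterday']
conn.executemany("INSERT INTO dummy VALUES (?)", [(v,) for v in rows])

# Pattern match: counts every value that contains "yes" anywhere.
like_count = conn.execute(
    "SELECT count(y) FROM dummy WHERE y LIKE '%yes%'"
).fetchone()[0]

# Strip the double quotes first, then compare exactly.
clean_count = conn.execute(
    "SELECT count(y) FROM dummy WHERE replace(y, '\"', '') = 'yes'"
).fetchone()[0]

print(like_count, clean_count)  # prints: 3 2
```

With the question's actual data both approaches give the same count; they only diverge once other values containing "yes" show up.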


2017-06-29 11:05:16

Thanks again! It worked.. – Ninja

You can 'accept' the answer or upvote it if it answers the question and works in your case. –
