2017-04-08

I'm trying to use rlike when querying a DataFrame, without much success. I want to use Spark SQL to find all strings that end with digits.

Sample data:

column_a|column_b 
1|abc xyz 
2|123 abc xyz 
3|abc 123 xyz 
4|abc 123 
5|xyz 123 

Expected output:

column_a|column_b 
4|abc 123 
5|xyz 123 

I tried:

select * from table_1 where column_b rlike '\d+$'
(and also: select * from table_1 where column_b rlike '/\d+$')

Output (no results):

column_a|column_b 

I also tried:

select * from table_1 where column_b rlike '\d*$'
(and also: select * from table_1 where column_b rlike '/\d*$')

Output (all rows):

column_a|column_b 
1|abc xyz 
2|123 abc xyz 
3|abc 123 xyz 
4|abc 123 
5|xyz 123 
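Independent of escaping, \d*$ cannot work as a filter here: * allows zero digits, so the pattern finds an empty match at the end of every string. A quick check with Python's re module illustrates this on the sample rows. Python's re is used only as a stand-in, assuming it agrees with Java's regex engine (which Spark's rlike uses) on a pattern this simple.

```python
import re

# Sample rows from the question: (column_a, column_b)
rows = [
    (1, "abc xyz"),
    (2, "123 abc xyz"),
    (3, "abc 123 xyz"),
    (4, "abc 123"),
    (5, "xyz 123"),
]

# "\d*$" means "zero or more digits, then end of string".
# Zero digits is allowed, so every string matches with an empty match at its end.
matched = [a for a, b in rows if re.search(r"\d*$", b)]
print(matched)  # [1, 2, 3, 4, 5]

# For a string with no trailing digits, the match is empty and sits at the very end.
m = re.search(r"\d*$", "abc xyz")
print(repr(m.group()), m.start())  # '' 7
```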

Is my regex incorrect? I have tested it with Python and with an online tester, and it looks correct. Or does rlike only support some specific regex flavor?
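For reference, a plain-Python check confirms that \d+$ itself selects exactly the expected rows once it reaches a regex engine intact. This again uses Python's re as a stand-in, assumed to behave like Java's regex for this pattern.

```python
import re

# Sample rows from the question: (column_a, column_b)
rows = [
    (1, "abc xyz"),
    (2, "123 abc xyz"),
    (3, "abc 123 xyz"),
    (4, "abc 123"),
    (5, "xyz 123"),
]

# "\d+$": one or more digits immediately before the end of the string.
pattern = re.compile(r"\d+$")
expected = [(a, b) for a, b in rows if pattern.search(b)]
for a, b in expected:
    print(f"{a}|{b}")
# 4|abc 123
# 5|xyz 123
```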

Answer


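You need more escaping to make it work. Spark SQL's parser treats the backslash as an escape character inside string literals, so '\d+$' reaches the regex engine as d+$ (no row ends in "d", hence no results) and '\d*$' arrives as d*$ (which matches every row). When the query is written inside a Python string passed to spark.sql, Python removes one more level of escaping, so the source needs four backslashes. Specifically: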

spark.sql("SELECT 'abc 123' RLIKE '\\\\d+$'").show()
+------------------+
|abc 123 RLIKE \d+$|
+------------------+
|              true|
+------------------+
spark.sql("SELECT '123 abc xyz' RLIKE '\\\\d+$'").show()
+----------------------+
|123 abc xyz RLIKE \d+$|
+----------------------+
|                 false|
+----------------------+
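To see why four backslashes are needed, here is a model of the two unescaping steps. It assumes the query is issued from Python via spark.sql, as in the answer. Step one is Python's own string literal. Step two is the SQL parser's handling of the '...' literal. That second step is modelled by a hypothetical, deliberately simplified helper, unescape_sql_literal, which only keeps the character after each backslash. It does not reproduce Spark's full escape table, though it is consistent with the results reported in the question. Python's re again stands in for Java's regex engine.

```python
import re


def unescape_sql_literal(text: str) -> str:
    # Hypothetical, simplified model of SQL string-literal unescaping:
    # a backslash escapes the next character and only that character is kept,
    # so \\ becomes \ and \d becomes d. Escapes such as \n or \t, which a real
    # SQL parser translates into control characters, are not modelled here.
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


samples = ["abc 123", "123 abc xyz"]

# The text Python hands to spark.sql() for each spelling of the pattern.
# In a normal Python literal "\\" is one backslash; r"..." keeps backslashes as typed.
spellings = [
    "\\d+$",    # question: '\d+$'     -> text \d+$
    "\\d*$",    # question: '\d*$'     -> text \d*$
    "\\\\d+$",  # answer:   '\\\\d+$'  -> text \\d+$
    r"\\d+$",   # raw-string spelling, same text as the answer's
]

for text in spellings:
    regex = unescape_sql_literal(text)  # what the regex engine finally receives
    hits = [s for s in samples if re.search(regex, s)]
    print(f"{text} -> {regex} -> {hits}")
# \d+$ -> d+$ -> []
# \d*$ -> d*$ -> ['abc 123', '123 abc xyz']
# \\d+$ -> \d+$ -> ['abc 123']
# \\d+$ -> \d+$ -> ['abc 123']
```

The raw-string form is often easier to read in Python source, since it only has to account for the SQL level of escaping.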