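Hive with a Python transform function: "cannot recognize input near 'transform'" error

I have a Hive table that tracks the status of objects as they move through the stages of a process. The table looks like this: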
hive> desc journeys;
object_id string
journey_statuses array<string>
Here is a typical example of a record, generated with Hive 0.13's collect_list:
12345678 ["A","A","A","B","B","B","C","C","C","C","D"]
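The statuses in each record are ordered (if order did not matter, I would have used collect_set). For each object_id, I want to abbreviate the journey so that it returns the journey statuses in the order in which they appear.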
I wrote a quick Python script that reads from standard input:
#!/usr/bin/env python
import sys
import itertools

for line in sys.stdin:
    inputList = eval(line.strip())
    readahead = iter(inputList)
    next(readahead)
    result = []
    for id, (a, b) in enumerate(itertools.izip(inputList, readahead)):
        if id == 0:
            result.append(a)
        if a != b:
            result.append(b)
    print result
I plan to use this in a Hive transform call. It seems to work when run locally:
$ echo '["A","A","A","B","B","B","C","C","C","C","D"]' | python abbreviate_list.py
['A', 'B', 'C', 'D']
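For comparison, the same abbreviation can be sketched with itertools.groupby, which also copes with empty and single-element lists. This is only a sketch, assuming each input line is a JSON array like the one in the local test above; the names abbreviate and abbreviate_lines are made up for illustration, and it says nothing about how Hive actually serializes an array<string> column when it hands rows to the script.

```python
import itertools
import json


def abbreviate(statuses):
    """Collapse runs of consecutive equal statuses, keeping their order."""
    return [key for key, _run in itertools.groupby(statuses)]


def abbreviate_lines(lines):
    """Yield one abbreviated JSON array per non-blank input line.

    Assumption: every line holds a JSON array of strings.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        yield json.dumps(abbreviate(json.loads(line)))


# Hypothetical use as a stdin filter:
#     import sys
#     for out in abbreviate_lines(sys.stdin):
#         print(out)
```

Parsing with json.loads rather than eval avoids executing whatever text arrives on standard input.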
However, when I add the file and try to execute it from within Hive, it returns an error:
hive> add file abbreviateList.py;
Added resource: abbreviateList.py
hive> select
> object_id,
> transform(journey_statuses) using 'python abbreviateList.py' as journey_statuses_abbreviated
> from journeys;
NoViableAltException(... wall of Java error messages ...)
FAILED: ParseException line 3:2 cannot recognize input near 'transform' '(' 'journey_statuses' in select expression
Can you see what I am doing wrong?