2014-10-09

In Pentaho, when I run a Cassandra Input step that retrieves about 50,000 rows, I get this exception: Frame size (17727647) larger than max length (16384000)!

Is there a way to control the size of the query result in Pentaho? Or is there a way to stream the query results instead of fetching them all in one batch?

2014/10/09 15:14:09 - Cassandra Input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Unexpected error 
2014/10/09 15:14:09 - Cassandra Input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : org.pentaho.di.core.exception.KettleException: 
2014/10/09 15:14:09 - Cassandra Input.0 - Frame size (17727647) larger than max length (16384000)! 
2014/10/09 15:14:09 - Cassandra Input.0 - Frame size (17727647) larger than max length (16384000)! 
2014/10/09 15:14:09 - Cassandra Input.0 - 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.initQuery(CassandraInput.java:355) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.processRow(CassandraInput.java:234) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62) 
2014/10/09 15:14:09 - Cassandra Input.0 - at java.lang.Thread.run(Unknown Source) 
2014/10/09 15:14:09 - Cassandra Input.0 - Caused by: org.apache.thrift.transport.TTransportException: Frame size (17727647) larger than max length (16384000)! 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1656) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1642) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.pentaho.cassandra.legacy.LegacyCQLRowHandler.newRowQuery(LegacyCQLRowHandler.java:289) 
2014/10/09 15:14:09 - Cassandra Input.0 - at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.initQuery(CassandraInput.java:333) 
2014/10/09 15:14:09 - Cassandra Input.0 - ... 3 more 
2014/10/09 15:14:09 - Cassandra Input.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=1) 
2014/10/09 15:14:09 - all customer data - Transformation detected one or more steps with errors. 
2014/10/09 15:14:09 - all customer data - Transformation is killing the other steps! 

I also use Cassandra but have never run into this error. Try increasing read_request_timeout_in_ms in cassandra.yaml and the JVM heap (e.g. -Xmx1024m) in pentaho.bat or pentaho.sh, depending on your OS, and check whether you still hit the error. – 2014-10-10 06:38:50


How big is your query? Are you issuing a query that returns around 60,000 rows, or more? – 2014-10-11 10:27:36


I am getting back 200,000 rows, with 7 columns. – 2014-10-13 06:09:38

Answers

org.apache.thrift.transport.TTransportException: 
    Frame size (17727647) larger than max length (16384000)! 

A limit is enforced on how big a frame (a Thrift message) can be, to avoid performance degradation. You can tune it by modifying a couple of settings. The catch is that you need to change both the client-side and the server-side settings.

Server side, in cassandra.yaml:

# Frame size for thrift (maximum field length). 
# default is 15mb, you'll have to increase this to at least 18. 
thrift_framed_transport_size_in_mb: 18 

# The max length of a thrift message, including all fields and 
# internal thrift overhead. 
# default is 16, try to keep it to thrift_framed_transport_size_in_mb + 1 
thrift_max_message_length_in_mb: 19 
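As a rough sanity check (my own arithmetic, not part of the original answer), the rejected frame of 17,727,647 bytes comes out just under 17 MiB, which is why the frame-size setting has to be raised to at least 17; 18 simply leaves some headroom:

```java
// Back-of-the-envelope check: convert the frame size from the error
// message into MiB to see what thrift_framed_transport_size_in_mb
// must at least be set to.
public class FrameSizeCheck {
    public static void main(String[] args) {
        long frameBytes = 17_727_647L;   // from the TTransportException
        long limitBytes = 16_384_000L;   // the server's current max length
        int neededMb = (int) Math.ceil(frameBytes / (1024.0 * 1024.0));
        System.out.println("frame exceeds limit: " + (frameBytes > limitBytes));
        System.out.println("need at least " + neededMb + " MB");  // prints: need at least 17 MB
    }
}
```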

How you raise the client-side limit depends on the driver you are using.


I have already done this on the Cassandra server. I'm using Pentaho BI, and I can't seem to find a way to change the size on the Pentaho side. – 2014-10-11 10:29:11


I'm facing the same problem @user3712422, did you solve it? – 2014-11-12 11:21:44


I solved this with PDI 5.2: its Cassandra Input step has a max_length property, and setting that property to a higher value such as 1GB fixed the problem.


You can try the following approach on the server side:

TNonblockingServerSocket tnbSocketTransport = new TNonblockingServerSocket(listenPort); 
TNonblockingServer.Args tnbArgs = new TNonblockingServer.Args(tnbSocketTransport); 

// The max length is configured to 1 GB, while the default size is 16 MB

tnbArgs.transportFactory(new TFramedTransport.Factory(1024 * 1024 * 1024)); 
tnbArgs.protocolFactory(new TCompactProtocol.Factory()); 
TProcessor processor = new UcsInterfaceThrift.Processor<UcsInterfaceHandler>(ucsInterfaceHandler); 
tnbArgs.processor(processor); 
TServer server = new TNonblockingServer(tnbArgs); 
server.serve(); 

Well, here is what actually worked for me.

Cassandra version: [cqlsh 5.0.1 | Cassandra 2.2.1 | CQL spec 3.3.0 | Native protocol v4]

Pentaho PDI version: pdi-ce-5.4.0.1-130

Settings changed in cassandra.yaml:

# Whether to start the thrift rpc server. 
start_rpc: true 

# Frame size for thrift (maximum message length). 
thrift_framed_transport_size_in_mb: 35 

Cassandra Input step settings changed to:

Port: 9160 
"Use CQL Version 3": checked