2017-10-13 133 views

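How to get records in string format from socketTextStream

I want to get each record from a socket stream, with each record being a string taken from one line. How can I write this in Python? Thanks!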

model = pipeline.PipelineModel.read().load(model_path)

sc = spark.sparkContext
ssc = StreamingContext(sc, 1)

lines = ssc.socketTextStream(sys.argv[1], int(sys.argv[2]))

if (lines is not None):
    lines.foreachRDD(lambda rdd: rdd.foreach(processRecord))

def processRecord(record):
    print("test")
    ...

Answer

from __future__ import print_function 
import sys 
from pyspark import SparkContext 
from pyspark.streaming import StreamingContext 


if __name__ == "__main__": 
    sc = SparkContext(appName="Demo") 
    ssc = StreamingContext(sc, 1) 

    #record = ssc.socketTextStream("localhost", 9999) 
    record = ssc.socketTextStream(sys.argv[1], int(sys.argv[2])) 
    # print out each single word 
    record.flatMap(lambda line: line.split(" ")).pprint() 

    # start streaming 
    ssc.start() 
    # block until the streaming context is stopped or an error occurs
    ssc.awaitTermination() 

Thanks.
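For what it's worth, here is a minimal sketch of handling each record as a string. As far as I know, `socketTextStream` decodes every line as UTF-8 text, so each element passed to `map` (or to `rdd.foreach`) should already be a Python `str`. The helper name `process_record` and the whitespace splitting are only assumptions for illustration, not something from the question. The pyspark imports sit inside `main` so the helper can be tried without a Spark installation.

```python
import sys


def process_record(record):
    """Handle one line from the socket (hypothetical helper).

    Assumes the record arrives as a str, one per line of input.
    """
    # Illustrative processing only: trim the line and split it into words.
    return record.strip().split(" ")


def main(host, port):
    # Imported here so process_record can be used without pyspark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="Demo")
    ssc = StreamingContext(sc, 1)  # 1-second batches, as in the answer above

    lines = ssc.socketTextStream(host, port)
    # map() applies process_record to every line of every batch.
    lines.map(process_record).pprint()

    ssc.start()
    ssc.awaitTermination()


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], int(sys.argv[2]))
```

One thing to keep in mind: a function passed to `rdd.foreach` runs on the executors, so anything it prints may end up in the executor logs rather than on the driver console. Using `map(...).pprint()` as above prints a sample of each batch on the driver instead.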


The record is not of string type. – icecream


I added more code there. Please check what is wrong with my code. Thanks! – icecream
