
I am using the following Python producer to publish messages to my Kafka topic (I can also receive the published data perfectly well with a Python consumer in Jupyter). However, my Spark Streaming application receives nothing from Kafka.

from kafka import KafkaProducer
import json, time

# Sample event to publish
userdata = {
    "ipaddress": "172.16.0.57",
    "logtype": "",
    "mid": "",
    "name": "TJ"
}

# Serialize each message value as UTF-8 encoded JSON
producer = KafkaProducer(bootstrap_servers=['localhost:9092'],
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

for i in range(10):
    print("adding", i)
    producer.send('test', userdata)
    time.sleep(3)

producer.flush()  # make sure any buffered messages are actually sent
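
For reference, the Jupyter consumer that does receive these messages is essentially the following (a minimal kafka-python sketch, assuming the same broker and 'test' topic as above):

from kafka import KafkaConsumer
import json

# Read from the beginning of the topic and decode the JSON values
consumer = KafkaConsumer('test',
                         bootstrap_servers=['localhost:9092'],
                         auto_offset_reset='earliest',
                         value_deserializer=lambda v: json.loads(v.decode('utf-8')))

for message in consumer:
    print(message.value)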

But when I run the Spark Kafka streaming example, I get nothing back (I should point out that Spark itself works on my workstation, since I can run the network streaming example without any problem):

from __future__ import print_function
from pyspark.streaming.kafka import KafkaUtils
import sys
import os
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
import json


# Pull in the Kafka integration package when the shell starts
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8_2.10:2.0.2 pyspark-shell'

sc = SparkContext("local[2]", "KafkaSTREAMWordCount")
ssc = StreamingContext(sc, 2)  # 2-second batches

# Receiver-based stream: Zookeeper at localhost:2181, consumer group
# "raw-event-streaming-consumer", topic "test" with one receiver thread
kafka_stream = KafkaUtils.createStream(ssc, "localhost:2181",
                                       "raw-event-streaming-consumer", {"test": 1})

# Each record is a (key, value) pair; parse the JSON value
parsed = kafka_stream.map(lambda kv: json.loads(kv[1]))
parsed.pprint()
ssc.start()
ssc.awaitTermination()

Here is a sample of the output:

------------------------------------------- 
Time: 2017-08-28 14:08:32 
------------------------------------------- 

------------------------------------------- 
Time: 2017-08-28 14:08:33 
------------------------------------------- 

------------------------------------------- 
Time: 2017-08-28 14:08:34 
------------------------------------------- 
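
For reference, the network streaming example mentioned above, which does work on this machine, is essentially Spark's standard socket word count (a sketch, assuming a text server such as "nc -lk 9999" is running):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, 1)

# Count words arriving on the socket in 1-second batches
lines = ssc.socketTextStream("localhost", 9999)
counts = lines.flatMap(lambda line: line.split(" ")) \
              .map(lambda word: (word, 1)) \
              .reduceByKey(lambda a, b: a + b)
counts.pprint()

ssc.start()
ssc.awaitTermination()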

Note: my system specs are as follows:

Ubuntu 16.04
Spark: spark-2.2.0-bin-hadoop2.7
Jupyter Notebook (Python 2.7)
Kafka: kafka_2.11-0.11.0.0

I have the following lines in my .bashrc:

export PATH="/home/myubuntu/anaconda3/bin:$PATH" 

export PATH="/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/bin:$PATH" 

export PATH="/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/jars:$PATH" 

export PATH="/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/python:$PATH" 

export PATH="/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/python/pyspark:$PATH" 

export PATH="/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/python/pyspark/streaming:$PATH" 


function snotebook() 
{ 
#Spark path (based on your computer) 
SPARK_PATH=~/spark-2.0.0-bin-hadoop2.7 

export PYSPARK_DRIVER_PYTHON="jupyter" 
export PYSPARK_DRIVER_PYTHON_OPTS="notebook" 

# For python 3 users, you have to add the line below or you will get an error 
#export PYSPARK_PYTHON=python3 

#$SPARK_PATH/bin/pyspark --master local[2] 
/home/myubuntu/Desktop/spark-2.2.0-bin-hadoop2.7/bin/pyspark --master local[2] 
} 
Where are you running this where it doesn't work? – bendl

On my laptop (Ubuntu 16.04) – user2867237

Answer


I found the mistake. With spark-2.2.0-bin-hadoop2.7, the Kafka package has to match both the Scala build (2.11) and the Spark version (2.2.0), so the following package is needed instead:

--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 
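
Applied to the code in the question, the only change is the PYSPARK_SUBMIT_ARGS line (restart the kernel afterwards so the new package is actually downloaded and loaded):

# Scala version (2.11) and Spark version (2.2.0) now match the installed spark-2.2.0-bin-hadoop2.7
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 pyspark-shell'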