I am using Spark 2.1.0 and Hadoop 2.7.3. Calling Spark's newAPIHadoopFile (FileInputFormat) throws a NotSerializableException.
I tried newAPIHadoopFile with very simple code, in a single class with just a main method:
val spark = SparkSession.builder().appName("test").master("local[*]").getOrCreate()
val sparkContext = spark.sparkContext
val sparkConf = sparkContext.getConf
val file = "src/main/resources/chat.csv"
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkContext.getConf.registerKryoClasses(Array(
  Class.forName("org.apache.hadoop.io.LongWritable"),
  Class.forName("org.apache.hadoop.io.Text")
))
sparkConf.set("spark.kryo.classesToRegister", "org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text")
val rdd = sparkContext.newAPIHadoopFile(file, classOf[KeyValueTextInputFormat], classOf[Text], classOf[Text])
rdd.collect().foreach(println)
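(A side note on the configuration above, not from the original post: in Spark 2.x, serializer settings passed to a SparkConf obtained from an already-running SparkContext generally take no effect; they need to be supplied before the session is created. A minimal sketch, assuming the same app name and master:)

```scala
import org.apache.spark.sql.SparkSession

// Supply Kryo settings *before* the SparkContext exists; mutating
// sparkContext.getConf after getOrCreate() does not reconfigure the serializer.
val spark = SparkSession.builder()
  .appName("test")
  .master("local[*]")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryo.classesToRegister",
          "org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text")
  .getOrCreate()
```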
I checked many posts on StackOverflow, but I still get this error:
java.io.NotSerializableException: org.apache.hadoop.io.Text
Serialization stack:
	- object not serializable (class: org.apache.hadoop.io.Text, value: How about Italian?"})
	- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
	- object (class scala.Tuple2, ( How about Italian?"},))
	- element of array (index: 0)
	- array (class [Lscala.Tuple2;, size 3)
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
Edit: the content of chat.csv:
{from: "Gert", to: "Melissa", message: "Want to have dinner?"}
{from: "Melissa", to: "Gert", message: "OK\
How about Italian?"}
Please help...
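(A common workaround, sketched here as an assumption rather than a confirmed answer: Hadoop's Text and LongWritable do not implement java.io.Serializable, and collect() has to ship the records to the driver with Java serialization. Converting the Writables to plain Strings on the executors before collecting avoids the problem:)

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat

// Map each (Text, Text) pair to (String, String) before collect(),
// so only serializable Strings cross the executor/driver boundary.
val rdd = sparkContext
  .newAPIHadoopFile(file, classOf[KeyValueTextInputFormat], classOf[Text], classOf[Text])
  .map { case (k, v) => (k.toString, v.toString) }

rdd.collect().foreach(println)
```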
Can you paste the code starting from the class declaration, with everything included? –
All the code is here.. except the main method declaration and the imports. – Furyegg