databricks avro schema cannot be converted to a Spark SQL StructType

We have the Kafka HDFS connector writing to HDFS in the default Avro format. Sample output:
Obj^A^B^Vavro.schema"["null","string"]^@ $ͳø{<9d>¾Ã^X:<8d>uV^K^H5^F^F^B<8a>^B{"severity":"notice","message":"Test message","facility":"kern","syslog-tag":"sawmill_test:","timestamp":"2017-01-31T20:15:00+00:00"}^B<8a>^B{"severity":"notice","message":"Test message","facility":"kern","syslog-tag":"sawmill_test:","timestamp":"2017-01-31T20:15:00+00:00"}^B<8a>^B{"severity":"notice","message":"Test message","facility":"kern","syslog-tag":"sawmill_test:","timestamp":"2017-01-31T20:15:00+00:00"}$ͳø{<9d>¾Ã^X:<8d>uV^K^H5
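For context, the writer schema embedded in the container header above is the JSON string `["null","string"]`, i.e. a top-level union rather than a record. A quick sanity check in plain Python (a sketch independent of Spark; the `LogLine` record name is made up for illustration) shows the structural difference the error below complains about:

```python
import json

# Writer schema embedded in the Avro container header shown above.
embedded_schema = json.loads('["null", "string"]')

# A top-level union parses as a JSON array (Python list), not a record.
print(isinstance(embedded_schema, list))   # True

# By contrast, a schema that maps naturally onto a StructType has a
# top-level record: a JSON object with "type": "record" and a "fields" array.
record_schema = json.loads(
    '{"type": "record", "name": "LogLine",'
    ' "fields": [{"name": "message", "type": ["null", "string"]}]}'
)
print(record_schema["type"])               # record
```

So each Avro "row" here is a bare (nullable) string holding the JSON payload, not a structured record.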
Trying to read it with:
import com.databricks.spark.avro._
val df = spark.read.avro("..path to avro file")
we get the following error:
java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType: ["null","string"]
  at com.databricks.spark.avro.DefaultSource.inferSchema(DefaultSource.scala:93)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
  at com.databricks.spark.avro.package$AvroDataFrameReader$$anonfun$avro$2.apply(package.scala:34)
  at com.databricks.spark.avro.package$AvroDataFrameReader$$anonfun$avro$2.apply(package.scala:34)
Please help.
Spark version: 2.11
spark-avro version: 2.11-3.2.0
Kafka version: 0.10.2.1