Apache Spark MLlib with the DataFrame API throws java.net.URISyntaxException on createDataFrame() or read().csv(...)
In a standalone application (running on Java 8, Windows 10, with spark-xxx_2.11:2.0.0 as JAR dependencies), the following code gives an error:
/* this: */
Dataset<Row> logData = spark_session.createDataFrame(Arrays.asList(
new LabeledPoint(1.0, Vectors.dense(4.9,3,1.4,0.2)),
new LabeledPoint(1.0, Vectors.dense(4.7,3.2,1.3,0.2))
), LabeledPoint.class);
/* or this: */
/* logFile: "C:\files\project\file.csv", "C:\\files\\project\\file.csv",
"C:/files/project/file.csv", "file:/C:/files/project/file.csv",
"file:///C:/files/project/file.csv", "/file.csv" */
Dataset<Row> logData = spark_session.read().csv(logFile);
Exception:
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:/files/project/spark-warehouse
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:114)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
at <call in my line of code>
How can I load a CSV file into a Dataset<Row> from Java code?
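The message in the stack trace can be reproduced with plain java.net.URI, which may help show why the path "file:C:/files/project/spark-warehouse" is rejected. The sketch below is only an illustration: the class name WarehouseUriDemo and the method describe are hypothetical, and it assumes Hadoop's Path builds its URI from separate scheme and path components in roughly this way. It does not touch Spark itself.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of Spark or Hadoop.
public class WarehouseUriDemo {

    // Builds a "file" URI from a scheme and a path component and reports the outcome.
    static String describe(String path) {
        try {
            // 5-argument form: scheme, authority, path, query, fragment.
            // java.net.URI rejects a path without a leading '/' when a scheme is given.
            URI uri = new URI("file", null, path, null, null);
            return "ok: " + uri;
        } catch (URISyntaxException e) {
            return "rejected: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        // Windows drive path without a leading slash, as in the stack trace.
        System.out.println(describe("C:/files/project/spark-warehouse"));
        // Same location with a leading slash, which forms an absolute path.
        System.out.println(describe("/C:/files/project/spark-warehouse"));
    }
}
```

A commonly reported workaround for Spark 2.0.0 on Windows, which this post does not itself confirm, is to pass a well-formed URI such as "file:///C:/files/project/spark-warehouse" as the spark.sql.warehouse.dir setting through SparkSession.builder().config(...) before the session is created.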
Unbelievable, but the issue's severity is "Minor". This workaround fixed my error, thanks!