2015-10-20 60 views
2

SparkR Java error when I try to load data in R with:

df <- read.df(sqlContext, "https://s3-us-west-2.amazonaws.com/sparkr-data/nycflights13.csv", "com.databricks.spark.csv",header=T) 

I get a Java error:

Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
    java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.String 
    at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:74) 
    at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:39) 
    at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:27) 
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125) 
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114) 
    at org.apache.spark.sql.api.r.SQLUtils$.loadDF(SQLUtils.scala:156) 
    at org.apache.spark.sql.api.r.SQLUtils.loadDF(SQLUtils.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) 
    at java.lang.reflect.Method.invoke(Unknown Source) 
    at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:132) 
    at or 

Could you provide a (minimal) CSV file that causes this error? – vallismortis


df <- read.df(sqlContext,"https://s3-us-west-2.amazonaws.com/sparkr-data/nycflights13","com.databricks.spark.cs""header=T) This still does not work; please correct me if I have misunderstood. –


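Hi all, I just found that my problem is with CSV loading: when I try to load the CSV package, SparkR throws the error above, and after that nothing works at all. –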

Answers

3

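I finally found a solution to the problem above. You need to make sure of the following:

You have the Java Development Kit installed.
Download the package from this website and save it to C:/Hadoop, so that its bin folder ends up at C:/Hadoop/bin.

Set the JAVA_HOME environment variable (do not include the bin folder in the path).
Set HADOOP_HOME as an environment variable (do not include the bin folder in the path).

Now run the following: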
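A quick way to confirm the two variables before starting Spark is sketched below in base R. The helper name `is_valid_home` is hypothetical and not part of SparkR; it only checks that a path is non-empty and does not end in a bin folder, which is the condition described above. It does not verify that the folder actually contains a working JDK or Hadoop install.

```r
# Hypothetical helper (not part of SparkR): TRUE when a *_HOME style path
# is non-empty and does not point at a bin folder.
is_valid_home <- function(path) {
  nzchar(path) && tolower(basename(path)) != "bin"
}

# Report the state of both variables as seen by the current R session.
for (name in c("JAVA_HOME", "HADOOP_HOME")) {
  value <- Sys.getenv(name)
  status <- if (is_valid_home(value)) "looks fine" else "is unset or ends in bin"
  message(name, " ", status, ": ", value)
}
```

If a variable was changed in the Windows settings after R was started, restart R so that `Sys.getenv()` sees the new value.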

rm(list = ls())

# Set the system environment variables
Sys.setenv(SPARK_HOME = "C:/spark")
Sys.setenv(HADOOP_HOME = "C:/Hadoop")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

# Load the SparkR library
library(rJava)
library(SparkR)

# Request the spark-csv package; this must be set before sparkR.init() is called
Sys.setenv(SPARKR_SUBMIT_ARGS = '"--packages" "com.databricks:spark-csv_2.11:1.2.0" "sparkr-shell"')
Sys.setenv(SPARK_MEM = "1g")

# Create a Spark context and a SQL context
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

Now you should be able to read the CSV file.
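The quoting inside SPARKR_SUBMIT_ARGS is easy to get wrong by hand, so here is a minimal sketch that builds the same string from package coordinates. The function name `build_submit_args` is hypothetical; the only assumption beyond the answer above is that several packages are joined with commas, which is how spark-submit's `--packages` option takes a list.

```r
# Hypothetical helper: build the SPARKR_SUBMIT_ARGS value used above from
# one or more Maven coordinates (joined with commas for --packages).
build_submit_args <- function(packages) {
  sprintf('"--packages" "%s" "sparkr-shell"', paste(packages, collapse = ","))
}

submit_args <- build_submit_args("com.databricks:spark-csv_2.11:1.2.0")
print(submit_args)

# Set it before calling sparkR.init(), as in the script above.
Sys.setenv(SPARKR_SUBMIT_ARGS = submit_args)
```

The Scala suffix in the coordinate (`_2.11` here, copied from the answer) should match the Scala version your Spark build was compiled with.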

0

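After several attempts, I found what the problem with read.df() is: the header argument. It must be passed as a string, header="true" or header="false", not as an R logical such as TRUE or T. The first call below fails with header=TRUE; the second succeeds with header="true".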

> people = read.df(sqlContext, "C:\\Users\\Vivek\\Desktop\\AirPassengers.csv", source = "com.databricks.spark.csv",header=TRUE) 
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
    java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.String 
    at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:81) 
    at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:40) 
    at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:28) 
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125) 
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114) 
    at org.apache.spark.sql.api.r.SQLUtils$.loadDF(SQLUtils.scala:156) 
    at org.apache.spark.sql.api.r.SQLUtils.loadDF(SQLUtils.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) 
    at java.lang.reflect.Method.invoke(Unknown Source) 
    at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:132) 
    at or 
> people = read.df(sqlContext, "C:\\Users\\Vivek\\Desktop\\AirPassengers.csv", source = "com.databricks.spark.csv",header="true") 
> head(people) 
  Sl_No        time AirPassengers
1     1        1949           112
2     2 1949.083333           118
3     3 1949.166667           132
4     4     1949.25           129
5     5 1949.333333           121
6     6 1949.416667           135
>
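When the header setting comes from an R logical elsewhere in a script, it can be converted to the string form before it reaches read.df(). The sketch below is one way to do that in base R; `as_csv_option` is a hypothetical helper name, not a SparkR function, and the file path in the commented call is a placeholder.

```r
# Hypothetical helper (not part of SparkR): turn an R logical into the
# lowercase string form used in the working call; other values become strings.
as_csv_option <- function(value) {
  if (is.logical(value)) tolower(as.character(value)) else as.character(value)
}

use_header <- TRUE
header_opt <- as_csv_option(use_header)
print(header_opt)

# With a running SparkR session and the spark-csv package loaded, the value
# would be passed on like this (the path is a placeholder):
# people <- read.df(sqlContext, "path/to/file.csv",
#                   source = "com.databricks.spark.csv", header = header_opt)
```

This keeps the conversion in one place, so a stray TRUE or T never reaches the data source as a Boolean.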