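R DBI sparklyr dbWriteTable runs with no result
I am coming from an MS-SQL environment into a HIVE environment that also has Spark access. Right now I am trying to use RStudio and R (sometimes Python via rPython) to replace what I used to do in T-SQL, as well as to do everything I have never done before.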
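For this work, I need to be able to read from and write back to the HIVE database.
I have connected to Spark with the R package sparklyr, and by using the R package DBI over that Spark connection I can connect to our HIVE cluster and pull data into R data frames just fine: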
sc <- spark_connect(master = "yarn-client", spark_home="/usr/hdp/current/spark-client", config = config)
result3 <- dbGetQuery(sc, "select * from sampledb.sampletable limit 100")
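The code above works every time. I can also create tables in the database by passing a quoted SQL statement to dbGetQuery without any problem, so it is not a write-permission issue.
However, when I try to write data from an R data frame back to the Hive cluster, like this: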
dbWriteTable(conn = sc, name = "sampledb.rsparktest3", value = result3)
It runs without error, but the table does not show up and I cannot query it.
If I try to write the table again, I get this error:
> dbWriteTable(conn = sc, name = "sampledb.rsparktest3", value = result3)
Error in .local(conn, name, value, ...) :
Table sampledb.rsparktest3 already exists
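Any ideas why this might be happening? Is there a better way to do this than DBI?
Thanks in advance for your help!
Here is the full RStudio console log from running these statements: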
> result3 <- dbGetQuery(sc, "select * from sampledb.sampletable limit 100")
> dbWriteTable(conn = sc, name = "sampledb.rsparktest3", value = result3)
> result3y <- dbGetQuery(sc, "select * from sampledb.rsparktest3 limit 2")
Error: org.apache.spark.sql.AnalysisException: Table not found: sampledb.rsparktest3; line 1 pos 35
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:54)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:121)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sparklyr.Invoke$.invoke(invoke.scala:102)
at sparklyr.StreamHandler$.handleMethodCall(stream.scala:97)
at sparklyr.StreamHandler$.read(stream.scala:62)
at sparklyr.BackendHandler.channelRead0(handler.scala:52)
at sparklyr.BackendHandler.channelRead0(handler.scala:14)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
> dbWriteTable(conn = sc, name = "sampledb.rsparktest3", value = result3)
Error in .local(conn, name, value, ...) :
Table sampledb.rsparktest3 already exists
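Thanks Edgar, but when I try to use spark_write_table it won't accept R data frames and doesn't recognize my Spark dataframes. > spark_write_table(spark_iris,spark_iris2) Error in UseMethod("spark_write_table") : no applicable method for 'spark_write_table' applied to an object of class "data.frame" – wlf211
Hi, the call should look like this: 'spark_write_table(spark_iris, "hive_iris")'. The first argument 'x' should be a Spark DF, and the second argument is the name of the table in Hive – edgararuiz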
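
To make the last comment concrete: the error above comes from handing spark_write_table a plain R data.frame, so the data frame has to be copied into Spark first. Here is a minimal sketch of that two-step approach. write_df_to_hive and the staging name "r_staging_tmp" are hypothetical names made up for illustration, not part of sparklyr, and I am assuming the connection sc and the data frame result3 from the question. I have not checked this against the Spark version on the cluster in the question, so treat it as a starting point rather than a confirmed fix.

```r
# Hypothetical helper (not part of sparklyr): copy a local R data frame
# into Spark, then save it as a table through the Spark connection.
write_df_to_hive <- function(sc, df, table_name,
                             staging_name = "r_staging_tmp", mode = NULL) {
  # sdf_copy_to() uploads the local data frame and returns a Spark DataFrame
  # (a tbl_spark), which is what spark_write_table() expects as 'x'.
  sdf <- sparklyr::sdf_copy_to(sc, df, name = staging_name, overwrite = TRUE)

  # The second argument is the table name as a string, not another object.
  # 'mode' is passed through unchanged; NULL keeps sparklyr's default.
  sparklyr::spark_write_table(sdf, table_name, mode = mode)

  # Return the Spark DataFrame invisibly so it can be reused if needed.
  invisible(sdf)
}

# Usage, assuming 'sc' and 'result3' exist as in the question:
# write_df_to_hive(sc, result3, "sampledb.rsparktest3")
```

Whether a database-qualified name such as "sampledb.rsparktest3" is accepted may depend on the Spark and sparklyr versions in use, so it is worth trying an unqualified name if the qualified one fails.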