2017-08-07 47 views

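Scala - unit testing a function that returns a Column

I have a function isJSON() that takes a Column and returns a Column holding the result of a comparison.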

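import org.apache.spark.sql.Column

// Flags values that contain both an opening and a closing brace.
def isJSON(element: Column): Column = {
    element.contains("{") && element.contains("}")
}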

This is how I normally use it, and it works as expected:

df.withColumn("is_json", isJSON(col("data"))) 

I want to write a unit test for it using FunSpec, but I can't work out how to assert on data of type Column.

describe("isJSON()") {
    it("should return false if data is not JSON") {
        val df = Seq("Not a JSON").toDF("data")
        assert(isJSON(df("data")).equals(lit(false)))
    }
}

The unit test fails with the following stack trace:

ScalaTestFailureLocation: com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1 at (DatalakeFunSpecTest.scala:29) 
org.scalatest.exceptions.TestFailedException: datalake.this.`package`.isJSON(df.apply("data")).equals(org.apache.spark.sql.functions.lit(false)) was false 
    at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) 
    at org.scalatest.FunSpec.newAssertionFailedException(FunSpec.scala:1626) 
    at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) 
    at com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(DatalakeFunSpecTest.scala:29) 
    at com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(DatalakeFunSpecTest.scala:23) 
    at com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(DatalakeFunSpecTest.scala:23) 
    at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) 
    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) 
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) 
    at org.scalatest.Transformer.apply(Transformer.scala:22) 
    at org.scalatest.Transformer.apply(Transformer.scala:20) 
    at org.scalatest.FunSpecLike$$anon$1.apply(FunSpecLike.scala:422) 
    at org.scalatest.Suite$class.withFixture(Suite.scala:1122) 
    at org.scalatest.FunSpec.withFixture(FunSpec.scala:1626) 
    at org.scalatest.FunSpecLike$class.invokeWithFixture$1(FunSpecLike.scala:419) 
    at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:431) 
    at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:431) 
    at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) 
    at org.scalatest.FunSpecLike$class.runTest(FunSpecLike.scala:431) 
    at com.mhedu.common.datalake.DatalakeFunSpecTest.org$scalatest$BeforeAndAfter$$super$runTest(DatalakeFunSpecTest.scala:13) 
    at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) 
    at com.mhedu.common.datalake.DatalakeFunSpecTest.runTest(DatalakeFunSpecTest.scala:13) 
    at org.scalatest.FunSpecLike$$anonfun$runTests$1.apply(FunSpecLike.scala:464) 
    at org.scalatest.FunSpecLike$$anonfun$runTests$1.apply(FunSpecLike.scala:464) 
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) 
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) 
    at scala.collection.immutable.List.foreach(List.scala:381) 
    at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) 
    at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:390) 
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:427) 
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) 
    at scala.collection.immutable.List.foreach(List.scala:381) 
    at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) 
    at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) 
    at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) 
    at org.scalatest.FunSpecLike$class.runTests(FunSpecLike.scala:464) 
    at org.scalatest.FunSpec.runTests(FunSpec.scala:1626) 
    at org.scalatest.Suite$class.run(Suite.scala:1424) 
    at org.scalatest.FunSpec.org$scalatest$FunSpecLike$$super$run(FunSpec.scala:1626) 
    at org.scalatest.FunSpecLike$$anonfun$run$1.apply(FunSpecLike.scala:468) 
    at org.scalatest.FunSpecLike$$anonfun$run$1.apply(FunSpecLike.scala:468) 
    at org.scalatest.SuperEngine.runImpl(Engine.scala:545) 
    at org.scalatest.FunSpecLike$class.run(FunSpecLike.scala:468) 
    at com.mhedu.common.datalake.DatalakeFunSpecTest.org$scalatest$BeforeAndAfter$$super$run(DatalakeFunSpecTest.scala:13) 
    at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) 
    at com.mhedu.common.datalake.DatalakeFunSpecTest.run(DatalakeFunSpecTest.scala:13) 
    at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55) 
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563) 
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557) 
    at scala.collection.immutable.List.foreach(List.scala:381) 
    at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557) 
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044) 
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043) 
    at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722) 
    at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043) 
    at org.scalatest.tools.Runner$.run(Runner.scala:883) 
    at org.scalatest.tools.Runner.run(Runner.scala) 
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138) 
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28) 

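Is there a way to write assertions against a Column, or to somehow extract the column's underlying Boolean values so that I can compare them?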

Answer

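You are testing two Column instances for equality, and those instances are not equal. A Column describes an expression rather than holding data: the two Columns might produce the same result when applied to a particular DataFrame, but as objects they are not equal (it is easy to apply them to a different DataFrame and get different results).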
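To make the distinction concrete, here is a minimal sketch, assuming a local SparkSession and a made-up second input string (`{"a": 1}`). The object name `ColumnIsAnExpression` and the `run` method are hypothetical names used only for illustration. It builds one Column, compares it to `lit(false)` as an object, and then evaluates it against two different DataFrames.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object ColumnIsAnExpression {
  // Same definition as in the question.
  def isJSON(element: Column): Column =
    element.contains("{") && element.contains("}")

  // Returns (object equality, values on plain text, values on brace-delimited text).
  def run(spark: SparkSession): (Boolean, Seq[Boolean], Seq[Boolean]) = {
    import spark.implicits._

    // Not bound to any DataFrame yet: only a description of a computation.
    val check: Column = isJSON(col("data"))

    // Compares the two expressions themselves, not any data they might produce.
    val sameObject = check.equals(lit(false))

    // Evaluating the expression requires a DataFrame and an action such as collect().
    def values(df: DataFrame): Seq[Boolean] =
      df.select(check).collect().map(_.getBoolean(0)).toSeq

    val plain = Seq("Not a JSON").toDF("data")
    val jsonLike = Seq("""{"a": 1}""").toDF("data") // assumed sample input

    (sameObject, values(plain), values(jsonLike))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("column-is-an-expression")
      .getOrCreate()
    try {
      val (sameObject, onPlain, onJsonLike) = run(spark)
      println(s"check.equals(lit(false)): $sameObject")
      println(s"values on plain text: $onPlain")
      println(s"values on brace-delimited text: $onJsonLike")
    } finally spark.stop()
  }
}
```

The same `check` Column yields `false` for the first DataFrame and `true` for the second, which is why comparing it to `lit(false)` as an object cannot stand in for comparing results.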

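One way to test this is to filter the DataFrame with a condition that compares the two Columns (the result of isJSON and lit(true)) for equality, and then assert that the size of the result is 0: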

describe("isJSON()") {
    it("should return false if data is not JSON") {
        val df = Seq("Not a JSON").toDF("data")
        assert(df.filter(isJSON(df("data")) === lit(true)).count() == 0)
    }
}

Another way is to collect the results of evaluating this column and assert that all of them are false, for example:

import org.apache.spark.sql.Row

describe("isJSON()") {
    it("should return false if data is not JSON") {
        val df = Seq("Not a JSON").toDF("data")
        // Evaluate the column and pull the Boolean out of each resulting Row.
        val results: Array[Boolean] = df.select(isJSON(df("data"))).collect().map { case Row(b: Boolean) => b }
        assert(results sameElements Array(false))
    }
}

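There are many other similar options. The important idea is to compare data rather than Column objects: as long as the values being compared in the assert expression are of type Column, no actual results are being compared.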
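If several tests need to assert on Column-returning functions, the "compare data, not Columns" idea can be factored into a small helper. The following is only a sketch under a few assumptions: `ColumnTestSupport` and `evaluateBoolean` are hypothetical names, not part of Spark or ScalaTest; the suite creates its own local SparkSession; and it uses `org.scalatest.FunSpec` to match the stack trace in the question (newer ScalaTest releases name this class `org.scalatest.funspec.AnyFunSpec`). The second test case and its input are also made up for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.scalatest.{BeforeAndAfterAll, FunSpec}

// Hypothetical helper: turns a Boolean-typed Column into plain Scala values.
object ColumnTestSupport {
  // Evaluates `column` against `df` and returns one Boolean per row.
  // getBoolean throws on null values, so this suits columns without nulls.
  def evaluateBoolean(df: DataFrame, column: Column): Seq[Boolean] =
    df.select(column).collect().map(_.getBoolean(0)).toSeq
}

class IsJsonSpec extends FunSpec with BeforeAndAfterAll {
  // Assumption: a local, single-threaded session is sufficient for unit tests.
  private lazy val spark: SparkSession =
    SparkSession.builder().master("local[1]").appName("isJSON-spec").getOrCreate()
  import spark.implicits._

  override protected def afterAll(): Unit = spark.stop()

  // Same definition as in the question.
  private def isJSON(element: Column): Column =
    element.contains("{") && element.contains("}")

  describe("isJSON()") {
    it("should return false if data is not JSON") {
      val df = Seq("Not a JSON").toDF("data")
      // Both sides of == are Seq[Boolean], so actual results are compared.
      assert(ColumnTestSupport.evaluateBoolean(df, isJSON(df("data"))) == Seq(false))
    }

    it("should return true if data contains both braces") {
      val df = Seq("""{"a": 1}""").toDF("data")
      assert(ColumnTestSupport.evaluateBoolean(df, isJSON(df("data"))) == Seq(true))
    }
  }
}
```

Keeping the helper's return type a plain `Seq[Boolean]` means the assertion never involves a Column at all, so a failure message shows the actual row values instead of an expression.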