斯卡拉拼合列表的嵌入式列表

我已经创建了一个Twitter数据流，显示标签，作者和提到的用户以下面的格式。斯卡拉拼合列表的嵌入式列表

(List(timetofly, hellocake),Shera_Eyra,List(blxcknicotine, kimtheskimm))

我不能这样做，因为嵌入式名单分析这种格式。我如何创建另一个以这种格式显示数据的数据流？

timetofly, Shera_Eyra, blxcknicotine timetofly, Shera_Eyra, kimtheskimm hellocake, Shera_Eyra, blxcknicotine hellocake, Shera_Eyra, kimtheskimm

这里是我的代码来生成数据：

val sparkConf = new SparkConf().setAppName("TwitterPopularTags") 
val ssc = new StreamingContext(sparkConf, Seconds(sampleInterval)) 
val stream = TwitterUtils.createStream(ssc, None) 
val data = stream.map {line => 
     (line.getHashtagEntities.map(_.getText), 
     line.getUser().getScreenName(), 
     line.getUserMentionEntities.map(_.getScreenName).toList) 
    }

来源

2017-07-15 user8312833

所以你有'列表[字符串]，字符串，列表[字符串]“的'Tuple3'？什么是你的输出的理想类型？ – user4601931

我想看到这是一个字符串列表 – user8312833

我会用修真此：

val data = (List("timetofly", "hellocake"), "Shera_Eyra", List("blxcknicotine", "kimtheskimm")) 

    val result = for { 
    hashtag <- data._1 
    user = data._2 
    mentionedUser <- data._3 
    } yield (hashtag, user, mentionedUser) 

    result.foreach(println)

输出：

(timetofly,Shera_Eyra,blxcknicotine) 
(timetofly,Shera_Eyra,kimtheskimm) 
(hellocake,Shera_Eyra,blxcknicotine) 
(hellocake,Shera_Eyra,kimtheskimm)

如果你宁愿序列串的名单，而不是序列串的元组，然后换产给你，而不是一个列表：yield List(hashtag, user, mentionedUser)

来源

2017-07-15 17:46:24 Tom

我尝试使用该代码，但我得到这个错误值_1不是org.apache.spark.streaming.dstream.DStream [（Array [String]，字符串，列表[字符串]）]' – user8312833

如果你的数据是一个DStream的形式，你可以'映射'它 – Tom

在您的代码段，data是一个DStream[(Array[String], String, List[String])]。要在你想要的格式获得DStream[String]，你可以使用flatMap和map：

val data = stream.map { line => 
    (line.getHashtagEntities.map(_.getText), 
    line.getUser().getScreenName(), 
    line.getUserMentionEntities.map(_.getScreenName).toList) 
} 

val data2 = data.flatMap(a => a._1.flatMap(b => a._3.map(c => (b, a._2, c)))) 
       .map { case (hash, user, mention) => s"$hash, $user, $mention" }

的flatMap导致DStream[(String, String, String)]，其中每个元组由一个哈希标签实体，用户和提实体。随后调用map以及模式匹配将创建一个DStream[String]，其中每个String由每个元组中的元素组成，用逗号和空格分隔。

来源

2017-07-15 23:05:42 chunjef

很好的答案upvote从我:) –

斯卡拉拼合列表的嵌入式列表

回答

相关问题