PredictionIO evaluation fails, text classification template

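I am trying to predict a text field based on other text fields using PredictionIO. I used this guide as a reference. I created a new app with

pio app new MyTextApp

and followed the guide up to the evaluation step, using the data source provided in the template. Everything went fine until evaluation. When evaluating, I get the error pasted below.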

[INFO] [CoreWorkflow$] runEvaluation started 
[WARN] [Utils] Your hostname, my-ThinkCentre-Edge72 resolves to a loopback address: 127.0.0.1; using 192.168.65.27 instead (on interface eth0) 
[WARN] [Utils] Set SPARK_LOCAL_IP if you need to bind to another address 
[INFO] [Remoting] Starting remoting 
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDr[email protected]:59649] 
[INFO] [CoreWorkflow$] Starting evaluation instance ID: AU29p8j3Fkwdnkfum_ke 
[INFO] [Engine$] DataSource: [email protected] 
[INFO] [Engine$] Preparator: [email protected] 
[INFO] [Engine$] AlgorithmList: List([email protected]) 
[INFO] [Engine$] Serving: [email protected] 
Exception in thread "main" java.lang.UnsupportedOperationException: empty.maxBy 
at scala.collection.TraversableOnce$class.maxBy(TraversableOnce.scala:223) 
at scala.collection.AbstractTraversable.maxBy(Traversable.scala:105) 
at org.template.textclassification.PreparedData.<init>(Preparator.scala:152) 
at org.template.textclassification.Preparator.prepare(Preparator.scala:38) 
at org.template.textclassification.Preparator.prepare(Preparator.scala:34)

Do I have to edit any configuration file to make this work? I have already tested successfully with the movielens data.

2015-06-04 cutteeth

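This particular error message occurs when the data is not read in correctly by the DataSource class. If you are using a different text data set, make sure that any changes to the eventNames, the entityType, and the respective property field names are correctly reflected in the readEventData method.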
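To see why a mismatch leaves the engine with no data, here is a minimal sketch that does not use PredictionIO at all. `Event` and `readEvents` are hypothetical stand-ins for a stored event and for the filtered read that readEventData performs; they are assumptions for illustration, not the template's API. The sample values are the ones used in the spam example further down in this answer.

```scala
object EventFilterSketch {
  // Hypothetical, simplified stand-in for a stored event
  // (not the PredictionIO Event class).
  case class Event(entityType: String, event: String, properties: Map[String, String])

  // Hypothetical stand-in for the filtered read: keeps only events whose
  // entity type and event name both match the requested values.
  def readEvents(all: Seq[Event], entityType: String, eventNames: List[String]): Seq[Event] =
    all.filter(e => e.entityType == entityType && eventNames.contains(e.event))

  // One imported event, shaped like the e-mail data in this answer.
  val stored: Seq[Event] = Seq(
    Event("content", "e-mail", Map("label" -> "spam", "text" -> "content"))
  )

  def main(args: Array[String]): Unit = {
    // The template's default filter values do not match the imported event,
    // so the read comes back empty.
    println(readEvents(stored, "source", List("documents")).size)
    // Filter values that match the imported event return it.
    println(readEvents(stored, "content", List("e-mail")).size)
  }
}
```

If the first call is what your DataSource effectively does, every later stage receives an empty data set.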

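The maxBy method is used to extract the class with the largest number of observations. If the Map of category labels is empty, no classes were recorded, which essentially tells you that no data was fed in.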
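The failure can be reproduced in plain Scala. The sketch below is only an illustration: `majorityClass` is a hypothetical helper, not the code in Preparator.scala. It shows that maxBy throws on an empty collection, and one way to guard against it.

```scala
import scala.util.Try

object MajorityClassSketch {
  // Hypothetical helper: counts observations per label and returns the most
  // frequent label with its count, or None when there are no observations.
  def majorityClass(labels: Seq[String]): Option[(String, Int)] = {
    val counts: Map[String, Int] =
      labels.groupBy(identity).map { case (label, group) => (label, group.size) }
    if (counts.isEmpty) None else Some(counts.maxBy(_._2))
  }

  def main(args: Array[String]): Unit = {
    // Calling maxBy directly on an empty Map throws
    // java.lang.UnsupportedOperationException: empty.maxBy
    val unguarded = Try(Map.empty[String, Int].maxBy(_._2))
    println(unguarded.isFailure)

    // The guarded version returns None instead of throwing.
    println(majorityClass(Seq.empty))
    println(majorityClass(Seq("spam", "ham", "spam")))
  }
}
```

Returning an Option moves the "no data" case into the type, so the caller can report a clear message instead of surfacing a stack trace.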

For example, I just built a spam detector using this engine. My e-mail data has the form:

{"entityType": "content", "eventTime": "2015-06-04T00:22:39.064+0000", "entityId": 1, "event": "e-mail", "properties": {"label": "spam", "text": "content"}}

To use this data with the engine, I made the following changes in the DataSource class:

entityType = Some("source"), // specify data entity type
eventNames = Some(List("documents")) // specify data event name

changed to:

entityType = Some("content"), // specify data entity type
eventNames = Some(List("e-mail")) // specify data event name

and

)(sc).map(e => Observation(
    e.properties.get[Double]("label"), 
    e.properties.get[String]("text"), 
    e.properties.get[String]("category") 
)).cache

changed to:

)(sc).map(e => {
    val label = e.properties.get[String]("label")
    Observation(
        if (label == "spam") 1.0 else 0.0,
        e.properties.get[String]("text"),
        label
    )
}).cache

After this, I was able to get through build, train, and deploy, as well as evaluation.
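The label conversion can also be tried outside PredictionIO. In this sketch, `Observation` is a hypothetical stand-in for the template's class (the field names are assumptions), and the properties are a plain Map rather than the engine's property object. Unlike the snippet above, it returns None when a field is missing instead of throwing.

```scala
object LabelMappingSketch {
  // Hypothetical stand-in for the template's Observation class;
  // the field names are assumptions made for this example.
  case class Observation(label: Double, text: String, category: String)

  // Converts string properties into an Observation. The string label "spam"
  // becomes 1.0 and any other label becomes 0.0. Returns None if either
  // the "label" or the "text" field is missing.
  def toObservation(properties: Map[String, String]): Option[Observation] =
    for {
      label <- properties.get("label")
      text  <- properties.get("text")
    } yield Observation(if (label == "spam") 1.0 else 0.0, text, label)

  def main(args: Array[String]): Unit = {
    println(toObservation(Map("label" -> "spam", "text" -> "content")))
    println(toObservation(Map("text" -> "no label here")))
  }
}
```

Running a few sample records through a function like this before importing them is a quick way to confirm that the property names in your data match what the DataSource expects.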

2015-06-04 17:29:29

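Thanks for the info. I had been using the same app for a different data set. I deleted the existing app and its data, created a new app, and then ran pio build, train, and deploy. Now it works fine. :) – cutteeth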

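Awesome, I'm glad the response helped! I just released a new version of the engine that includes a sanity check to make sure training data is actually being fed in. PreparedClass has also been modified so that text vectorization is faster. –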

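I have downloaded the latest text classification template (2.0), and the same problem is still present in the latest update. Evaluation fails with the error 'java.lang.UnsupportedOperationException: empty.maxBy', and training fails with 'io.prediction.data.storage.DataMapException: The field label is required.' pio also says that the Spark address is bound to loopback. Do I have to change it to a public IP? Could you also please explain text vectorization? – cutteeth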
