How to create an anomaly detection model in H2O-R

I am trying to run H2O's anomaly detection in R (h2o_3.14.0.2).

First, I tried it with my main deep learning model and got this error:

water.exceptions.H2OIllegalArgumentException 
[1] "water.exceptions.H2OIllegalArgumentException: Only for AutoEncoder Deep Learning model." 
...

OK, my bad. I set autoencoder to TRUE:

h2o.deeplearning(y = response, training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE)

and got a new error:

Error in .verify_dataxy(training_frame, x, y, autoencoder): `y` should not be specified for autoencoder=TRUE, remove `y` input 
Traceback: 

1. h2o.deeplearning(y = response, training_frame = training.frame, 
.  validation_frame = test.frame, autoencoder = TRUE) 
2. .verify_dataxy(training_frame, x, y, autoencoder) 
3. stop("`y` should not be specified for autoencoder=TRUE, remove `y` input")

OK, so I should remove y:

h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE)

But:

Error in is.numeric(y): argument "y" is missing, with no default 
Traceback: 

1. h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame, 
.  autoencoder = TRUE) 
2. is.numeric(y)

Hmm, the last two requirements look mutually exclusive. But OK, I'll try another model:

anomaly.detection.model <- h2o.glrm(training_frame = training.frame, k = 10, seed = common.seed) 

h2o.anomaly(anomaly.detection.model, training.frame, per_feature = FALSE)

and got another kind of error:

java.lang.AssertionError 
[1] "java.lang.AssertionError"                      
[2] " water.api.ModelMetricsHandler.predict(ModelMetricsHandler.java:439)" 
...

The failing assertion is assert s.reconstruct_train;. I haven't dug into it yet. Maybe I'll have better luck with GBM or RF?

model = h2o.gbm(y = response, 
       training_frame = training.frame, 
       validation_frame = validation.frame, 
       max_hit_ratio_k = 10, 
       seed = common.seed, 
       stopping_rounds = 3, 
       stopping_tolerance = 1e-2) 

h2o.anomaly(model, training.frame, per_feature = FALSE) 

water.exceptions.H2OIllegalArgumentException 
[1] "water.exceptions.H2OIllegalArgumentException: Requires a Deep Learning, GLRM, DRF or GBM model."

Same with RF.

So I have two questions:

How do I detect anomalies?
Are these bugs, or am I doing something wrong?

2017-09-15 Igor Melnichenko

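With autoencoder enabled (set to TRUE), deep learning becomes an unsupervised, clustering-style problem, so the response (y) does not need to be set.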

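Also, when autoencoder is set to TRUE you still need to set x. The problem you saw above with autoencoder = TRUE is that you did not set the predictors (x). Once you set x, the error will go away.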
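
Applied to the call from the question, that could look roughly like the sketch below. The helper names `autoencoder_predictors` and `fit_autoencoder` are hypothetical and not part of H2O. The sketch assumes every column except the response should be used as an input.

```r
# Hypothetical helper: every column except the response becomes a predictor.
autoencoder_predictors <- function(column_names, response) {
  setdiff(column_names, response)
}

# Hypothetical wrapper around h2o.deeplearning for the autoencoder case.
# Requires the h2o package and a running cluster (h2o.init()) when called.
fit_autoencoder <- function(training_frame, response, ...) {
  predictors <- autoencoder_predictors(names(training_frame), response)
  # x is passed explicitly; y is omitted because autoencoder = TRUE.
  h2o::h2o.deeplearning(x = predictors,
                        training_frame = training_frame,
                        autoencoder = TRUE,
                        ...)
}

# Usage with the question's objects (assumed to exist in the session):
# model <- fit_autoencoder(training.frame, response)
# h2o.anomaly(model, training.frame, per_feature = FALSE)
```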

Below is a quick anomaly detection test I ran with H2O 3.14.0.2 in R (see this blog for details):

    > library(h2o)
    > h2o.init() 
    Reading in config file: ./.h2oconfig 

    H2O is not running yet, starting it now... 

    Note: In case of errors look at the following log files: 
     /var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.out 
     /var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.err 

    java version "1.8.0_101" 
    Java(TM) SE Runtime Environment (build 1.8.0_101-b13) 
    Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode) 

    Starting H2O JVM and connecting: .. Connection successful! 

    R is connected to the H2O cluster: 
     H2O cluster uptime:   1 seconds 948 milliseconds 
     H2O cluster version:  3.14.0.2 
     H2O cluster version age: 24 days 
     H2O cluster name:   H2O_started_from_R_avkashchauhan_alj381 
     H2O cluster total nodes: 1 
     H2O cluster total memory: 3.56 GB 
     H2O cluster total cores: 8 
     H2O cluster allowed cores: 8 
     H2O cluster healthy:  TRUE 
     H2O Connection ip:   localhost 
     H2O Connection port:  54321 
     H2O Connection proxy:  NA 
     H2O Internal Security:  FALSE 
     H2O API Extensions:   XGBoost, Algos, AutoML, Core V3, Core V4 
     R Version:     R version 3.4.0 (2017-04-21) 

    > mtcar = h2o.importFile('https://raw.githubusercontent.com/woobe/H2O_London_Workshop/master/data/auto_design.csv') 
    |==================================================================================================================================| 100% 
    > mtcar$gear = as.factor(mtcar$gear) 
    > mtcar$carb = as.factor(mtcar$carb) 
    > mtcar$cyl = as.factor(mtcar$cyl) 
    > mtcar$vs = as.factor(mtcar$vs) 
    > mtcar$am = as.factor(mtcar$am) 
    > mtcar.dl = h2o.deeplearning(x = 2:12, training_frame = mtcar, autoencoder = TRUE, hidden = c(1,1,1), epochs = 100,seed=1) 
    |==================================================================================================================================| 100% 
    > errors <- h2o.anomaly(mtcar.dl, mtcar, per_feature = TRUE) 
    > print(errors) 
    reconstr_carb.1.SE reconstr_carb.2.SE reconstr_carb.3.SE reconstr_carb.4.SE reconstr_carb.6.SE reconstr_carb.8.SE 
    1     0     0     0     1     0     0 
    2     0     0     0     1     0     0 
    3     1     0     0     0     0     0 
    4     1     0     0     0     0     0 
    5     0     1     0     0     0     0 
    6     1     0     0     0     0     0 
    reconstr_carb.missing(NA).SE reconstr_cyl.4.SE reconstr_cyl.6.SE reconstr_cyl.8.SE reconstr_cyl.10.SE reconstr_cyl.missing(NA).SE 
    1       0     0     1     0     0       0 
    2       0     0     1     0     0       0 
    3       0     1     0     0     0       0 
    4       0     0     1     0     0       0 
    5       0     0     0     1     0       0 
    6       0     0     1     0     0       0 
    reconstr_gear.3.SE reconstr_gear.4.SE reconstr_gear.5.SE reconstr_gear.missing(NA).SE reconstr_vs.0.SE reconstr_vs.1.SE 
    1     0     1     0       0    1    0 
    2     0     1     0       0    1    0 
    3     0     1     0       0    0    1 
    4     1     0     0       0    0    1 
    5     1     0     0       0    1    0 
    6     1     0     0       0    0    1 
    reconstr_vs.missing(NA).SE reconstr_am.0.SE reconstr_am.1.SE reconstr_am.missing(NA).SE reconstr_mpg.SE reconstr_disp.SE reconstr_hp.SE 
    1       0    0    1       0 8.705556e-05  0.0196626269 0.0035177471 
    2       0    0    1       0 8.705556e-05  0.0196626269 0.0035177471 
    3       0    0    1       0 2.684331e-04  0.0411916382 0.0045768080 
    4       0    1    0       0 1.307597e-05  0.0004837585 0.0035177471 
    5       0    1    0       0 1.779785e-03  0.0102131519 0.0007516691 
    6       0    1    0       0 2.576469e-03  0.0038200199 0.0038147898 
    reconstr_drat.SE reconstr_wt.SE reconstr_qsec.SE 
    1  0.002147682 0.002080628  0.003914459 
    2  0.002147682 0.002054817  0.003843678 
    3  0.002153499 0.002111200  0.003646228 
    4  0.002244072 0.002020654  0.003545225 
    5  0.002235761 0.001998203  0.003843678 
    6  0.002282261 0.001996213  0.003451600 

    [32 rows x 28 columns]
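
To go from reconstruction errors to actual anomaly flags, one common approach is to request a single error per row (per_feature = FALSE) and mark the rows whose error exceeds a cutoff. The sketch below is only an illustration: the helper `flag_anomalies` and the 95% quantile cutoff are assumptions, not something the answer prescribes.

```r
# Hypothetical helper: return the indices of rows whose reconstruction
# error lies above the chosen quantile of all errors.
flag_anomalies <- function(mse, prob = 0.95) {
  cutoff <- quantile(mse, probs = prob, names = FALSE)
  which(mse > cutoff)
}

# Usage with the model above (needs a running H2O cluster):
# row_mse <- as.data.frame(h2o.anomaly(mtcar.dl, mtcar, per_feature = FALSE))
# flag_anomalies(row_mse[[1]])   # first column holds the per-row error

# Small stand-alone check with made-up errors: only the last row stands out.
toy_mse <- c(rep(0.01, 19), 5)
flag_anomalies(toy_mse)
```

The cutoff is a judgment call. A fixed quantile always flags some rows, so inspecting the sorted errors for a visible gap is often a useful complement.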

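You can also run GLRM on the same dataset, as shown below. You must set k, and you do not need to pass x to GLRM, but the dataset cannot contain constant columns. That is why I pass GLRM the same filtered columns that I used for deep learning.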

> mtcar_glrm = mtcar[2:12] 
> mtcar.glrm = h2o.glrm(training_frame = mtcar_glrm,seed=1, k = 5)
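
The idea behind using a low-rank model for anomaly detection is that rows which a rank-k approximation reconstructs poorly are candidates for anomalies. The sketch below illustrates this with a plain SVD in base R on the built-in mtcars data. It is a stand-in for the concept, not H2O's GLRM implementation, and the function `low_rank_row_mse` is hypothetical.

```r
# Hypothetical helper: per-row mean squared error of a rank-k SVD
# reconstruction of the standardized data.
low_rank_row_mse <- function(data, k) {
  x <- scale(as.matrix(data))        # center and scale each column
  s <- svd(x)
  idx <- seq_len(k)
  approx <- s$u[, idx, drop = FALSE] %*%
    diag(s$d[idx], nrow = k) %*%
    t(s$v[, idx, drop = FALSE])
  rowMeans((x - approx)^2)
}

row_mse <- low_rank_row_mse(mtcars, k = 5)
# Rows with the largest reconstruction error come first.
head(sort(row_mse, decreasing = TRUE), 3)
```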

2017-09-15 16:44:42 AvkashChauhan

Thanks! The error messages should be more descriptive, though.

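I have been trying to detect anomalies in time series data myself. To learn the concept I used this blog. The explanation in it worked well for me.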

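I would like to offer a visual picture of what happens when we detect anomalies. In this example, a Deep Learning model is fit to this ECG dataset. The data physically looks like this: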

Data we fit our Deep Learning Model

After that, we provide the test dataset (which contains anomalies), and it looks like this:

Data we test our Deep Learning Model on

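Anomaly detection itself becomes possible when the "artificial intelligence" sees a difference, measured with the MSE (mean squared error) metric: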

This is what AI 'see' on Test dataset

The resulting MSE can look like this example:

MSE output

2017-11-12 16:44:26 vlad1490
