2017-09-15 107 views

How do I create an anomaly detection model in H2O-R?

I am trying to run H2O's anomaly detection in R (h2o_3.14.0.2).

First, I tried it with my main deep learning model and got this error:

water.exceptions.H2OIllegalArgumentException 
[1] "water.exceptions.H2OIllegalArgumentException: Only for AutoEncoder Deep Learning model." 
... 

OK, my bad. I set autoencoder = TRUE:

h2o.deeplearning(y = response, training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE) 

and got a new error:

Error in .verify_dataxy(training_frame, x, y, autoencoder): `y` should not be specified for autoencoder=TRUE, remove `y` input 
Traceback: 

1. h2o.deeplearning(y = response, training_frame = training.frame, 
.  validation_frame = test.frame, autoencoder = TRUE) 
2. .verify_dataxy(training_frame, x, y, autoencoder) 
3. stop("`y` should not be specified for autoencoder=TRUE, remove `y` input") 

OK, so I should remove y:

h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame, autoencoder = TRUE) 

But:

Error in is.numeric(y): argument "y" is missing, with no default 
Traceback: 

1. h2o.deeplearning(training_frame = training.frame, validation_frame = test.frame, 
.  autoencoder = TRUE) 
2. is.numeric(y) 

Hmm, the last two requirements look mutually exclusive. But OK, I'll try another model:

anomaly.detection.model <- h2o.glrm(training_frame = training.frame, k = 10, seed = common.seed) 

h2o.anomaly(anomaly.detection.model, training.frame, per_feature = FALSE) 

and got a different kind of error:

java.lang.AssertionError 
[1] "java.lang.AssertionError"                      
[2] " water.api.ModelMetricsHandler.predict(ModelMetricsHandler.java:439)" 
... 

The failing assertion is assert s.reconstruct_train;. I have not dug into it yet. Maybe I will have better luck with GBM or RF?

model = h2o.gbm(y = response, 
       training_frame = training.frame, 
       validation_frame = validation.frame, 
       max_hit_ratio_k = 10, 
       seed = common.seed, 
       stopping_rounds = 3, 
       stopping_tolerance = 1e-2) 

h2o.anomaly(model, training.frame, per_feature = FALSE) 

water.exceptions.H2OIllegalArgumentException 
[1] "water.exceptions.H2OIllegalArgumentException: Requires a Deep Learning, GLRM, DRF or GBM model." 

The same happens with RF.

So I have two questions:

  1. How do I detect anomalies?
  2. Are these bugs, or am I doing something wrong?

Answers


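Setting autoencoder to TRUE turns this into an unsupervised (clustering-style) problem, so you do not need to set a response (y).

However, with autoencoder = TRUE you still need to set x. The error you saw above with autoencoder = TRUE occurred because you did not set the predictors (x). Once you set x, the problem goes away.

Here is a quick anomaly detection test I ran in R with H2O 3.14.0.2 (see this blog for details):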

    > library(h2o) 
    > h2o.init() 
    Reading in config file: ./.h2oconfig 

    H2O is not running yet, starting it now... 

    Note: In case of errors look at the following log files: 
     /var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.out 
     /var/folders/x7/331tvwcd6p17jj9zdmhnkpyc0000gn/T//Rtmp7RuYKp/h2o_avkashchauhan_started_from_r.err 

    java version "1.8.0_101" 
    Java(TM) SE Runtime Environment (build 1.8.0_101-b13) 
    Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode) 

    Starting H2O JVM and connecting: .. Connection successful! 

    R is connected to the H2O cluster: 
     H2O cluster uptime:   1 seconds 948 milliseconds 
     H2O cluster version:  3.14.0.2 
     H2O cluster version age: 24 days 
     H2O cluster name:   H2O_started_from_R_avkashchauhan_alj381 
     H2O cluster total nodes: 1 
     H2O cluster total memory: 3.56 GB 
     H2O cluster total cores: 8 
     H2O cluster allowed cores: 8 
     H2O cluster healthy:  TRUE 
     H2O Connection ip:   localhost 
     H2O Connection port:  54321 
     H2O Connection proxy:  NA 
     H2O Internal Security:  FALSE 
     H2O API Extensions:   XGBoost, Algos, AutoML, Core V3, Core V4 
     R Version:     R version 3.4.0 (2017-04-21) 

    > mtcar = h2o.importFile('https://raw.githubusercontent.com/woobe/H2O_London_Workshop/master/data/auto_design.csv') 
    |==================================================================================================================================| 100% 
    > mtcar$gear = as.factor(mtcar$gear) 
    > mtcar$carb = as.factor(mtcar$carb) 
    > mtcar$cyl = as.factor(mtcar$cyl) 
    > mtcar$vs = as.factor(mtcar$vs) 
    > mtcar$am = as.factor(mtcar$am) 
    > mtcar.dl = h2o.deeplearning(x = 2:12, training_frame = mtcar, autoencoder = TRUE, hidden = c(1,1,1), epochs = 100,seed=1) 
    |==================================================================================================================================| 100% 
    > errors <- h2o.anomaly(mtcar.dl, mtcar, per_feature = TRUE) 
    > print(errors) 
    reconstr_carb.1.SE reconstr_carb.2.SE reconstr_carb.3.SE reconstr_carb.4.SE reconstr_carb.6.SE reconstr_carb.8.SE 
    1     0     0     0     1     0     0 
    2     0     0     0     1     0     0 
    3     1     0     0     0     0     0 
    4     1     0     0     0     0     0 
    5     0     1     0     0     0     0 
    6     1     0     0     0     0     0 
    reconstr_carb.missing(NA).SE reconstr_cyl.4.SE reconstr_cyl.6.SE reconstr_cyl.8.SE reconstr_cyl.10.SE reconstr_cyl.missing(NA).SE 
    1       0     0     1     0     0       0 
    2       0     0     1     0     0       0 
    3       0     1     0     0     0       0 
    4       0     0     1     0     0       0 
    5       0     0     0     1     0       0 
    6       0     0     1     0     0       0 
    reconstr_gear.3.SE reconstr_gear.4.SE reconstr_gear.5.SE reconstr_gear.missing(NA).SE reconstr_vs.0.SE reconstr_vs.1.SE 
    1     0     1     0       0    1    0 
    2     0     1     0       0    1    0 
    3     0     1     0       0    0    1 
    4     1     0     0       0    0    1 
    5     1     0     0       0    1    0 
    6     1     0     0       0    0    1 
    reconstr_vs.missing(NA).SE reconstr_am.0.SE reconstr_am.1.SE reconstr_am.missing(NA).SE reconstr_mpg.SE reconstr_disp.SE reconstr_hp.SE 
    1       0    0    1       0 8.705556e-05  0.0196626269 0.0035177471 
    2       0    0    1       0 8.705556e-05  0.0196626269 0.0035177471 
    3       0    0    1       0 2.684331e-04  0.0411916382 0.0045768080 
    4       0    1    0       0 1.307597e-05  0.0004837585 0.0035177471 
    5       0    1    0       0 1.779785e-03  0.0102131519 0.0007516691 
    6       0    1    0       0 2.576469e-03  0.0038200199 0.0038147898 
    reconstr_drat.SE reconstr_wt.SE reconstr_qsec.SE 
    1  0.002147682 0.002080628  0.003914459 
    2  0.002147682 0.002054817  0.003843678 
    3  0.002153499 0.002111200  0.003646228 
    4  0.002244072 0.002020654  0.003545225 
    5  0.002235761 0.001998203  0.003843678 
    6  0.002282261 0.001996213  0.003451600 

    [32 rows x 28 columns] 
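
The transcript above asks for per-feature errors. To rank whole rows instead, h2o.anomaly can be called with per_feature = FALSE, which returns a single Reconstruction.MSE column. The following is a minimal sketch under stated assumptions, not part of the original answer: it uses R's built-in mtcars data as a stand-in, and the hidden layer size and the 95th-percentile cutoff are arbitrary illustrative choices.

```r
library(h2o)
h2o.init()

# Stand-in data for illustration only (not the asker's data set).
cars <- as.h2o(mtcars)

# Autoencoder: pass the predictors through x and leave y unset.
ae <- h2o.deeplearning(x = names(cars), training_frame = cars,
                       autoencoder = TRUE, hidden = c(5),
                       epochs = 50, seed = 1)

# One reconstruction MSE per row.
row_mse <- as.data.frame(h2o.anomaly(ae, cars, per_feature = FALSE))

# Illustrative cutoff (an assumption): flag rows above the 95th percentile.
cutoff   <- quantile(row_mse$Reconstruction.MSE, 0.95)
outliers <- which(row_mse$Reconstruction.MSE > cutoff)
print(outliers)
```

A percentile cutoff always flags some rows, so in practice the threshold would more likely come from a validation set known to be free of anomalies.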

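You can also run GLRM on the same dataset, as shown below. You must set k, and you do not need to pass x to GLRM, but the dataset cannot contain constant columns. That is why I pass GLRM the same filtered dataset (columns 2 to 12) that I used for deep learning.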

> mtcar_glrm = mtcar[2:12] 
> mtcar.glrm = h2o.glrm(training_frame = mtcar_glrm,seed=1, k = 5) 
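
Since h2o.anomaly hit an assertion on the GLRM model in the question, one possible workaround is to compute the reconstruction error by hand with h2o.reconstruct. This is a sketch of a swapped-in approach and not something the answer itself proposes: the mtcars stand-in data, k = 3, the STANDARDIZE transform, and the scaling of errors by each column's standard deviation are all assumptions made for illustration, and it only handles numeric columns.

```r
library(h2o)
h2o.init()

# Stand-in data for illustration only; every column is numeric.
cars <- as.h2o(mtcars)

glrm_fit <- h2o.glrm(training_frame = cars, k = 3,
                     transform = "STANDARDIZE", seed = 1)

# Rebuild the data from the low-rank factors, back on the original scale.
recon <- as.data.frame(h2o.reconstruct(glrm_fit, cars, reverse_transform = TRUE))
orig  <- as.data.frame(cars)

# Per-row mean squared error, with each column's error divided by that
# column's standard deviation so wide-ranged columns do not dominate.
col_sd  <- apply(orig, 2, sd)
row_mse <- rowMeans(sweep(as.matrix(orig) - as.matrix(recon), 2, col_sd, "/")^2)

# Rows with the largest reconstruction error are the anomaly candidates.
print(head(order(row_mse, decreasing = TRUE)))
```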

Thanks! The error messages should be more descriptive, though.


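I have been trying to detect anomalies in time series data myself. To learn the concepts I used this blog, and its explanations worked well for me.

I would like to offer a visual picture of what happens when an anomaly is detected. In this example, a Deep Learning model is fitted to an ECG dataset. The data looks like this:

[Figure: Data we fit our Deep Learning Model]

After that, we supply the test dataset (which contains the anomaly). It looks like this:

[Figure: Data we test our Deep Learning Model on]

The anomaly detection itself is possible because the "AI" sees the difference between its reconstruction and the input, measured with the MSE (mean squared error) metric.

[Figure: This is what AI 'see' on Test dataset]

The resulting MSE can look like this, for example:

[Figure: MSE output]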
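
Once each row has an MSE, turning the scores into flags is a thresholding step. Below is a small base-R sketch of that step. The error values are synthetic (simulated for this example, not taken from the ECG data), and the median + 3 * MAD rule is just one common heuristic, assumed here for illustration.

```r
set.seed(42)

# Synthetic reconstruction errors: 20 "normal" rows followed by
# 3 injected anomalies in positions 21 to 23.
mse <- c(runif(20, min = 0.001, max = 0.01), 0.2, 0.35, 0.5)

# Heuristic cutoff (an assumption): median plus three times the
# median absolute deviation, which is robust to a few extreme values.
cutoff <- median(mse) + 3 * mad(mse)

anomalous_rows <- which(mse > cutoff)
print(anomalous_rows)
```

A robust statistic such as the MAD is used because the anomalies themselves would inflate a plain mean and standard deviation.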