Alerting with Azure Application Insights and/or Analytics when the error rate exceeds a threshold

I'm sending customEvents to Azure Application Insights that look like this:

timestamp     | name   | customDimensions 
---------------------------------------------------------------------------- 
2017-06-22T14:10:07.391Z | StatusChange | {"Status":"3000","Id":"49315"} 
2017-06-22T14:10:14.699Z | StatusChange | {"Status":"3000","Id":"49315"} 
2017-06-22T14:10:15.716Z | StatusChange | {"Status":"2000","Id":"49315"} 
2017-06-22T14:10:21.164Z | StatusChange | {"Status":"1000","Id":"41986"} 
2017-06-22T14:10:24.994Z | StatusChange | {"Status":"3000","Id":"41986"} 
2017-06-22T14:10:25.604Z | StatusChange | {"Status":"2000","Id":"41986"} 
2017-06-22T14:10:29.964Z | StatusChange | {"Status":"3000","Id":"54234"} 
2017-06-22T14:10:35.192Z | StatusChange | {"Status":"2000","Id":"54234"} 
2017-06-22T14:10:35.809Z | StatusChange | {"Status":"3000","Id":"54234"} 
2017-06-22T14:10:39.22Z | StatusChange | {"Status":"1000","Id":"74458"}

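Assuming status 3000 is an error status, I would like to be alerted when a certain percentage of Ids end up in the error status during the last hour.

As far as I know, Insights cannot do this by default, so I would like to try the approach described here and write an Analytics query that could trigger the alert. This is the best I've been able to come up with: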

customEvents 
| where timestamp > ago(1h) 
| extend isError = iff(toint(customDimensions.Status) == 3000, 1, 0) 
| summarize failures = sum(isError), successes = sum(1 - isError) by timestamp bin = 1h 
| extend ratio = todouble(failures)/todouble(failures+successes) 
| extend failure_Percent = ratio * 100 
| project iff(failure_Percent < 50, "PASSED", "FAILED")

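However, for my alert to work correctly, the query should:

Return "PASSED" even if there are no events within the hour (another alert will take care of the absence of events).
Only take into account the final status of each Id within the hour.

As the query is written, if there are no events, it returns neither "PASSED" nor "FAILED".

It also takes into account every record with Status == 3000, which means that the example above would return "FAILED" (5 out of 10 records have Status 3000), while in reality only 1 out of 4 Ids ended up in the error status.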
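To make the difference between the two counts concrete, here is a minimal Python sketch that replays the ten sample rows from the table above. It is only a simulation of the intended logic, not an Analytics query; the names `EVENTS`, `parse` and `final_status_by_id` are made up for this illustration.

```python
from datetime import datetime

# Sample rows copied from the table in the question: (timestamp, Status, Id).
EVENTS = [
    ("2017-06-22T14:10:07.391Z", 3000, "49315"),
    ("2017-06-22T14:10:14.699Z", 3000, "49315"),
    ("2017-06-22T14:10:15.716Z", 2000, "49315"),
    ("2017-06-22T14:10:21.164Z", 1000, "41986"),
    ("2017-06-22T14:10:24.994Z", 3000, "41986"),
    ("2017-06-22T14:10:25.604Z", 2000, "41986"),
    ("2017-06-22T14:10:29.964Z", 3000, "54234"),
    ("2017-06-22T14:10:35.192Z", 2000, "54234"),
    ("2017-06-22T14:10:35.809Z", 3000, "54234"),
    ("2017-06-22T14:10:39.22Z", 1000, "74458"),
]

ERROR_STATUS = 3000  # assumption stated in the question


def parse(ts):
    # %f accepts 1 to 6 fractional digits, so ".22Z" parses as well as ".391Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def final_status_by_id(events):
    """Keep, for each Id, the Status of the event with the latest timestamp."""
    latest = {}
    for ts, status, id_ in events:
        when = parse(ts)
        if id_ not in latest or when > latest[id_][0]:
            latest[id_] = (when, status)
    return {id_: status for id_, (_, status) in latest.items()}


# Count every record with the error status (what the first query does).
raw_errors = sum(1 for _ts, status, _id in EVENTS if status == ERROR_STATUS)

# Count only Ids whose final status is the error status (what is wanted).
final = final_status_by_id(EVENTS)
final_errors = sum(1 for status in final.values() if status == ERROR_STATUS)

print(f"{raw_errors} of {len(EVENTS)} records have Status 3000")  # 5 of 10
print(f"{final_errors} of {len(final)} Ids end in Status 3000")   # 1 of 4
```

Counting records gives a 50% failure rate, while counting final statuses per Id gives 25%, which is why the two approaches disagree on this sample.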

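Can someone help me figure out the correct query?

(And optional secondary questions: has anyone set up a similar alert using Insights? Is this a correct approach?)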

Source

2017-06-28 madd0

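As mentioned, since you are only querying over a single hour, you don't need to bin the timestamp, or use it as part of your aggregation at all.
To answer your questions:

The way to overcome having no data at all is to inject a synthetic row into your table, which translates to a success result if no other result is found.
If you want your pass/fail criteria to be based on the final status of each Id, then you need to use argmax in your summarize: it returns the status corresponding to the maximal timestamp.

So, to wrap it all up: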

customEvents 
| where timestamp > ago(1h) 
| extend isError = iff(toint(customDimensions.Status) == 3000, 1, 0) 
| summarize argmax(timestamp, isError) by tostring(customDimensions.Id) 
| summarize failures = sum(max_timestamp_isError), successes = sum(1 - max_timestamp_isError) 
| extend ratio = todouble(failures)/todouble(failures+successes) 
| extend failure_Percent = ratio * 100 
| project Result = iff(failure_Percent < 50, "PASSED", "FAILED"), IsSynthetic = 0 
| union (datatable(Result:string, IsSynthetic:long) ["PASSED", 1]) 
| top 1 by IsSynthetic asc 
| project Result

Regarding the bonus question: you can set up alerting based on Analytics queries using Flow. See here for a related question.

Source

2017-06-29 07:17:14 EranG

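For future reference: if I don't 'bin' the timestamp, the query always returns something (because sum returns 0), so the synthetic result doesn't help. So either keep the bin or, as I ended up doing, return "PASSED" if failures == successes == 0. In any case, thanks for the great explanation. – madd0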
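The guard described in this comment can be sketched outside of Analytics as a small Python function. This only mirrors the decision logic; the function name `verdict` and the `threshold_percent` parameter are hypothetical, and the 50% default is taken from the queries in this thread.

```python
def verdict(failures, successes, threshold_percent=50.0):
    """Return "PASSED" or "FAILED" from per-Id final-status counts."""
    total = failures + successes
    if total == 0:
        # No events in the window: pass here, since a separate alert
        # is expected to cover the absence of events.
        return "PASSED"
    failure_percent = 100.0 * failures / total
    return "PASSED" if failure_percent < threshold_percent else "FAILED"


print(verdict(0, 0))  # empty hour
print(verdict(1, 3))  # the sample data: 1 of 4 Ids in error
print(verdict(2, 2))  # exactly at the threshold
```

Checking for an empty window first avoids dividing by zero and makes the "no events" case an explicit pass instead of relying on how the ratio happens to evaluate.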

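I assume the query returns no rows when you have no data in the hour because timestamp bin = 1h (aka bin(timestamp,1h)) doesn't return any bins?

But if you are only querying the last hour, I don't think you need the bin on timestamp at all?

Without your data it's hard to repro exactly, but... you could try something like this (beware of syntax errors):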

customEvents 
| where timestamp > ago(1h) 
| extend isError = iff(toint(customDimensions.Status) == 3000, 1, 0) 
| summarize totalCount = count(), failures = countif(isError == 1), successes = countif(isError ==0) 
| extend ratio = iff(totalCount == 0, 0.0, todouble(failures)/todouble(failures+successes)) 
| extend failure_Percent = ratio * 100 
| project iff(failure_Percent < 50, "PASSED", "FAILED")

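Hypothetically, getting rid of the hour binning should just give you back a single row here, with

totalCount = 0, failures = 0, successes = 0, so the failure-percent math should give you back a 0 failure rate, which should get you "PASSED".

Without trying it, I'm not sure whether that works, or whether you still get back no rows when there is no data?

For your second question, you could use something like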

let maxTimestamp = toscalar(customEvents | where timestamp > ago(1h) 
| summarize max(timestamp)); 
customEvents | where timestamp == maxTimestamp ... 
// ... more query here

to get just the row(s) that have the timestamp of the last event within the hour?

Source

2017-06-28 21:25:26
