"Failed to compute BIC gain" error from Machine Learning

Elasticsearch version: 5.5.0
Plugins installed: [x-pack]
JVM version: 1.8.0_141
OS version: Ubuntu Server 16.04

We're trialing the machine learning capabilities of Elasticsearch in Kibana.

Our test job json is here.

I'm seeing regular log error messages like the one below. Should I be concerned?

```
[2017-07-28T10:37:33,573][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [test_multi_1] [autodetect/1404] [CXMeansOnline1d.cc@403] Failed to compute BIC gain: Error in function boost::math::lgamma(double): Evaluation of lgamma at 0., n = 108.739, m = -8.7601e+018, v = 3.76441e+022, wl = 0.855782, ml = -8.7601e+018, vl = 1, wr = 0.144218, mr = -8.7601e+018, vr = 1.11544e+006 | repeated [3]
```

Hi Tommy.

This indicates that, because of numerical precision issues, we've failed to calculate a quantity we use to decide whether to create multiple clusters for the data. In this case we are "cautious" and choose not to create multiple clusters. That may well be the correct decision anyway, given the data characteristics, so the modelling shouldn't be significantly impaired: we should still have valid models with which we can detect genuine anomalies. With the details in this error message I have a good chance of reproducing the issue and fixing the instability in our code.
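For illustration, here's a minimal Python sketch (using the standard library's math.lgamma in place of boost::math::lgamma) of the kind of failure the message describes: lgamma has a pole at zero, so if an intermediate quantity underflows or rounds to exactly 0, the evaluation fails:

```python
import math

# lgamma has a pole at 0: evaluating it there fails outright.
# Python raises ValueError where Boost reports an evaluation error.
try:
    math.lgamma(0.0)
except ValueError as err:
    print(err)  # math domain error

# With inputs of the magnitude shown in the log (means near -8.8e18,
# variances near 3.8e22), an intermediate term can plausibly round to 0.
```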

Interestingly, the values printed suggest that the input data has a very large range, with values as negative as roughly -8.8e18. I notice that some of your detectors are running metric functions on what are described as hashes of quantities. If these are unsigned 64-bit integer hashes then you may be running into overflow storing them in Elasticsearch, whose long type is a signed 64-bit integer. [Also, I'm wondering whether you expect these values to be confined to some interval and are interested in when they fall outside that interval. If the hash is uniform over the whole range, I'm not sure anomaly detection on the mean value is useful. If you are interested in, say, hashes becoming less diverse, then a better measure would be the variation you see in the hash values, using our varp function.]
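To make the overflow concrete, here's a minimal Python sketch (the specific hash value is hypothetical): an unsigned 64-bit value above 2^63 - 1, reinterpreted as a signed 64-bit long, comes out hugely negative, which lines up with the m values in your error message:

```python
import struct

# Hypothetical unsigned 64-bit hash above 2**63 - 1, the largest
# value a signed 64-bit long can hold
unsigned_hash = 9686644073709551616

# Reinterpret the same 64 bits as a signed long, which is effectively
# what happens if the value overflows a signed 64-bit integer field
signed_hash, = struct.unpack("<q", struct.pack("<Q", unsigned_hash))
print(signed_hash)  # -8760100000000000000, i.e. -8.7601e18 as in the log
```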

Thanks Tom,

You've hit the nail on the head: we're experimenting with anomaly detection on ingested text data by storing the length and a murmur3 hash of several fields per document, roughly as sketched below.
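A simplified sketch of the per-field enrichment (field names are made up, and the mmh3 Python package stands in for whatever actually computes the hash in our pipeline):

```python
import mmh3  # pip install mmh3

text = "example field value"

doc = {
    "field1": text,
    "field1_length": len(text),
    # hash64 returns two signed 64-bit halves, which fit Elasticsearch's
    # signed long; an unsigned variant of the hash can exceed 2**63 - 1
    # and overflow it
    "field1_hash": mmh3.hash64(text)[0],
}
```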

I would bet that in some cases the hash is overflowing and your point about hashes becoming less or more diverse is correct.

Am I correct in thinking that I would need to use an 'advanced' job to make use of the varp function?

No problem. Yes, currently you can only select the varp function in the advanced job configuration.
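For reference, here's a minimal sketch of creating such an advanced job through the 5.x X-Pack ML API (the job name, field name, bucket span, and credentials below are all placeholders to adapt):

```python
import requests

# Placeholder job: per-bucket variance (varp) of a per-document hash field
job = {
    "description": "Variance of per-field murmur3 hashes",
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [
            {
                "detector_description": "varp(field1_hash)",
                "function": "varp",
                "field_name": "field1_hash",
            }
        ],
    },
    "data_description": {"time_field": "@timestamp"},
}

resp = requests.put(
    "http://localhost:9200/_xpack/ml/anomaly_detectors/hash-diversity",
    json=job,
    auth=("elastic", "changeme"),  # adjust for your X-Pack security setup
)
print(resp.json())
```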
