Custom metric data visualisation

Dear team,

I have a problem I would like to discuss with you; let me explain.

Data description

  • I have an application which ingests data read from some sensors.
  • The readings come in groups of 400 points (i.e. every time the sensors take a reading they produce 400 points, one every 3 ms; then the reading stops for a few seconds before another one starts).
  • Every reading is identified by an ID, so grouping the readings is pretty easy.

Problem

  • I would like to compute custom metrics from each reading (e.g. the distance between two significant points in the reading, integrals of the readings, etc.);
  • on these custom metrics I would like to perform statistical analysis with Timelion or Kibana.

Possible solution

  • fetch the readings from ES with a Java app, compute the new metrics, and re-ingest the new data into ES.

Using external code to compute the metrics certainly works, but it doesn't seem very scalable or modern to me; is this the only solution?

Take into account that using custom fields is not an option, because computing the new metrics requires working on all 400 points, not just a single doc (e.g. for computing a distance).
Note that these points are time series read from sensors.
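
For context, here is a minimal sketch of the external approach I have in mind (in Python rather than Java just to keep it short; the endpoint, index pattern, reading-ID field, and the compute_metrics stub are all placeholders, and client API details vary by version):

# Minimal sketch of the fetch -> compute -> re-ingest loop (illustrative only).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed endpoint

def compute_metrics(samples):
    # Placeholder: the real metrics (distances between significant points,
    # integrals, ...) would be computed over the full group of 400 points.
    return {"n_samples": len(samples)}

def enrich_reading(reading_id):
    # 1. Fetch the whole group of points for one reading, in time order.
    resp = es.search(
        index="readings-*",  # placeholder index pattern
        body={
            "query": {"term": {"reading_id": reading_id}},  # placeholder field
            "sort": [{"@timestamp": "asc"}],
            "size": 400,
        },
    )
    samples = [hit["_source"] for hit in resp["hits"]["hits"]]

    # 2. Compute the custom metrics over the whole group of points.
    metrics = compute_metrics(samples)

    # 3. Re-ingest a summary document holding the derived metrics.
    es.index(index="readings-metrics", body={"reading_id": reading_id, **metrics})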

Regards,
S.

The readings come in groups of 400 points

Are you storing those values as a single document in Elasticsearch, or as 400 documents? You'll probably want to do it as 400 documents to get access to the power of ES, and it sounds like you might already be indexing that way.

With individual documents, I think you can use Elasticsearch to calculate the things you're looking for via aggregations, but it would be easier to say whether that's true if I knew a little more about the data you have and some specifics about what you're trying to determine from it.
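
For example (just a sketch, with placeholder field names since I don't know your mapping yet), a terms aggregation on the group ID with sub-aggregations would give you per-group statistics server-side:

# Sketch: per-group statistics via aggregations (field names assumed).
# Pass this as the body of a search against your readings index.
body = {
    "size": 0,
    "aggs": {
        "per_reading": {
            "terms": {"field": "reading_id", "size": 100},
            "aggs": {
                "first_sample": {"min": {"field": "@timestamp"}},
                "last_sample": {"max": {"field": "@timestamp"}},
                "value_stats": {"stats": {"field": "some_numeric_field"}},
            },
        }
    },
}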

Thanks Joe for your reply; let me explain the data I am dealing with in more detail.
First of all, the values are stored in 400 documents, not in a single one. Each document has a timestamp, and the docs are spaced 3 ms apart; so, in a few words, I have 400 evenly spaced samples of a bunch of variables (they are actually values coming from sampling digital and analog signals via an analog-to-digital converter).

Here is a doc with one sample:

{
  "_index": "colpi-2018.12.06",
  "_type": "doc",
  "_id": "iuxK12gBc0lL5pWknI1L",
  "_version": 1,
  "_score": null,
  "_source": {
    "encPos": 1.494141,
    "facility": "local1",
    "@version": "1",
    "maxForzaTot": 1097,
    "estensimetro_0": -2,
    "type": "Colpo",
    "maxForzaColonna_1": 263,
    "estensimetro_2": -1,
    "evFreno": false,
    "colpoID": 1544105001000,
    "appname": "csvconverter",
    "maxForzaColonna_3": 320,
    "estensimetro_3": -2,
    "@timestamp": "2018-12-06T14:03:22.197Z",
    "maxForzaColonna_2": 281,
    "evFrizione": false,
    "sysloghost": "piastrellina",
    "host": "127.0.0.1:40192",
    "colpiPerPezzo": 2,
    "maxForzaColonna_0": 233,
    "estensimetro_1": -3,
    "tags": [
      "colpo"
    ],
    "exception": "",
    "severity": "EMERG",
    "logger": "",
    "consumoMotore": 244,
    "colpoPezzo": 2
  },
  "fields": {
    "HourOfDay": [
      14
    ],
    "@timestamp": [
      "2018-12-06T14:03:22.197Z"
    ]
  },
  "sort": [
    1544105002197
  ]
}

Some notable fields are:

  • "colpoID": 1544105001000 = unique identifier of the group of docs (i.e. for all the 400 samples the colpoID is the same);
  • "encPos": 1.494141 = a sample of the analog variable encPos;
  • "evFreno": false = a sample of the digital variable evFreno;
  • "@timestamp": "2018-12-06T14:03:22.197Z" = timestamp of this doc. For this group of docs (i.e. samples), for example, this timestamp could be the first one, the last one will be 2018-12-06T14:03:22.197Z + 400*3ms (i.e. each sample is evenly spaced of 3 ms)

Example of metrics:

  • brake_delay = distance in ms between the first sample and the timestamp when evFreno moves from false to true for the first time;
  • velocity_change = distance in ms between the first sample and the time when the rate of change (first derivative) of the value of the variable encPos exceeds 10 (see the sketch after this list);
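
To make these concrete, here is a rough sketch of how I would compute them in external code, given the 400 samples of one colpoID sorted by @timestamp (illustrative only; the units of the derivative threshold are an assumption, and error handling is omitted):

from datetime import datetime

def parse_ts(s):
    # Timestamps look like "2018-12-06T14:03:22.197Z".
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ")

def brake_delay_ms(samples):
    # ms between the first sample and the first false -> true edge of evFreno.
    t0 = parse_ts(samples[0]["@timestamp"])
    for prev, cur in zip(samples, samples[1:]):
        if not prev["evFreno"] and cur["evFreno"]:
            return (parse_ts(cur["@timestamp"]) - t0).total_seconds() * 1000
    return None  # evFreno never switched in this group

def velocity_change_ms(samples, threshold=10):
    # ms between the first sample and the first time the rate of change of
    # encPos exceeds the threshold (threshold units assumed to be per ms).
    t0 = parse_ts(samples[0]["@timestamp"])
    for prev, cur in zip(samples, samples[1:]):
        dt = (parse_ts(cur["@timestamp"]) - parse_ts(prev["@timestamp"])).total_seconds() * 1000
        if dt > 0 and (cur["encPos"] - prev["encPos"]) / dt > threshold:
            return (parse_ts(cur["@timestamp"]) - t0).total_seconds() * 1000
    return None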

I hope that with this example the scenario is clearer than before.

Regards,
Stefano

brake_delay = distance in ms between the first sample and the timestamp when evFreno moves from false to true for the first time;

Change detection isn't something Elasticsearch does; there's not a way to do any kind of calculation across documents in Elasticsearch. And the "first sample" requirement makes this kind of impossible, since you'd not just need to do calculations across documents, you'd also need to somehow identify which two documents are the important ones.

velocity_change = distance in ms between the first sample and the time when the rate of change (first derivative) of the value of the variable encPos exceeds 10;

Same as above. You can query and visualize derivative values, but only across aggregations in some selected time bucket, not on an individual-document basis. So while you can kinda sorta see what you want, you can't do what you're asking for here with Elasticsearch.
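
For reference, this is the kind of bucketed derivative I mean (a sketch only; the date_histogram interval key name varies by Elasticsearch version, and the colpoID value is taken from your sample doc):

# Sketch: derivative of encPos across 3 ms time buckets, not per document.
# Pass this as the body of a search against the colpi-* indices.
body = {
    "size": 0,
    "query": {"term": {"colpoID": 1544105001000}},
    "aggs": {
        "over_time": {
            "date_histogram": {"field": "@timestamp", "interval": "3ms"},
            "aggs": {
                "avg_encPos": {"avg": {"field": "encPos"}},
                "encPos_deriv": {"derivative": {"buckets_path": "avg_encPos"}},
            },
        }
    },
}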

As you noted previously, you can get pretty close to what you're looking for by enriching the data at ingest time, since you can run a query for whatever condition you're looking for and use the result to calculate a value for the document you're about to index. Logstash should allow a workflow like that, and an Elasticsearch ingest pipeline probably does as well.
