Visualizing time series counting first time a term appears in index


(Jeff Rose) #1

I'm just spending my first week or so with Kibana and trying to build dashboards to visualize performance indicators that my team is interested in tracking. At the moment, I'd like to make a time series line chart tracking the number of first-time deposits made by users each week.

I have an index called "event-deposit-finished" that tracks completed deposit events as they occur, but for the sake of my testing right now I'm just working with historical data.

If I were querying this in SQL, I'd probably approach this in roughly the following manner:

  1. Get a unique set of all users who have a completed deposit (event-deposit-finished index)
  2. For each unique user, find their first finished deposit
  3. Count the number of first-time deposits in each week
  4. Plot the resultant series of values
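
To make the intended result concrete, steps 1 to 3 can be sketched in plain Python. This is only an illustration of the desired output, not something Kibana runs: the function name, the `(user_id, deposit_date)` tuple shape, and the sample data are all made up for the example, and weeks are assumed to start on Monday.

```python
from collections import Counter
from datetime import date, timedelta


def first_deposits_per_week(events):
    """events: iterable of (user_id, deposit_date) pairs.

    Returns a dict mapping each week's Monday to the number of users
    whose first deposit fell in that week.
    """
    first_seen = {}
    for user_id, day in events:
        # Steps 1 and 2: keep only the earliest deposit date per unique user.
        if user_id not in first_seen or day < first_seen[user_id]:
            first_seen[user_id] = day
    # Step 3: bucket each first deposit by the Monday that starts its week.
    weekly = Counter(
        day - timedelta(days=day.weekday()) for day in first_seen.values()
    )
    return dict(sorted(weekly.items()))


# Hypothetical sample data: u1 deposits twice, so only the first one counts.
sample = [
    ("u1", date(2018, 1, 1)),
    ("u1", date(2018, 1, 9)),
    ("u2", date(2018, 1, 3)),
    ("u3", date(2018, 1, 10)),
]
weekly_counts = first_deposits_per_week(sample)
```

The resulting week-to-count mapping is the series that step 4 would plot.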

My event-deposit-finished index includes a date field as well as the user ID that made the deposit.

Any ideas?

Edit:
Moving on from here, I'm also interested in a chart that can track the time between a user registering and their first completed deposit, as we'd like to minimize this metric. I have another index "event-user-registered" that I can get the date and user ID of user registrations from. I gather this might be something I have to use timelion for, but I'm only just beginning to scratch the surface of what timelion can do.
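
As a rough sketch of that second metric, again in plain Python rather than anything Kibana or Timelion provides: the function name, the input shapes, and the sample timestamps are assumptions for illustration. The two inputs stand in for the "event-user-registered" and "event-deposit-finished" data.

```python
from datetime import datetime, timedelta


def time_to_first_deposit(registrations, deposits):
    """registrations: {user_id: registered_at}
    deposits: iterable of (user_id, deposited_at) pairs.

    Returns {user_id: timedelta between registration and first deposit}.
    """
    first_deposit = {}
    for user_id, ts in deposits:
        # Keep the earliest deposit timestamp per user.
        if user_id not in first_deposit or ts < first_deposit[user_id]:
            first_deposit[user_id] = ts
    # Only users with both a registration and at least one deposit get a value.
    return {
        user_id: first_deposit[user_id] - registered_at
        for user_id, registered_at in registrations.items()
        if user_id in first_deposit
    }


# Hypothetical sample data: u2 registered but never deposited.
registered = {
    "u1": datetime(2018, 1, 1, 12, 0),
    "u2": datetime(2018, 1, 2, 9, 0),
}
deposited = [
    ("u1", datetime(2018, 1, 4, 12, 0)),
    ("u1", datetime(2018, 1, 3, 12, 0)),
]
delays = time_to_first_deposit(registered, deposited)
```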


(Brandon Kobel) #2

Hey @DigitalMachinist, I assume the complexity that you're running into is how to determine, at query time, whether or not a deposit is the very first one for a user?


(Jeff Rose) #3

@Brandon_Kobel Yeah, that's basically right. I just can't seem to understand how to apply the aggregations that I'd need to do this via the visualization tools.


(Brandon Kobel) #4

@DigitalMachinist, this is one of those situations where Elasticsearch differs from traditional SQL. Elasticsearch has very limited join capabilities, as discussed here, though we can use features like pipeline aggregations to fill in some of the gaps.

The other option we have is calculating some of this on ingest. How are you currently ingesting your data into Elasticsearch? Are you using the ingest node or perhaps Logstash?


(Jeff Rose) #5

At the moment we have a fairly naive indexing strategy. We're running a Laravel application in which we index documents into Elastic Cloud using Elasticsearch-PHP when certain events are triggered. I don't believe we have an ingest node configured, and we're not using logstash as of yet (although maybe in the future).


(Brandon Kobel) #6

Gotcha. If you can augment your ingest pipeline to determine whether an event is the first for a specific user, it'll make creating the various visualizations inside of Kibana really easy. Otherwise, we're stuck trying to calculate these with pipeline aggregations, and there will likely be limitations to how we're able to present this data.
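
A minimal sketch of that idea, written in plain Python so the logic is visible independent of any particular ingest tool. Everything named here is hypothetical: the `is_first_deposit` field, the `user_id` key, and the in-memory `seen_users` set, which stands in for whatever lookup (for example, a query against the existing index) the real pipeline would use. It also assumes events are processed in chronological order.

```python
def tag_first_deposit(event, seen_users):
    """Return a copy of the event with a boolean is_first_deposit flag.

    seen_users is a set of user IDs that already have a deposit; it is
    updated in place so later events for the same user are not flagged.
    """
    tagged = dict(event)  # avoid mutating the caller's document
    tagged["is_first_deposit"] = event["user_id"] not in seen_users
    seen_users.add(event["user_id"])
    return tagged


# Hypothetical events, already sorted by date.
seen = set()
events = [
    {"user_id": "u1", "date": "2018-01-01"},
    {"user_id": "u2", "date": "2018-01-03"},
    {"user_id": "u1", "date": "2018-01-09"},
]
tagged_events = [tag_first_deposit(e, seen) for e in events]
flags = [e["is_first_deposit"] for e in tagged_events]
```

With a flag like this stored on each document, the weekly chart reduces to a date histogram filtered on the flag being true.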


(Jeff Rose) #7

Thanks. I'll spend a bit of time reading about the ingest process and see if I can come up with an appropriate way to handle this at that time. If I add a pipeline/processor to handle this data on ingestion, I assume I'll have to reindex the appropriate data so it can be ingested/processed properly?

Is there a particular type of processor that I should look into for this kind of task?


(Brandon Kobel) #8

You will have to reindex your data. Using Logstash and the elasticsearch filter should make this not too painful.