Visualising survival data as a survival plot


#1

Hi,
I wonder if anyone can help me draw a survival plot in Kibana. Survival plots are a well-known way of displaying the survival of a population over time (see the Kaplan-Meier plot on Wikipedia), and I have the data to do it but can't work out how in Elastic/Kibana.

My index is made up of many person documents, each with a person ID and an integer number of survival days (plus other attributes we don't care about right now).

{"person_id": "aaa111",
"survival_days": 5 }
{"person_id": "aaa112",
"survival_days": 42 }
{"person_id": "baa101",
"survival_days": 1 }
{"person_id": "daa811",
"survival_days": 682 }...

A histogram can be drawn from this data just by pointing the line chart at it, i.e. count of people vs survival days. The population gradually dies off at a steady rate. A survival plot is a different view, made by plotting the number of people remaining on each day: on day one everyone is alive so it reads 100%, on day two some have died off and the remaining total is plotted as a percentage. This is repeated for every day on which someone dies, resulting in a gradually descending curve as the percentage still alive decreases.

One way of thinking about it is from the simple histogram of spread of people vs survival days, every bar is replaced by a point which is the sum of all the bars (population) to the right of it.
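To make the "sum of the bars to the right" idea concrete, here is a minimal Python sketch that works outside Elasticsearch on a plain list of `survival_days` values. The function name `survival_curve` is made up for illustration, and it assumes the bar for a given day is included in the sum (so the first day reads 100%) and that nobody is still alive (no censoring).

```python
from collections import Counter


def survival_curve(survival_days):
    """Return [(day, percent_still_alive), ...] for each day on which someone dies."""
    total = len(survival_days)
    deaths_per_day = Counter(survival_days)  # this is the plain histogram
    remaining = total
    curve = []
    for day in sorted(deaths_per_day):
        # Assumption: a person with survival_days == day still counts as alive
        # on that day, i.e. we sum this bar and every bar to its right.
        curve.append((day, 100.0 * remaining / total))
        remaining -= deaths_per_day[day]
    return curve


# The four sample documents from the post above.
print(survival_curve([5, 42, 1, 682]))
# [(1, 100.0), (5, 75.0), (42, 50.0), (682, 25.0)]
```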

As this seems to be some kind of aggregation, or set of aggregations, I would think elastic should be able to do it. Can anyone work it out?


(Peter Pisljar) #2

the aggregations you are looking for:

  • cumulative sum: will count how many ppl died in total so far (kind of the inverse of what you are looking for)
  • count: to count all the ppl
  • some math over the two (count - cumulative sum of count per survival day)
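As a rough sketch of how those three pieces could fit together outside Kibana: the request body below uses a `histogram` aggregation on `survival_days` with a `cumulative_sum` pipeline aggregation over `_count`, and the subtraction is done client-side in Python. The aggregation names `per_day` and `deaths_so_far` and both helper functions are hypothetical, and the body has not been run against a cluster here. Note that "count - cumulative sum" gives the number remaining after each day's deaths.

```python
def build_histogram_body(field="survival_days"):
    """Request body: one bucket per day plus a running total of deaths."""
    return {
        "size": 0,
        "aggs": {
            "per_day": {  # hypothetical aggregation name
                "histogram": {"field": field, "interval": 1, "min_doc_count": 0},
                "aggs": {
                    "deaths_so_far": {"cumulative_sum": {"buckets_path": "_count"}}
                },
            }
        },
    }


def remaining_per_bucket(response):
    """Client-side math: total count minus cumulative deaths, per bucket."""
    buckets = response["aggregations"]["per_day"]["buckets"]
    total = sum(b["doc_count"] for b in buckets)
    return [(b["key"], total - b["deaths_so_far"]["value"]) for b in buckets]


# Hand-written response in the shape this sketch expects (not real output).
fake_response = {
    "aggregations": {
        "per_day": {
            "buckets": [
                {"key": 1.0, "doc_count": 1, "deaths_so_far": {"value": 1.0}},
                {"key": 2.0, "doc_count": 0, "deaths_so_far": {"value": 1.0}},
                {"key": 3.0, "doc_count": 2, "deaths_so_far": {"value": 3.0}},
            ]
        }
    }
}
print(remaining_per_bucket(fake_response))
```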

our vertical bar chart currently doesn't support math

in visual builder and timelion you are limited to date histogram on your x axis.
to use them you would need to reindex your data, so instead of logging the survival_days you index a date on which a person died.
you could also create a scripted field to do this calculation on the fly for you, but that won't be as performant as you might wish.

then you could use visual builder to do something like this:


#3

Thanks Peter, that's interesting and would be a good solution if it weren't that the survival plot is a specific, well-known type of plot that is needed here. One of its advantages is that it takes dates completely out of the picture so that you can combine people's data from different timeframes. So the date histogram is not a viable solution.
I too wondered about a scripted field but don't have the understanding to tackle this.


(Peter Pisljar) #4

take a look at the following blog post: https://www.elastic.co/blog/using-painless-kibana-scripted-fields

you are probably looking into something like:

  • starts counting one year ago

return LocalDateTime.now().minusDays(365).plusDays(doc['survival_days'].value)

then you can use this scripted field in visual builder or timelion to create your agg.


#6

Well, I decided to go with the normalised date approach which needs a little explanation for the users but seemed worth it. Essentially Peter's solution works and provides a good plot (I went with re-indexing rather than scripted field). I was also able to put a couple more plots on to enrich the viz, so all in all pretty happy. Thanks!
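For anyone taking the same re-indexing route, the normalisation step might look something like the Python sketch below. It is only an illustration: the origin date, the field name `normalised_death_date`, and the helper `with_normalised_date` are all assumptions, not what was actually used in this thread. The idea is that every person gets the same artificial start date, so a date histogram over the new field behaves like a histogram over `survival_days`.

```python
from datetime import date, timedelta

# Hypothetical common "day zero" shared by every person.
ORIGIN = date(2000, 1, 1)


def with_normalised_date(doc, origin=ORIGIN):
    """Return a copy of doc with an artificial death date added."""
    out = dict(doc)
    # normalised_death_date is an assumed field name for the re-indexed document.
    death_day = origin + timedelta(days=doc["survival_days"])
    out["normalised_death_date"] = death_day.isoformat()
    return out


print(with_normalised_date({"person_id": "aaa111", "survival_days": 5}))
# {'person_id': 'aaa111', 'survival_days': 5, 'normalised_death_date': '2000-01-06'}
```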


#7

Hi, it turns out my nice looking graph is not done yet. What I've got uses the Count, Cumulative Sum & Overall Sum of count but this is all on the docs where status=deceased. However, my Overall Sum of count needs to be for status=deceased + status=alive. I can't see how to achieve this either in visual builder or timelion for different reasons.
So my current working calc is: 1 - (cumulative sum of deceased / overall total of deceased)
My desired target calc is: 1 - (cumulative sum of deceased / (overall total of deceased + total alive))
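To pin down the arithmetic of the target calc independently of any Kibana feature, here is a small Python sketch. The function name `survival_fractions` and the sample numbers are invented for illustration; it assumes the per-bucket deceased counts and the total alive count have already been fetched separately.

```python
from itertools import accumulate


def survival_fractions(deaths_per_bucket, alive_count):
    """1 - cumulative deceased / (total deceased + total alive), per bucket."""
    total = sum(deaths_per_bucket) + alive_count
    if total == 0:
        raise ValueError("no people to plot")
    return [1 - cum / total for cum in accumulate(deaths_per_bucket)]


# Made-up example: 4 deceased spread over three buckets, 4 still alive.
print(survival_fractions([1, 1, 2], alive_count=4))
# [0.875, 0.75, 0.5]
```

With `alive_count=0` this reduces to the current working calc, where the curve always ends at zero.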

If I use visual builder I've separately tried Filter Ratio & Math but can't count both deceased & alive while getting overall sum of both. Is there a way to use the count from one time series in another?
In timelion, I can count each separately and divide/subtract but I don't see how to get the overall total for both.