Visualize the latest status of news articles using Kibana

I'm trying to visualize the latest status of news articles using Kibana.

Here's a brief example of what I'm trying to do:

I have a database of news. Each piece of news contains a headline, a timestamp and a status indicating whether the article has been printed.

I want to get the latest (by timestamp) status for each unique headline and visualize it in Kibana (possibly as a pie chart).

    #!/bin/bash

    export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

    # Create indexes

    curl -XPUT "$ELASTICSEARCH_ENDPOINT/news" -d '{
        "mappings": {
            "news": {
                "properties": {
                    "headline": { "type": "string", "index": "not_analyzed" },
                    "timestamp": { "type": "date" },
                    "status": { "type": "string" }
                }
            }
        }
    }'

    # Index documents
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
    {"index":{"_index":"news","_type":"news"}}
    {"status": "Pending", "headline": "Great news", "timestamp": "2015-07-28T00:07:29.000"}
    {"index":{"_index":"news","_type":"news"}}
    {"status": "Pending", "headline": "Great news", "timestamp": "2015-07-28T00:08:23.000"}
    {"index":{"_index":"news","_type":"news"}}
    {"status": "Pending", "headline": "Sports news", "timestamp": "2015-07-28T00:09:32.000"}
    {"index":{"_index":"news","_type":"news"}}
    {"status": "Printing", "headline": "Sports news", "timestamp": "2015-07-28T00:10:35.000"}
    {"index":{"_index":"news","_type":"news"}}
    {"status": "Printing", "headline": "Crazy news", "timestamp": "2015-07-28T00:11:54.000"}
    {"index":{"_index":"news","_type":"news"}}
    {"status": "Printed", "headline": "Crazy news", "timestamp": "2015-07-28T00:12:31.000"}

More specifically, I would like the count of latest Pending, Printing and Printed statuses across all unique headlines, and nothing else, preferably as a simple pie chart showing the counts for the three statuses. For instance, in the given example the stats would be:

  • Pending = 1 (since "Great news" has latest pending status)
  • Printing = 1 (since "Sports news" has latest printing status)
  • Printed = 1 (since "Crazy news" has latest printed status)

I tried writing a query for this in Elasticsearch, but could only get the latest document per headline using terms and top_hits aggregations. If I applied another terms aggregation on status first, I got the unique headlines within each status, so a headline showed up under every status it had ever had, which produced duplicate results.

So, how could I get the count of latest Pending, Printing and Printed statuses for every unique headline article without printing anything else? Any help would be appreciated!!

Unfortunately I don't think this is currently possible. We do have a "last" hit metric to grab the latest value per unique term, but it only works on numeric fields, and you wouldn't be able to aggregate on the results.

For example, the "top hit" metric can show the most recent "bytes" value per IP, but I don't think that is what you want.

@ppisljar, can you think of anything? Or @simianhacker, would something like this be possible with TSVB?

This may be a good use case for an entity-centric index, where you have a separate index that has one document per unique headline containing necessary metadata and the latest status. This will make it easy to display in Kibana.
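To make the entity-centric idea a bit more concrete, here is a minimal Python sketch, not a definitive implementation. It reduces the event documents from the question to one document per headline (the one with the newest timestamp) and builds a `_bulk` request body for a second index. The index name `news_latest`, the helper names, and the use of the headline as `_id` are all assumptions for illustration; `_type` is kept only to match the script in the question and would be dropped on newer Elasticsearch versions.

```python
import json
from datetime import datetime

# Sample events copied from the question.
EVENTS = [
    {"status": "Pending", "headline": "Great news", "timestamp": "2015-07-28T00:07:29.000"},
    {"status": "Pending", "headline": "Great news", "timestamp": "2015-07-28T00:08:23.000"},
    {"status": "Pending", "headline": "Sports news", "timestamp": "2015-07-28T00:09:32.000"},
    {"status": "Printing", "headline": "Sports news", "timestamp": "2015-07-28T00:10:35.000"},
    {"status": "Printing", "headline": "Crazy news", "timestamp": "2015-07-28T00:11:54.000"},
    {"status": "Printed", "headline": "Crazy news", "timestamp": "2015-07-28T00:12:31.000"},
]


def latest_per_headline(events):
    """Return {headline: event}, keeping only the newest event per headline."""
    latest = {}
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        current = latest.get(event["headline"])
        if current is None or ts > current[0]:
            latest[event["headline"]] = (ts, event)
    return {headline: event for headline, (_, event) in latest.items()}


def bulk_body(latest, index="news_latest"):
    """Build a newline-delimited _bulk body with one document per headline.

    Using the headline as _id (an assumption) means re-indexing a headline
    overwrites its previous document, so the index always holds the latest status.
    """
    lines = []
    for headline, event in latest.items():
        action = {"index": {"_index": index, "_type": "news", "_id": headline}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_body(latest_per_headline(EVENTS)))
```

Once such an index exists, an ordinary terms aggregation on `status` (for example a Kibana pie chart over that index) should give one slice per status, since each headline appears exactly once.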

I am not exactly sure what the goal is...

But to get the last record for every unique headline you could use:

  • a terms split on headline (you get one bucket for every unique headline)

  • top hits on the timestamp field (you get the last N entries for each bucket; if I understand correctly, you want N to be 1 and you want to use the status field)

Now, I think you want to count the different statuses? This is where Visualize won't be able to go further.
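Outside Kibana, the two steps above plus the final count can be sketched like this. It is a rough Python example under a few assumptions: the aggregation names `by_headline` and `latest` and the bucket size of 1000 are made up for illustration, and `headline` has to be indexed as a non-analyzed string (or keyword) so that each full headline becomes one bucket. The dict is a search body that could be POSTed to `news/_search`; the function then tallies the statuses client-side from the response.

```python
from collections import Counter

# Search body: one bucket per headline, each holding only its newest document.
LATEST_PER_HEADLINE_QUERY = {
    "size": 0,  # no top-level hits needed, only the aggregation
    "aggs": {
        "by_headline": {
            "terms": {"field": "headline", "size": 1000},
            "aggs": {
                "latest": {
                    "top_hits": {
                        "size": 1,
                        "sort": [{"timestamp": {"order": "desc"}}],
                    }
                }
            },
        }
    },
}


def count_latest_statuses(response):
    """Count how many headlines currently sit in each status.

    `response` is the parsed JSON returned by Elasticsearch for the
    query above.
    """
    counts = Counter()
    for bucket in response["aggregations"]["by_headline"]["buckets"]:
        hits = bucket["latest"]["hits"]["hits"]
        if hits:
            counts[hits[0]["_source"]["status"]] += 1
    return dict(counts)
```

For the sample data in the question this should come out as one headline each for Pending, Printing and Printed. The counting happens in the client, which matches the point above that the visualization itself cannot aggregate over the top hits.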

Timelion could probably handle that, and maybe even TSVB.
