Pagination on unique data

Hey Guys,
I am building a logging and monitoring product for my employer, using
ES as the backend.
Finding the unique values of any given attribute is a core part of the
business logic I have in hand.

Let's say I want the unique dst_ip values. To achieve that:

  • I have set "index":"not_analyzed" on the selected fields

  • API used to get the unique count
    curl -XGET 'http://127.0.0.1:9200/es-server/Events/_search' -d \
    '{"aggs":{"dst_ip_count":{"cardinality":{"field":"dst_ip"}}},"size":0}'

  • API used to fetch those values
    curl -XGET 'http://127.0.0.1:9200/es-server/Events/_search' -d \
    '{"fields":["dst_ip"],"facets":{"terms":{"terms":{"field":"dst_ip","size":1116,"order":"count"}}},"size":1116}'

    Here 1116 is the count returned by the first API. The count is very
    small here, but in the production environment it goes above 2 lakh
    (200,000), which results in slow query responses.

Is there any other way to fetch such values with built-in pagination, like
the size and from parameters we have in a search query?

Please suggest, thanks in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b11eaa9f-ba52-4e0a-ba21-3cfb6e669a58%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

There is no support for pagination for terms aggregations.
The official reason seems to be that it is "tricky to implement"; see issue
#4915 (https://github.com/elasticsearch/elasticsearch/issues/4915), which is
now unfortunately closed.

So getting paginated terms ordered by count does not seem possible at this
point.
You could, however, order them alphabetically (by term) and apply filtering
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/search-aggregations-bucket-terms-aggregation.html#_filtering_values)
in a clever way to retrieve sequences of terms.
As you point out, a cardinality query
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/search-aggregations-metrics-cardinality-aggregation.html)
beforehand could inform your paging strategy.
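
To make the filtering idea concrete, here is a minimal Python sketch that only
builds the request body for one "page" of terms sharing a prefix. It assumes
the terms aggregation syntax of the Elasticsearch releases current at the time
of this thread ("order" by "_term" and a regex "include"); newer releases
order by "_key" instead. The aggregation name and the helper functions are
made up for illustration, and nothing here talks to a cluster.

```python
import json

# Characters that have a special meaning in Lucene regular expressions.
LUCENE_REGEX_SPECIAL = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text):
    """Backslash-escape regex metacharacters so the prefix matches literally."""
    return "".join("\\" + ch if ch in LUCENE_REGEX_SPECIAL else ch for ch in text)


def terms_page_body(field, prefix, size):
    """Build a search body returning terms of `field` that start with `prefix`.

    The aggregation name (field + "_terms") is a hypothetical naming choice.
    """
    return {
        "size": 0,  # no hits, only the aggregation result
        "aggs": {
            field + "_terms": {
                "terms": {
                    "field": field,
                    "size": size,
                    "order": {"_term": "asc"},  # alphabetical, not by count
                    "include": escape_lucene_regex(prefix) + ".*",
                }
            }
        },
    }


if __name__ == "__main__":
    # Example: all dst_ip values that start with "10."
    print(json.dumps(terms_page_body("dst_ip", "10.", 500), indent=2))
```

The printed JSON can be sent as the body of the same _search request used in
the original post; the "size" should be at least the cardinality found for
that prefix.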

Algorithm, assuming A-Z letters and a well-distributed collection of terms:

  • determine cardinality based on the first character (26 buckets)
  • if the size of a bucket exceeds a certain limit, repeat with the second
    character for that bucket (26 sub buckets)
  • the prefix of the term (1 or more letters) then becomes your paging
    mechanism
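
The steps above could be sketched roughly as follows in Python. This is an
illustration under assumptions, not a tested implementation against ES:
count_for_prefix is a hypothetical callable that returns how many distinct
terms start with a given prefix (in practice it might run a cardinality
aggregation restricted to that prefix), and max_depth is an extra safety limit
that is not part of the description above. Terms are assumed to consist only
of characters from the alphabet.

```python
import string


def plan_prefix_pages(count_for_prefix, limit,
                      alphabet=string.ascii_lowercase, prefix="", max_depth=4):
    """Return a list of (prefix, term_count, exact_only) pages.

    A bucket larger than `limit` is split on the next character. Terms that
    are exactly equal to a split prefix have no next character, so they are
    reported as a separate page with exact_only=True.
    """
    pages = []
    for ch in alphabet:
        candidate = prefix + ch
        n = count_for_prefix(candidate)
        if n == 0:
            continue  # nothing starts with this prefix
        if n <= limit or len(candidate) >= max_depth:
            pages.append((candidate, n, False))
            continue
        children = plan_prefix_pages(count_for_prefix, limit,
                                     alphabet, candidate, max_depth)
        leftover = n - sum(count for _, count, _ in children)
        if leftover > 0:
            pages.append((candidate, leftover, True))
        pages.extend(children)
    return pages


def make_in_memory_counter(terms):
    """Stand-in for the cardinality query, backed by a plain Python list."""
    unique = set(terms)
    return lambda prefix: sum(1 for term in unique if term.startswith(prefix))


if __name__ == "__main__":
    counter = make_in_memory_counter(
        ["apple", "apricot", "avocado", "banana", "cherry"])
    for page in plan_prefix_pages(counter, limit=2):
        print(page)
```

Each resulting prefix can then be used as one page, for example as the
"include" pattern of a terms aggregation. Every call to count_for_prefix would
be a round trip to ES, so the planning itself has a cost.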

How this translates into performance, I have no idea.
It will save on transfers from ES for sure, but it might not perform as
well as simply fetching every term and doing the paging in the server
application layer.
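
For comparison, paging in the application layer could look like the Python
sketch below. It assumes the full terms aggregation response has already been
fetched once (and ideally cached) and that it has the usual
aggregations / <name> / buckets shape with "key" and "doc_count" entries; the
function names and the aggregation name are made up for illustration.

```python
import math


def extract_terms(response, agg_name):
    """Pull the term keys out of a terms aggregation response, sorted."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return sorted(bucket["key"] for bucket in buckets)


def page_of(terms, page, page_size):
    """Return the 1-based `page` of `terms`, or an empty list past the end."""
    start = (page - 1) * page_size
    return terms[start:start + page_size]


def page_count(terms, page_size):
    """Number of pages needed to show all terms."""
    return math.ceil(len(terms) / page_size)


if __name__ == "__main__":
    # Hand-written sample response, not real cluster output.
    sample = {"aggregations": {"dst_ip_terms": {"buckets": [
        {"key": "10.0.0.2", "doc_count": 7},
        {"key": "10.0.0.1", "doc_count": 12},
        {"key": "192.168.1.5", "doc_count": 3},
    ]}}}
    terms = extract_terms(sample, "dst_ip_terms")
    print(page_count(terms, 2), page_of(terms, 1, 2))
```

This keeps the ES side simple, at the cost of transferring and holding every
term in the application, which is exactly the trade-off described above.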

Personally, I would love to see pagination support in Elasticsearch, even
if there is a performance penalty.
It seems much better than risking flooding a naive client or server with
too many terms at once.

On Thursday, September 11, 2014 2:48:30 PM UTC-4, jigish thakar wrote:
