Hi!

I need to get all distinct terms for a list of fields on a cluster of
six nodes with 3 TB of data in over 1000 indices. I use the following
query to get all terms for certain fields:

{
  "query" : {
    "match_all" : { }
  },
  "facets" : {
    "facility" : {
      "terms" : { "field" : "facility", "size" : 300, "order" : "term" }
    },
    "severity" : {
      "terms" : { "field" : "severity", "size" : 300, "order" : "term" }
    },
    "hostname" : {
      "terms" : { "field" : "hostname", "size" : 300, "order" : "term" }
    },
    "timestamp" : {
      "terms" : { "field" : "timestamp", "size" : 300, "order" : "term" }
    },
    "mainCategory" : {
      "terms" : { "field" : "mainCategory", "size" : 300, "order" : "term" }
    },
    "subCategory" : {
      "terms" : { "field" : "subCategory", "size" : 300, "order" : "term" }
    },
    "country" : {
      "terms" : { "field" : "country", "size" : 300, "order" : "term" }
    }
  }
}

We get OutOfMemoryExceptions (4 GB heap size) when we issue this query.
How should I change the query to avoid the problem? The ElasticSearch
version is 0.17.8. I know that Lucene has a feature to get all distinct
terms for a field without reading the whole index. How can I do that
with Elasticsearch?
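In case it helps to illustrate what I mean: one workaround I am considering (a sketch only, not verified against 0.17.8; the localhost URL and the use of search_type=count are my assumptions) is to split the seven facets into one request per field, so that only one field's data has to be held in memory at a time, and to skip hit collection entirely:

```python
import json

# The fields from the original query above.
FIELDS = ["facility", "severity", "hostname", "timestamp",
          "mainCategory", "subCategory", "country"]

def facet_request(field, size=300):
    # Build a single-field terms-facet request body. With one facet
    # per request, only that field's data needs to be loaded at a time.
    return {
        "size": 0,  # facet counts only, no hits
        "query": {"match_all": {}},
        "facets": {
            field: {
                "terms": {"field": field, "size": size, "order": "term"}
            }
        },
    }

for field in FIELDS:
    body = json.dumps(facet_request(field))
    # Each body would then be POSTed separately, e.g. (assumed endpoint):
    # curl -XPOST 'http://localhost:9200/_search?search_type=count' -d "$body"
    print(body)
```

Whether seven small requests instead of one big one actually avoids the heap blowup on this data set is exactly what I am unsure about.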

For now we have a fixed list of values, but that is not an
"elastic" solution...

CU

Thomas