Memory issues with facets


(slushi) #1

I am trying to use facets to implement autocomplete behavior (suggesting
completions of the search term as the user types), but I have been having
memory issues. I have a single-node index with 2 shards, each with about
450K documents and 150K terms in the field being used for autocomplete.
After starting the server, I ran my facet search, which looks something
like this:

{
  "size" : 0,
  "query" : {
    "prefix" : { "f" : "foo" }
  },
  "facets" : {
    "autocomplete" : {
      "terms" : {
        "field" : "f",
        "regex" : "foo.*",
        "regex_flags" : "DOTALL",
        "size" : 5
      }
    }
  }
}

The BigDesk cache graph showed the field cache jumping up to 800,000,000.
This seems high; these are basically tags, so they should be short strings.
I saw this thread, https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/Mdh8hN04c2c,
which is probably relevant, as the field is indeed multivalued. I will try
switching to nested documents and see if there is any improvement. A few
additional questions:

  1. The docs say that by default field cache entries never expire. I
    assume the cache is still kept coherent with the index, so if docs are
    updated/deleted/expired, the relevant cache entries are evicted?
  2. Does the number of shards per node increase field cache memory usage?
    I.e., would memory usage go down if I used one shard rather than 2?
  3. I have a field dedicated to autocomplete; would it be better to
    simply split that field out into its own index rather than use nested
    docs?
  4. Does this approach (facets for autocomplete) make sense given the
    size of the field? I was using Solr before, which had a way (the terms
    component, http://wiki.apache.org/solr/TermsComponent) to directly
    access the index terms; that seems much more efficient than what I am
    doing here. Would it make more sense to build a separate index for
    autocomplete by periodically mining the top terms from the main index
    and then doing "normal" prefix queries to get suggestions?
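The separate-index idea in point 4 can be sketched in a few lines of plain Python. This is only an illustration of the ranking logic (the function names and data shapes are made up for the example, not Elasticsearch APIs): mine term frequencies from the main index, then answer prefix lookups against the mined counts.

```python
from collections import Counter

def build_suggestions(tag_lists):
    """Count tag frequency across all documents, standing in for
    periodically mining top terms from the main index."""
    return Counter(tag for tags in tag_lists for tag in tags)

def suggest(counts, prefix, size=5):
    """Return the most common terms matching the prefix, which is what a
    'normal' prefix query over a dedicated suggestion index would give."""
    matches = [(term, n) for term, n in counts.items() if term.startswith(prefix)]
    matches.sort(key=lambda t: (-t[1], t[0]))  # most frequent first, then alphabetical
    return [term for term, _ in matches[:size]]
```

With counts mined from documents tagged ["food", "football"], ["food"], and ["bar"], suggest(counts, "foo") ranks "food" ahead of "football" because it occurs more often.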

Thanks!


(Shay Banon) #2

Using facets for autocomplete usually does not make sense; search the
mailing list for how to do it with edge n-grams.
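For readers unfamiliar with the suggestion: an edge n-gram analyzer indexes every leading substring of a term, so autocomplete becomes a cheap exact-term match at query time instead of a regex-filtered facet. A minimal sketch of what such a token filter emits (this is a plain-Python approximation, not the Elasticsearch filter itself):

```python
def edge_ngrams(term, min_gram=1, max_gram=20):
    """Emit the leading substrings of a term, roughly what an edge n-gram
    token filter produces at index time."""
    top = min(len(term), max_gram)
    return [term[:i] for i in range(min_gram, top + 1)]
```

Index these as tokens; then a plain term query on whatever the user has typed so far hits the precomputed grams, so no regex or facet work is needed per keystroke.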


(slushi) #3

I was planning to try n-grams as well. One issue I am unsure about is how
to boost popular terms in the n-gram index. The nice thing about the Solr
TermsComponent or facets is that I can suggest common terms as the user
types instead of just the "closest" term. If I build an autocompletion
index while I am building my main index, it's not clear to me how to
easily boost terms according to their popularity in the index.
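One way to reconcile the two (sketched here in plain Python under the assumption that the autocomplete index stores one entry per distinct term, with a count field maintained during indexing; the names are hypothetical): keep edge n-gram lookups for matching, but rank the candidates by their corpus frequency rather than by closeness alone.

```python
from collections import Counter, defaultdict

def index_with_popularity(docs):
    """Build an edge-n-gram -> terms lookup plus per-term counts, carrying
    index popularity into a separate autocomplete structure."""
    counts = Counter()
    grams = defaultdict(set)
    for tags in docs:
        for term in tags:
            counts[term] += 1
            for i in range(1, len(term) + 1):
                grams[term[:i]].add(term)
    return grams, counts

def suggest(grams, counts, typed, size=5):
    """Look up candidate terms by the typed prefix and rank them by how
    often they occur in the corpus."""
    candidates = grams.get(typed, set())
    return sorted(candidates, key=lambda t: (-counts[t], t))[:size]
```

In Elasticsearch terms, the count could live as a numeric field on each term document and feed a score boost, so the matching stays in the n-gram field while popularity drives the ordering.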



(Ludovic Fleury) #4

Hey, I'm new to ES, but I have the same need: I have a list of places,
each with a city, and I want to provide an autocomplete feature over the
facet values. I can't see how to do that without facets (since it's an
aggregation of the unique values of a field in my index type). Any hints?
Thanks



(system) #5