Hi there, I am using elasticsearch (which is totally brilliant, by the way)
for an index of approximately 21 million records. Within each record there is
one particular field that has between 1 and perhaps 10 values, and those
values are often unique to just that record. The values are text strings -
names of people. I am using a dynamic mapping.
I would like to be able to facet on this field, but whatever I do, I just
crash my index. So I am looking for further suggestions.
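For reference, the facet itself is just a plain terms facet on that field -
something along these lines (the index, type, and field names here are
placeholders, not my real ones):

curl -XGET 'http://localhost:9200/myindex/_search?pretty=true' -d '{
  "query": { "match_all": {} },
  "facets": {
    "names": {
      "terms": { "field": "names", "size": 10 }
    }
  }
}'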
I have stored this field unanalysed, and I have tried the field cache type
both set to soft and left unset, and have tried field cache max size values
ranging from 1 to 10,000,000.
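Concretely, the field ends up mapped roughly like this (placeholder names
again), and the cache settings I have been varying are the
index.cache.field.* ones in elasticsearch.yml:

# illustrative mapping for the names field (not my actual index/type names)
curl -XPUT 'http://localhost:9200/myindex/record/_mapping' -d '{
  "record": {
    "properties": {
      "names": { "type": "string", "index": "not_analyzed" }
    }
  }
}'

# elasticsearch.yml - the field cache settings I have been experimenting with
index.cache.field.type: soft
index.cache.field.max_size: 10000000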
I have run this on a single machine with 60gb of memory reserved for
elasticsearch. It eventually fails with an OutOfMemory error and tries to
dump the heap.
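The memory is reserved through the usual heap environment variables before
starting the node, i.e. something like:

# set before running bin/elasticsearch
export ES_MIN_MEM=60g
export ES_MAX_MEM=60g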
I have also tried running it on a cluster of 8 machines with 6gb for
elasticsearch on each, trying with between 1 and 16 shards and between 1 and
8 replicas, and also on a cluster of 4 machines with 12gb each. It again
fails with OOM, a bit sooner than on the one big machine.
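For those runs I have simply been varying the shard and replica counts when
creating the index, along these lines (again with a placeholder index name):

# create the index with a given shard/replica layout
curl -XPUT 'http://localhost:9200/myindex' -d '{
  "settings": {
    "number_of_shards": 16,
    "number_of_replicas": 1
  }
}'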
Are other people running facets on fields with this many potentially unique
values - on the order of 70,000,000? Am I just pushing elasticsearch too
far, or is it worth trying with more machines / one even bigger machine /
many even bigger machines?
Any feedback from people faceting at this sort of scale would be appreciated,
as would any other settings suggestions, so that I can get an idea of whether
it is worth pushing further or whether I should just give up on faceting on
this field.
Thanks!