Very poor search and aggregation performance

Hey there,
we are experiencing very poor performance with our search and aggregation queries and would appreciate any help or advice on how to improve our Elasticsearch setup.

System:

  • 26 GB RAM (out of 70 GB on the server)
  • 2 CPUs
  • on a Windows VM inside a storage system

Elasticsearch:

  • 1 node
  • 36,000,000 documents, 350 GB storage size
  • 5 shards

Mapping:

  • one type with 700 fields (unfortunately quite sparse), mainly keyword and text fields

Problem:

  • A search such as match_all with no filter is fast.
  • A match_all search with a native script plugin used as a filter is very slow (> 1 minute).
  • Terms aggregations on integer, keyword, or text fields (with fielddata) are very slow (> 2 minutes); combined with the native script plugin, they are even slower.

We noticed that CPU usage rises to 100% while these queries run, which blocks the execution of any other query.

Also, queries are sent one after the other, so it's not as if we are sending many queries at the same time.

We are thankful for any tips. Let us know if you need further information.

Could you explain what your native script is doing (or, even better, share the actual code)? It seems to me that the script is the most likely cause of the slowness, since just adding the script filter makes the query very slow.

Hey,
the script is used as an ACL filter. It first reads the content of a field (an array of ints), then iterates over this list and compares it with a submitted list of the user's ACLs. (Our system provides "AND" groups, and we did not find any other plugin that supports this.)

We use this to get the field content:
FieldLookup fieldLook = (FieldLookup) fields().get("fieldname");
The mapping of the field:
"fieldname": { "type": "long", "store": true, "ignore_malformed": true, "include_in_all": false }
We cannot use doc values, as we need to process the values in the same order in which they were indexed.
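
To make the comparison step more concrete, here is a minimal sketch of the kind of check described above. It is not our actual plugin code: the class name AclSketch, the method isVisible, and the encoding (a value of 0 separating one "AND" group from the next) are assumptions made purely for illustration. Under that assumed encoding, a document is visible if the user holds every ACL of at least one group, which is why the order of the stored values matters.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AclSketch {
    // Hypothetical delimiter between "AND" groups; an assumption, not from the real plugin.
    static final long GROUP_SEPARATOR = 0L;

    // Returns true if the user holds every ACL of at least one group.
    // storedValues must be in indexing order, since the separators define the groups.
    static boolean isVisible(List<Long> storedValues, Set<Long> userAcls) {
        boolean groupSatisfied = true;   // no missing ACL seen in the current group yet
        boolean groupHasEntries = false; // guards against empty groups matching
        for (long value : storedValues) {
            if (value == GROUP_SEPARATOR) {
                if (groupHasEntries && groupSatisfied) {
                    return true; // one fully matched group is enough
                }
                groupSatisfied = true;
                groupHasEntries = false;
            } else {
                groupHasEntries = true;
                if (!userAcls.contains(value)) {
                    groupSatisfied = false;
                }
            }
        }
        // The last group has no trailing separator, so check it here.
        return groupHasEntries && groupSatisfied;
    }

    public static void main(String[] args) {
        // Two groups: (1 AND 2) or (3).
        List<Long> stored = Arrays.asList(1L, 2L, 0L, 3L);
        System.out.println(isVisible(stored, new HashSet<>(Arrays.asList(3L))));
        System.out.println(isVisible(stored, new HashSet<>(Arrays.asList(1L))));
        System.out.println(isVisible(stored, new HashSet<>(Arrays.asList(1L, 2L))));
    }
}
```

Passing the user's ACLs as a hash set keeps the check linear in the number of stored values per document, although the script still has to run for every document that reaches the filter.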

CPU usage is also at 100% when aggregations are built (even without the plugin).

Is our system infrastructure sufficient, or do you think using more CPUs and/or more nodes would increase query performance significantly?

Thanks

OK, so you are saying that you see slow requests even when you don't use the native script at all? If so, could you provide the request and response for this case (when the response is slow and you don't use your native script)?
