Elastic Cloud Service search performance

Hello,

We are currently evaluating Elasticsearch as a replacement for our Sphinx search (sphinxsearch.com).
As a first step we tried the Elasticsearch service from AWS, but it really sucks.
We are now trying the Elastic Cloud service, but at the moment we are not entirely happy with it either.

We are evaluating the service with an AWS.DATA.HIGHIO.I3 instance with 2 GB RAM.
The indexing performance is great, but the search performance is terrible.

Here is some more information; maybe somebody can give us tips on how to optimize the search performance.

We have an index with roughly 14,000,000 documents with 50 fields.
The current disk usage is 13.88 % of 60 GB.
JVM memory pressure is 12 %.

The average took of a search is over 8 seconds.
Increasing RAM to 8 GB with 2 zones improves the time by only about 2 seconds.

We use a somewhat complex query for the search, but under Sphinx it takes less than 1.5 seconds, and I actually thought that Elastic was the better product :laughing:

Since Elastic.co does not provide any real support for the evaluation, I now turn to the community.

best regards
Ricardo

Could you share more details about this? Like a typical document, the mapping and a typical query?

Another thing that is useful is information about shard count, size and Elasticsearch version.

@dadoonet @Christian_Dahlqvist

thanks in advance for possible support

Shards: 10

Version: 7.7.1

@Christian_Dahlqvist size of what?

typical document:

{
  "_index": "media",
  "_type": "_doc",
  "_id": "85001",
  "_version": 1,
  "_score": 0,
  "_source": {
    "state": "accepted,active",
    "id_media": 85001,
    "date_upload": "2005/06/13 22:00:00",
    "mimetype": "image/jpeg",
    "width": 1284,
    "height": 974,
    "ranking": 4,
    "opt_contract": "rights,merchandising,newwebprice,subscription,rightsmanaged",
    "flags": "photo",
    "published": "SA,MC",
    "id_user": 3423,
    "has_gps": 0,
    "gps_latitude": null,
    "gps_longitude": null,
    "date_deleted": "1970/01/01 00:00:00",
    "title": "Pa Zwi To",
    "description": null,
    "keywords_main": "",
    "keywords": "paprika,onion,tomato,salad,food,vegetables,vitamins,healthy,raw food",
    "keywords_prio": "food,vegetable,onion",
    "keywords_category": "food and drink,food,fruits and vegetables,vegetable,types,tomato",
    "keywords_bought": "",
    "not_adult": 1,
    "username": "m.kempf",
    "sales_total": 0,
    "sales_total_last_12M": 0,
    "id_art_collection": 2,
    "ethno": "",
    "additional_weight": 500
  },
  "fields": {
    "date_upload": [
      "2005-06-13T22:00:00.000Z"
    ],
    "date_deleted": [
      "1970-01-01T00:00:00.000Z"
    ]
  }
}

mapping:

{
    "media" : {
      "mappings" : {
        "properties" : {
          "additional_weight" : {
            "type" : "long"
          },
          "date_deleted" : {
            "type" : "date",
            "format" : "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis"
          },
          "date_upload" : {
            "type" : "date",
            "format" : "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis"
          },
          "description" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "ethno" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "flags" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "gps_latitude" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "gps_longitude" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "has_gps" : {
            "type" : "long"
          },
          "height" : {
            "type" : "long"
          },
          "id_art_collection" : {
            "type" : "long"
          },
          "id_media" : {
            "type" : "long"
          },
          "id_user" : {
            "type" : "long"
          },
          "keywords" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "keywords_bought" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "keywords_category" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "keywords_main" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "keywords_prio" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "mimetype" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "not_adult" : {
            "type" : "long"
          },
          "opt_contract" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "published" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "ranking" : {
            "type" : "long"
          },
          "sales_total" : {
            "type" : "long"
          },
          "sales_total_last_12M" : {
            "type" : "long"
          },
          "state" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "title" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "username" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "width" : {
            "type" : "long"
          }
        }
      }
    }
  }

query:

GET /media/_search
{
    "explain": false,
    "track_total_hits": true,
    "profile": false,
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [],
                    "must_not": [
                        {
                            "terms": {
                                "opt_contract": [
                                    "editorial"
                                ]
                            }
                        },
                        {
                            "terms": {
                                "state": [
                                    "blocked",
                                    "suspend",
                                    "deleted",
                                    "deleted_highres",
                                    "deleted_admin"
                                ]
                            }
                        }
                    ],
                    "should": [
                        {
                            "multi_match": {
                                "query": "hot girl",
                                "type": "phrase",
                                "fields": [
                                    "id_media.keyword^30",
                                    "keywords_prio^1",
                                    "title^0.5",
                                    "description^0.5",
                                    "keywords_category^0.3",
                                    "keywords_main^0.3",
                                    "keywords^0.1",
                                    "keywords_bought^0.5"
                                ],
                                "minimum_should_match": "2<-25%",
                                "prefix_length": 3,
                                "auto_generate_synonyms_phrase_query":true
                            }
                        },
                        {
                            "multi_match": {
                                "query": "hot girl",
                                "operator": "and",
                                "fields": [
                                    "id_media.keyword^30",
                                    "keywords_prio^1",
                                    "title^0.5",
                                    "description^0.5",
                                    "keywords_category^0.3",
                                    "keywords_main^0.3",
                                    "keywords^0.1",
                                    "keywords_bought^0.5"
                                ],
                                "fuzziness": "AUTO",
                                "prefix_length": 3
                            }
                        },
                        {
                            "range": {
                                "ranking": {
                                    "boost": 6,
                                    "gte": 9
                                }
                            }
                        },
                        {
                            "range": {
                                "ranking": {
                                    "boost": 4,
                                    "gte": 7,
                                    "lt": 9
                                }
                            }
                        },
                        {
                            "range": {
                                "ranking": {
                                    "boost": 2,
                                    "gte": 5,
                                    "lt": 7
                                }
                            }
                        }
                    ],
                    "filter": [
                        {
                            "terms": {
                                "id_art_collection": [
                                    2,
                                    4,
                                    6
                                ]
                            }
                        },
                        {
                            "range": {
                                "ranking": {
                                    "gte": 5
                                }
                            }
                        },
                        {
                            "match": {
                                "published": "SA"
                            }
                        },
                        {
                            "terms": {
                                "state": [
                                    "active",
                                    "accepted"
                                ]
                            }
                        },
                        {
                            "match": {
                                "date_deleted": "1970\/01\/01 00:00:00"
                            }
                        },
                        {
                            "terms": {
                                "opt_contract": [
                                    "rights"
                                ]
                            }
                        },
                        {
                            "terms": {
                                "flags": [
                                    "photo"
                                ]
                            }
                        }
                    ]
                }
            },
            "functions": [
                {
                    "gauss": {
                        "date_upload": {
                            "origin": "now+20d",
                            "scale": "30d",
                            "offset": "1d",
                            "decay": 0.5
                        }
                    },
                    "weight": 6
                },
                {
                    "field_value_factor": {
                        "field": "additional_weight",
                        "factor": 0.01,
                        "modifier": "sqrt",
                        "missing": 0
                    }
                },
                {
                    "field_value_factor": {
                        "field": "not_adult",
                        "factor": 5,
                        "modifier": "sqrt",
                        "missing": 0
                    }
                },
                {
                    "field_value_factor": {
                        "field": "sales_total",
                        "factor": 0.1,
                        "modifier": "sqrt",
                        "missing": 0
                    }
                },
                {
                    "field_value_factor": {
                        "field": "sales_total_last_12M",
                        "factor": 1,
                        "modifier": "sqrt",
                        "missing": 0
                    }
                },
                {
                    "field_value_factor": {
                        "field": "has_gps",
                        "factor": 2.5,
                        "modifier": "sqrt",
                        "missing": 0
                    }
                },
                {
                    "weight": 2,
                    "random_score": {
                        "seed": 1234,
                        "field": "id_user"
                    }
                },
                {
                    "weight": 2,
                    "random_score": {
                        "seed": 1234,
                        "field": "id_media"
                    }
                }
            ],
            "score_mode": "sum",
            "boost_mode": "sum"
        }
    },
    "from": 0,
    "size": 60,
    "_source": [
    "id_media",
    "width",
    "height",
    "id_user",
    "keywords_prio",
    "title",
    "keywords",
    "description",
    "opt_contract",
    "flags",
    "id_art_collection",
    "has_gps",
    "ranking",
    "date_upload",
    "sales_total_last_12M",
    "sales_total",
    "additional_weight",
    "keywords_category",
    "keywords_main",
    "keywords_bought"
  ]
}

Having 10 shards for less than 10 GB of indexed data seems excessive. I would think 1 or 2 would be more appropriate.
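To put rough numbers on this, here is a minimal Python sketch using only the figures quoted in the first post (13.88 % of 60 GB used, 10 shards). It assumes the reported disk usage is entirely primary index data; with replicas or other overhead on disk the real per-shard size would be smaller still. The function name is made up for illustration.

```python
# Back-of-the-envelope shard sizing from the figures quoted in this thread.
# Assumption: the reported disk usage is entirely primary index data.

def avg_shard_size_gb(disk_gb: float, used_fraction: float, shards: int) -> float:
    """Return the average size of one shard in GB."""
    return disk_gb * used_fraction / shards

total_gb = 60 * 0.1388
per_shard_gb = avg_shard_size_gb(60, 0.1388, 10)
print(f"total ~{total_gb:.2f} GB, ~{per_shard_gb:.2f} GB per shard")
```

Under that assumption each shard holds well under 1 GB, which is why 1 or 2 shards looks sufficient for this data volume.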

How many documents typically match the filter clauses and need to be evaluated in full with respect to the other clauses and scoring? If this is a large portion it could be that you are CPU limited. You typically get CPU allocated proportional to the size of the instance and smaller instances are allowed to burst.
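One way to answer that question is to count the documents that pass only the non-scoring clauses. Note that in the posted query "must" is empty and the text clauses sit in "should", which is optional by default when a "filter" is present, so every document that passes the filters is scored by all the functions. The sketch below is a hedged illustration in Python: the helper name count_body_from_search is hypothetical, and the search body is a shortened stand-in for the query above. The resulting body could be sent to GET /media/_count.

```python
import json
from typing import Any, Dict

def count_body_from_search(search_body: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the non-scoring clauses (filter, must_not) of a
    function_score search whose inner query is a bool query."""
    bool_query = search_body["query"]["function_score"]["query"]["bool"]
    return {
        "query": {
            "bool": {
                "filter": bool_query.get("filter", []),
                "must_not": bool_query.get("must_not", []),
            }
        }
    }

# Shortened stand-in for the query posted above (only a few clauses shown).
search_body = {
    "track_total_hits": True,
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [],
                    "must_not": [{"terms": {"opt_contract": ["editorial"]}}],
                    "should": [{"range": {"ranking": {"boost": 6, "gte": 9}}}],
                    "filter": [
                        {"terms": {"id_art_collection": [2, 4, 6]}},
                        {"range": {"ranking": {"gte": 5}}},
                    ],
                }
            },
            "functions": [],
        }
    },
}

count_body = count_body_from_search(search_body)
print(json.dumps(count_body, indent=2))
```

If the count comes back as a large share of the 14 million documents, the scoring work per search is correspondingly large.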

If CPU is the issue you might be better off with a compute optimised node.

The query looks to be doing a lot of terms queries against text fields (and numerics) that perhaps should be keyword fields instead. Also IIUC track_total_hits: true is expensive since it prevents the query from terminating much earlier.


@Christian_Dahlqvist CPU is looking good during search, if I can trust Elastic Cloud metrics.
Is it possible to change the number of shards in "Elastic Cloud - Elasticsearch Service"?
If so, I will give it a try.

I would also like to try switching from an I/O optimized to a compute optimised node, but I have no option to change this on Elastic Cloud. I can select only I/O or Hot-Warm ...

@DavidTurner setting "track_total_hits": false makes no difference to the took in my case. Can you give me an example of what you mean by terms and keyword fields? Is it enough to use, for example, .keyword on all my terms?

Your text fields all look to have a .keyword variant too so that'll work there, but not on long IDs such as id_art_collection that should probably be mapped as keywords. However if you're never going to do full-text search on these text fields then you may as well drop the text type and only map them as keyword ones. Fewer fields should mean a smaller index and therefore more of it will fit in filesystem cache.
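As a rough illustration of that suggestion, the Python sketch below builds a mapping in which filter-only fields are keyword and only the searched fields stay text. Which field belongs in which group is a guess based on the posted query, and build_mapping is a hypothetical helper, not part of any client library. The type of an existing field cannot be changed in place, so a mapping like this would go into a new index (for example a hypothetical media_v2) that is then filled via the _reindex API.

```python
import json
from typing import Any, Dict, Iterable

def build_mapping(keyword_fields: Iterable[str],
                  text_fields: Iterable[str]) -> Dict[str, Any]:
    """Build an index body that maps each field as keyword-only or text-only."""
    props: Dict[str, Any] = {}
    for name in keyword_fields:
        props[name] = {"type": "keyword"}
    for name in text_fields:
        props[name] = {"type": "text"}
    return {"mappings": {"properties": props}}

# Assumption: these fields are only ever filtered on, never searched as text.
filter_only = ["state", "opt_contract", "flags", "published",
               "mimetype", "id_art_collection"]
# Assumption: these are the fields used for full-text search in the query.
full_text = ["title", "description", "keywords", "keywords_prio",
             "keywords_category", "keywords_main", "keywords_bought"]

mapping = build_mapping(filter_only, full_text)
print(json.dumps(mapping, indent=2))
```

Keep in mind that a keyword field matches whole values only, so comma-joined strings such as "accepted,active" would have to be sent as arrays for single values to match.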

Not sure that this'll help of course, just something I noticed on my way past.

Reducing shard count might make a difference, although more shards means more parallelism so it depends exactly what you're trying to optimise.

Also how exactly are you doing the benchmarking here? Are you doing repeated queries and allowing for a warmup phase, or is this a one-shot thing? There are lots of caches that have a big effect on performance if cold, but that's not a realistic production test. There's a good talk about performance gotchas and we recommend Rally for reliable benchmarking.
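A minimal sketch of what "repeated queries with a warmup phase" can look like, in Python. It is not a substitute for Rally; the benchmark function and the fake_query stand-in are hypothetical, and in practice run_query would wrap whatever client call sends the search. It measures client-side wall time, which includes network latency, unlike the took value in the response.

```python
import statistics
import time
from typing import Callable, Dict, List

def benchmark(run_query: Callable[[], object],
              warmup: int = 5, repeats: int = 20) -> Dict[str, float]:
    """Run the query `warmup` times untimed, then time `repeats` runs."""
    for _ in range(warmup):
        run_query()  # results discarded; this only warms the caches

    timings_ms: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000)

    return {
        "runs": len(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }

def fake_query() -> None:
    """Stand-in workload; replace with a real search request."""
    time.sleep(0.001)

print(benchmark(fake_query, warmup=2, repeats=5))
```

Reporting the median rather than a single run makes it easier to tell a cold-cache outlier from the steady-state latency.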

@DavidTurner yes, I tested it with repeated queries, but the first call and the following ones differ by only about 1 second (with more than 8 seconds overall, that doesn't matter).

How can I use the keyword field in the filter terms, for example?

If I use it like this:

                        {
                            "terms": {
                                "state.keyword": [
                                    "active",
                                    "accepted"
                                ]
                            }
                        }

I get no results.

@DavidTurner @Christian_Dahlqvist @dadoonet any ideas or suggestions?

@DavidTurner what about my question about terms and keyword above?

Thanks in advance!
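A likely explanation for the empty result with state.keyword: the sample document stores state as the single string "accepted,active". The text field is analyzed, so it is split into the lowercased tokens "accepted" and "active", and a terms query on state finds them. The keyword sub-field indexes the whole string as one term, so a terms query for "active" on state.keyword matches nothing. The Python sketch below imitates this; rough_text_tokens is only a crude stand-in for the default analyzer, and all function names are hypothetical. One possible fix, assuming the values can be reshaped before indexing, is to send such fields as arrays.

```python
import re
from typing import Any, Dict, List

def rough_text_tokens(value: str) -> List[str]:
    """Crude stand-in for the default analyzer: lowercase, then split on
    commas and whitespace."""
    return [t for t in re.split(r"[,\s]+", value.lower()) if t]

def keyword_terms(value: str) -> List[str]:
    """A keyword field indexes the whole string as a single term."""
    return [value]

def split_multi_values(doc: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    """Return a copy of doc in which comma-joined strings become lists."""
    out = dict(doc)
    for name in fields:
        value = out.get(name)
        if isinstance(value, str):
            out[name] = [p.strip() for p in value.split(",") if p.strip()]
    return out

doc = {"state": "accepted,active", "published": "SA,MC"}

print(rough_text_tokens(doc["state"]))  # ['accepted', 'active']
print(keyword_terms(doc["state"]))      # ['accepted,active']
print("active" in keyword_terms(doc["state"]))  # False

reshaped = split_multi_values(doc, ["state", "published"])
print(reshaped)  # {'state': ['accepted', 'active'], 'published': ['SA', 'MC']}
```

With array values each element is indexed as its own keyword term, so a terms filter on "active" would match. Keyword terms are case-sensitive unless a normalizer is configured, so "SA" would have to be queried exactly as "SA".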
