Search Performance Tuning

Hi,

I have one index with a nested mapping structure and a total of 35 million records. I also search on some nested fields. When I search on nested fields, the application takes longer to respond the first time; the second time it is faster because of caching. I also run some aggregations in the same search query. I need to speed up searches on nested fields. Does anybody have tips for improving search speed on nested fields?

Note: Some documents have more than 10K nested objects. I am using the PHP SDK to integrate ES into my application. The nested search fields are mapped with the "index => true" option.

Server details:

ES Version : Elasticsearch 6.3.1
Server RAM : 256 GB
Harddisk : SSD 2 TB
Java Heap : 10 GB

Please find a sample mapping JSON for reference. This is not the original; the actual one has more fields.

{
  "index": "index_name",
  "body": {
    "settings": {
      "refresh_interval": "30s",
      "index.mapping.total_fields.limit": 1000000,
      "analysis": {
        "normalizer": {
          "case_insensitive": {
            "type": "custom",
            "filter": [
              "lowercase"
            ]
          }
        }
      }
    },
    "mappings": {
      "type_name": {
        "properties": {
          "field_1": {
            "type": "keyword",
            "index": "true"
          },
          "field_2": {
            "properties": {
              "data1": {
                "type": "keyword",
                "index": "true"
              },
              "data2": {
                "type": "keyword"
              }
            }
          },
          "filed_3": {
            "type": "nested",
            "properties": {
              "data_array_1": {
                "type": "nested",
                "properties": {
                  "data1": {
                    "type": "long"
                  },
                  "data2": {
                    "type": "keyword",
                    "index": "true"
                  },
                  "data3": {
                    "type": "keyword"
                  }
                }
              }
            }
          },
          "field_4": {
            "type": "nested",
            "properties": {
              "data_array_2": {
                "properties": {
                  "data1": {
                    "type": "keyword"
                  },
                  "data2": {
                    "type": "keyword"
                  }
                }
              }
            }
          },
          "field_5": {
            "type": "nested",
            "properties": {
              "metadata": {
                "properties": {
                  "data1": {
                    "type": "keyword"
                  },
                  "data2": {
                    "type": "keyword"
                  }
                }
              },
              "metadata_1": {
                "type": "nested",
                "properties": {
                  "metadata_2": {
                    "type": "nested",
                    "properties": {
                      "data1": {
                        "type": "keyword",
                        "index": "true"
                      }
                    }
                  },
                  "metadata_3": {
                    "type": "nested",
                    "properties": {
                      "data1": {
                        "type": "keyword",
                        "index": "true"
                      },
                      "data2": {
                        "type": "long"
                      }
                    }
                  },
                  "metadata_4": {
                    "type": "nested",
                    "properties": {
                      "data1": {
                        "type": "keyword",
                        "index": "true"
                      },
                      "data2": {
                        "type": "long"
                      },
                      "data3": {
                        "type": "keyword",
                        "index": "true"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

The "index.mapping.total_fields.limit": 1000000 setting caught my eye as a potential problem. Even if you need to store such a complex JSON document with nested fields, do you really need to index a million fields? That's a lot of cluster state to pass around, and a high ratio of Lucene documents to Elasticsearch docs.

But this setting ("index.mapping.total_fields.limit": 1000000) only has an effect when we are creating the index, right? I have to use a large number here because I build each document dynamically, so sometimes one document contains 15K nested array values. If I use a smaller value for index.mapping.total_fields.limit, that data will not go into the index.

Please let me know if I am wrong.

If your mapping ends up being so big that you have to increase the default total fields limit by a factor of 1000, you are likely to run into problems beyond just search performance.

I suggest you set dynamic to false for as many fields as you can, and set enabled to false for as many object fields as you can.
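As a rough illustration of that suggestion, here is a minimal Python sketch that builds a mapping body with "dynamic" set to false and one object field set to "enabled": false. The field names come from the sample mapping above, but the choice of which fields to lock down is purely an assumption for the example, and the helper name locked_down_mapping is hypothetical. It only prints JSON; sending it to the cluster is left out.

```python
import json


def locked_down_mapping():
    """Build a mapping body where unmapped fields are ignored instead of added."""
    return {
        "mappings": {
            "type_name": {
                # Unmapped fields stay in _source but are not added to the mapping.
                "dynamic": False,
                "properties": {
                    "field_1": {"type": "keyword"},
                    # Assumption: field_2 is never searched, so it is not parsed
                    # or indexed at all and only remains available in _source.
                    "field_2": {"type": "object", "enabled": False},
                    "field_4": {
                        "type": "nested",
                        "dynamic": False,
                        "properties": {
                            "data_array_2": {
                                "properties": {
                                    "data1": {"type": "keyword"},
                                    "data2": {"type": "keyword"},
                                }
                            }
                        },
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(locked_down_mapping(), indent=2))
```

With "dynamic" set to false, a document carrying keys that are not in the mapping is still accepted, but those keys do not create new fields, so the mapping stops growing.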

Here's a small script I used at one customer site to help them periodically detect their mapping explosions:

# Print the line count of each index's pretty-printed mapping, smallest first.
for INDEX in $(curl -s 'http://localhost:9200/_cat/indices?h=index'); do
  echo "$(curl -s "http://localhost:9200/$INDEX/_mapping?pretty" | wc -l) $INDEX"
done | sort -n

One index had a 4 million line mapping JSON, and it was still growing. Every time a document was indexed, a dynamic field got added and the mapping would change. But the mapping had grown so large that the pending cluster tasks queue would get backed up for half an hour. This is the road I imagine you are going down, where search performance will be the least of your worries.
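The line count of a pretty-printed mapping is only a proxy for its size. A more direct check, sketched below in Python, is to walk the "properties" of a mapping and count the fields. The function name count_fields and the SAMPLE data are hypothetical, and the count is only an approximation of what Elasticsearch compares against index.mapping.total_fields.limit: it counts every leaf, object, and nested field, plus any multi-fields under "fields".

```python
def count_fields(properties):
    """Recursively count the fields declared in a mapping's "properties" dict."""
    total = 0
    for definition in properties.values():
        total += 1  # the field itself (leaf, object, or nested)
        total += count_fields(definition.get("properties", {}))  # sub-fields
        total += count_fields(definition.get("fields", {}))  # multi-fields
    return total


# Hypothetical excerpt shaped like the sample mapping in this thread.
SAMPLE = {
    "field_1": {"type": "keyword"},
    "field_2": {
        "properties": {
            "data1": {"type": "keyword"},
            "data2": {"type": "keyword"},
        }
    },
}

if __name__ == "__main__":
    print(count_fields(SAMPLE))
```

Run periodically against the "properties" section of each index's mapping, a steadily rising count would point to the same kind of dynamic-field growth described above.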

Good luck!
