Elasticsearch performance issues when searching on docs with large field data


(Pranav) #1

Hello everyone,

I'm using Elasticsearch in one of my projects.
In my implementation, a single document has around 200 fields. The fields contain different types of data, each at most one line long, except for one field, which can contain 6000-10000 words, or roughly 30 pages.

My question is: can one large field decrease search performance on the other small fields?

Put differently, will storing everything in a single index degrade performance compared to storing the large field separately from the other small fields?


(Robert) #2

I would say it depends on your search query. If you search for a value inside the 30-page field, then it will be slower. If the value of that field is not important and you search for values in other fields, then I think it will not degrade your performance (at query time).

Best regards


(Pranav) #3

@elastic can you please answer my query?


(Christian Dahlqvist) #4

Whether the large field has an impact on query performance or not depends on your queries and potentially also your mappings and which version of Elasticsearch you are using. As you have not provided any details around this it is impossible for us to tell.


(Pranav) #5

Thanks @Christian_Dahlqvist for replying.
I am using Elasticsearch version 6.3.2, with a wide range of queries (including aggregations, sorting, highlighting, fuzziness, etc.).
This is the kind of mapping I am using:

{
  "mappings": {
    "data": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "userId": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

This is an example of one of the queries:

{
  "from": 0,
  "size": 30,
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "elasticsearch",
          "fields": ["field1", "field2"],
          "fuzziness": "AUTO",
          "minimum_should_match": "80%"
        }
      },
      "should": {
        "multi_match": {
          "query": "elasticsearch",
          "fields": ["field1", "field2"],
          "type": "phrase",
          "slop": 1
        }
      }
    }
  },
  "aggregations": {
    "agg_example": {
      "terms": {
        "field": "type.keyword"
      }
    }
  },
  "highlight": {
    "type": "unified",
    "fields": {
      "*": {}
    }
  }
}

Suppose there is one more field named "field3" (about 30 pages in size) that I am not searching on but that is present in the indexed document. Will it affect my search performance on field1 and field2?


(Christian Dahlqvist) #6

I believe the performance of highlighting can be affected by the document size, so the large field could impact this. Most of the other types operate on the indexed data so might not necessarily be affected to the same extent. As always I would recommend you benchmark to find out for sure.
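
One way to set up such a benchmark, as a rough sketch only: run the query from post #5 as is, then run a variant like the one below and compare the "took" values in the two responses. The sketch assumes the large field is called field3 (as in the question) and that highlights are only needed on field1 and field2. It names the highlight fields explicitly instead of using "*" and leaves field3 out of the returned _source.

```json
{
  "from": 0,
  "size": 30,
  "_source": {
    "excludes": ["field3"]
  },
  "query": {
    "multi_match": {
      "query": "elasticsearch",
      "fields": ["field1", "field2"],
      "fuzziness": "AUTO",
      "minimum_should_match": "80%"
    }
  },
  "highlight": {
    "type": "unified",
    "fields": {
      "field1": {},
      "field2": {}
    }
  }
}
```

Whether this makes a measurable difference depends on your data and mappings. As far as I know, source filtering trims the response but the stored _source may still have to be read for each hit, so the numbers from your own cluster are what count.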

