Proper way to store large fields in Elasticsearch

Hi,

Our documents have a few fields that exceed the maximum field length allowed for highlighting. Only about 6 fields are this large. I don't want to increase that limit. The error suggests indexing large fields with offsets so they can be highlighted. I tried indexing the large fields with offsets, and it solved the problem.

But indexing large fields with offsets introduced another problem: the index size increased significantly (sometimes from 10GB to 15GB). So is there a way to store large fields and still search on them without indexing with offsets?

'store=yes' seems like a good option, but the caveat is that a field with 'store=yes' is not carried over when reindexing, so we would have to handle that manually.

So what is the best way to solve this? (Is adding the field as something like an attachment and then searching on it an option here? If so, does reindexing work well with that?)

Thanks,

Hi! If those fields are that large, you can start by building a custom template that specifies only the keyword data type for them. Otherwise, with the default dynamic mapping, each string field is indexed twice: as text for full-text queries and as a keyword sub-field, which creates two inverted indexes.

Currently part of my mapping looks like this:

   "mappings" : {
      "properties" : {
        "field_1" : {
          "type" : "text",
          "index_options" : "offsets"
        },
        "field_2" : {
          "type" : "text",
          "index_options" : "offsets"
        },
        "field_3" : {
          "type" : "text",
          "index_options" : "offsets"
        },
        "field_4" : {
          "type" : "text",
          "index_options" : "offsets"
        },

So I don't think it is stored twice, as keyword and as full text, is it? If it is, how do I modify the mapping to store it only once? That isn't obvious from the current mapping.

   "field_example" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }

This is what the mapping would look like if the field were also indexed as keyword. Yours is text only, so I don't know how else to help.
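One way to confirm this for every field at once is to scan the mapping for text fields that carry a keyword sub-field. The following is a minimal sketch, assuming the mapping has already been fetched and parsed into a Python dict; the helper name `fields_with_keyword_subfield` is made up for illustration, and it only inspects top-level properties.

```python
def fields_with_keyword_subfield(mapping):
    """Return names of top-level text fields that also have a keyword sub-field."""
    doubled = []
    for name, spec in mapping.get("properties", {}).items():
        subfields = spec.get("fields", {})
        has_keyword = any(sub.get("type") == "keyword" for sub in subfields.values())
        if spec.get("type") == "text" and has_keyword:
            doubled.append(name)
    return doubled


# Sample mapping built from the two snippets in this thread.
mapping = {
    "properties": {
        "field_1": {"type": "text", "index_options": "offsets"},
        "field_example": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        },
    }
}

print(fields_with_keyword_subfield(mapping))  # ['field_example']
```

An empty result for your index would mean none of the top-level text fields are indexed a second time as keyword.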

The only solution that comes to mind is to split those fields across two indexes and assign something like a fingerprint to make them easier to look up, but I don't think this is a best practice.
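To make the split idea concrete, here is a rough sketch of how a document could be divided into a small part and a large part that share a fingerprint. Everything in it is an assumption for illustration: the names `LARGE_FIELDS`, `fingerprint`, and `split_document` are hypothetical, the fingerprint is simply a SHA-256 hash of the document serialized as JSON, and sending the two parts to their indexes is left out.

```python
import hashlib
import json

# Hypothetical set of field names considered "large".
LARGE_FIELDS = {"field_1", "field_2", "field_3", "field_4"}


def fingerprint(doc):
    """Hash the whole document so both halves can share one join key."""
    canonical = json.dumps(doc, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_document(doc, large_fields=LARGE_FIELDS):
    """Split a document into (small_part, large_part), each tagged with the fingerprint."""
    fp = fingerprint(doc)
    small = {k: v for k, v in doc.items() if k not in large_fields}
    large = {k: v for k, v in doc.items() if k in large_fields}
    small["fingerprint"] = fp
    large["fingerprint"] = fp
    return small, large


small_part, large_part = split_document({"title": "report", "field_1": "x" * 1000})
print(sorted(small_part))  # ['fingerprint', 'title']
print(sorted(large_part))  # ['field_1', 'fingerprint']
```

A search would then hit one index and use the fingerprint to fetch the matching half from the other, which costs a second lookup per result.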
