Truncate keyword to specific length and store

Hello,

I'm trying to retrieve only the first 128 characters of a text field I'm storing. The field is often extremely large, but for preview purposes I'd still like to retrieve part of it without significant delay over slower connections.

The truncate filter seemed to fit, but after testing it on a multi-field, the stored value is the same as the original value.

"truncate_keyword_analyzer": {
  "type": "custom",
  "tokenizer": "keyword",
  "filter": [
    "truncate_filter"
  ]
}

And

"content": {
  "type": "text",
  "norms": false,
  "analyzer": "standard",
  "fields": {
    "preview": {
      "type": "text",
      "store": true,
      "index": false,
      "norms": false,
      "analyzer": "truncate_keyword_analyzer"
    }
  }
}

For example, when the content field contains "All I want for Christmas is a truncated text field", I'd expect the content.preview field to contain "All I want for Christmas", but it doesn't.

Is there something wrong with the current setup? Should I look into other filters or tokenizers? Or is this just not possible at the moment?

Thanks!

We never modify the source: the analyzer is only used to produce the terms for building the index, and the source document goes into _source unchanged. If you want the source modified, you have to pre-process your document, either client-side or using an ingest pipeline. Alternatively, if you use copy_to and store the copy_to field, you can obtain it from the index.

So when I use an analyzer on a multi-field and set store: true, the stored field is the same as the _source field?

I don't want the _source modified. I assumed that using a sub-field and retrieving it with stored_fields would return the terms produced by the analyzer if store is set to true.

I'll look into copy_to as well, thanks!

No, stored fields do the same thing: they store the original value of the field, not the result of the analysis chain. In fact, as a result of your question I looked closer at my copy_to suggestion, and I regret to say that I was mistaken: what I suggested cannot be done. Instead, you have to use my initial suggestion (do this client side up front, or in an ingest pipeline). Alternatively, you can do it client side when you fetch the field, or you can use a script field.
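
For the "client side up front" option, a minimal sketch in Python might look like the following. It only shows the truncation step that would run before the document is sent for indexing. The helper name with_preview and the separate top-level field name content_preview are assumptions made for illustration, not anything defined in this thread or by Elasticsearch.

```python
from typing import Any, Dict

PREVIEW_LENGTH = 128  # preview size mentioned in the question


def with_preview(
    doc: Dict[str, Any],
    source_field: str = "content",
    preview_field: str = "content_preview",  # hypothetical field name
    max_chars: int = PREVIEW_LENGTH,
) -> Dict[str, Any]:
    """Return a shallow copy of doc with a truncated preview field added.

    The original document is left untouched. If the source field is
    missing or is not a string, no preview field is added.
    """
    enriched = dict(doc)
    value = doc.get(source_field)
    if isinstance(value, str):
        # Slicing never fails on short strings; it returns them whole.
        enriched[preview_field] = value[:max_chars]
    return enriched


doc = {"content": "All I want for Christmas is a truncated text field"}
# 24 characters is used here only to reproduce the example from the question.
indexed = with_preview(doc, max_chars=24)
print(indexed["content_preview"])  # All I want for Christmas
```

The enriched document would then be indexed as usual, and the preview could be fetched on its own, for example with _source filtering on the preview field, so the large content value does not have to travel over the wire. The same truncation logic could presumably live in an ingest pipeline instead if the client cannot be changed.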

Good to know! Thanks for helping me with this and also providing me with alternative suggestions!

You are very welcome.
