Trim all text fields when indexing

Hi,
I would like all text fields to be SAVED in Elasticsearch without leading and trailing spaces
(that is, in _source).
A pipeline didn't work, and it would not be good enough anyway, because I don't want to add each field to it.

How can I do it?

Thanks.

Hi,

Can you please explain why you want to do that? The whitespace in _source should compress well (so it should not take up much disk space), and for queries it should not matter because the analyzer will ignore it. You can try, for example:

curl -X POST "localhost:9200/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer": "standard",
  "text": "        Text with leading and trailing whitespaces         "
}
'

This produces the following response (only shown partially):

{
  "tokens" : [
    {
      "token" : "text",
      "start_offset" : 8,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "with",
      "start_offset" : 13,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    ...
    {
      "token" : "whitespaces",
      "start_offset" : 39,
      "end_offset" : 50,
      "type" : "<ALPHANUM>",
      "position" : 5
    }
  ]
}

You can see that the first token starts at offset 8: the analyzer skipped the leading whitespace, so the whitespace is not part of any token in the analyzed field.
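
The offsets in that response can be reproduced outside Elasticsearch with a rough stand-in for the tokenizer. The Python sketch below is not the standard analyzer; it assumes that splitting on runs of word characters and lowercasing is close enough for this ASCII example, and the function name `simple_tokens` is made up for illustration.

```python
import re

def simple_tokens(text):
    """Rough stand-in for an analyzer: lowercase word tokens with offsets."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "position": position,
        })
    return tokens

# Eight leading spaces, as in the _analyze request above.
text = " " * 8 + "Text with leading and trailing whitespaces" + " " * 9
for tok in simple_tokens(text):
    print(tok)
```

The first token it prints starts at offset 8 and the last one spans 39 to 50, so the surrounding spaces never end up inside a token.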

Daniel

Thanks Daniel.
But I want the result to show up when looking for an exact match.

Hi,

Can you please explain what you mean by an "exact match"? My understanding of "exact match" is that if you index the text (note the whitespace characters)

"   Text with three leading and three trailing whitespace characters.   "

and search for "Text with three leading and three trailing whitespace characters.", there is no match (because the search text contains only the text, without the three leading and three trailing whitespace characters). If that is the behaviour you want, you should index the field as a keyword.
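
The difference between the two behaviours can be sketched in plain Python. This is only a model, not Elasticsearch code: `keyword_match` compares the whole stored string, which is how an exact lookup on a keyword field behaves when no normalizer is configured, and `text_match` compares lowercase word tokens as a loose approximation of a phrase search on an analyzed text field. Both function names are hypothetical.

```python
import re

def keyword_match(stored, query):
    # A keyword value is indexed as one unmodified term, whitespace included.
    return stored == query

def text_match(stored, query):
    # Approximation of an analyzed field: compare lowercase word tokens only.
    def tokenize(s):
        return re.findall(r"\w+", s.lower())
    return tokenize(stored) == tokenize(query)

stored = "   Text with three leading and three trailing whitespace characters.   "
query = "Text with three leading and three trailing whitespace characters."

print(keyword_match(stored, query))          # False: surrounding spaces differ
print(keyword_match(stored.strip(), query))  # True once the value is trimmed
print(text_match(stored, query))             # True: whitespace never becomes a token
```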

However, to me this contradicts what you asked for in your original question, which was to store the fields without leading and trailing spaces.

If you want to query for exact matches (according to the example that I've provided above), you need the whitespace. If you want to get rid of the whitespace, you should instead use the text datatype (see my example in the previous answer).
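
If the goal turns out to be both, stored values without surrounding whitespace and exact matching on them, one option is to trim the values before they reach Elasticsearch. The sketch below assumes the documents pass through Python code you control before indexing; `trim_strings` is a hypothetical helper, not part of any Elasticsearch client, and it walks nested objects and arrays so that individual fields do not have to be listed.

```python
def trim_strings(value):
    """Return a copy of value with every string stripped, recursing into dicts and lists."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, dict):
        return {key: trim_strings(item) for key, item in value.items()}
    if isinstance(value, list):
        return [trim_strings(item) for item in value]
    return value  # numbers, booleans and None are left untouched

# Example document with made-up field names.
doc = {
    "title": "   Some title  ",
    "tags": [" a ", "b  "],
    "author": {"name": "  Jane Doe ", "age": 42},
}
clean = trim_strings(doc)
print(clean)
```

The cleaned dictionary would then be sent as the document body, so that _source holds the trimmed values and a keyword lookup for the trimmed text matches.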

Can you please explain what you want with a specific and self-contained example scenario?

Daniel
