Issue with custom analyzer

Hello, I'm having an issue with a custom analyzer. I want to map a range of numeric values in a field to strings, i.e.:

          "0 => very_negative",
          "1 => negative",
          "2 => neutral",
          "3 => positive",
          "4 => very_positive"

To do this I've implemented a custom analyzer, but it is not working. This is the template I'm using, which shows how I was trying to solve this:

PUT _template/twitter-live
{
  "index_patterns": [
    "twitter-*"
  ],
  "settings": {
    "index": {
      "number_of_shards": "1",
      "analysis": {
        "analyzer": {
          "nlp_ca": {
            "type": "custom",
            "char_filter": [
              "sentiment_scale"
            ],
            "tokenizer": "keyword"
          }
        },
        "char_filter": {
          "sentiment_scale": {
            "type": "mapping",
            "mappings": [
              "0 => very_negative",
              "1 => negative",
              "2 => neutral",
              "3 => positive",
              "4 => very_positive"
            ]
          }
        }
      },
      "number_of_replicas": "0"
    }
  },
  "mappings": {
    "tweets": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "text": {
          "type": "text"
        },
        "user": {
          "type": "object",
          "properties": {
            "description": {
              "type": "text"
            }
          }
        },
        "coordinates": {
          "type": "object",
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        },
        "entities": {
          "type": "object",
          "properties": {
            "hashtags": {
              "type": "object",
              "properties": {
                "text": {
                  "type": "text",
                  "fielddata": true
                }
              }
            }
          }
        },
        "sentiment": {
          "type": "text",
          "analyzer": "nlp_ca"
        },
        "nlp": {
          "properties": {
            "sentences": {
              "type": "keyword"
            },
            "sentiment": {
              "type": "text",
              "analyzer": "nlp_ca"
            },
            "tokens": {
              "type": "keyword"
            }
          }
        },
        "retweeted_status": {
          "type": "object",
          "properties": {
            "text": {
              "type": "text"
            }
          }
        }
      },
      
      "dynamic_templates": [
        {
          "string_template": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}

When I test it with the _analyze API, it works:

GET twitter-2019.02.15/_analyze
{
  "analyzer": "nlp_ca",
  "text": "0"
}

{
  "tokens" : [
    {
      "token" : "very_negative",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    }
  ]
}

However, the data that I'm ingesting with Logstash does not seem to go through the analyzer, and the nlp.sentiment field is not changed:

nlp.sentiment    1

Please let me know what I'm doing wrong.

Thanks in advance

Hello!

Can I ask what you expect to see? Analyzers only change how data is indexed and searched, not how it's returned, and analyzed text generally isn't visible except through the _analyze API. So even if your analyzer is working exactly how you expect, the values you see in _source won't be changed by the analyzer - only indexing and searching are changed.
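
To illustrate the difference, here is a small sketch of a search. It assumes the index was created after the template was installed (so nlp.sentiment really uses nlp_ca) and that it contains a document whose nlp.sentiment is 1. The match query is analyzed with the same analyzer, so searching for the label should find that document, while the _source in the hit would still show the original value.

```console
GET twitter-2019.02.15/_search
{
  "query": {
    "match": {
      "nlp.sentiment": "negative"
    }
  }
}
```

If a document comes back, the analyzer is being applied at index time; the 1 you see in the returned document is just the stored _source, which analysis never touches.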

If you want to change the text itself, rather than how that text is indexed, I'd recommend doing that either in Logstash (which it sounds like you're already using) or using an ingest pipeline in Elasticsearch, rather than a custom analyzer.
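
As a rough sketch of the ingest pipeline option: the pipeline name sentiment_labels and the Painless script below are illustrative assumptions, not something taken from your setup. It also assumes that nlp.sentiment arrives as a field inside an nlp object and holds one of the values 0 to 4; any other value is left untouched.

```console
PUT _ingest/pipeline/sentiment_labels
{
  "description": "Hypothetical example: replace a numeric nlp.sentiment with a label",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.nlp != null && ctx.nlp.sentiment != null) { def label = params.labels[ctx.nlp.sentiment.toString()]; if (label != null) { ctx.nlp.sentiment = label; } }",
        "params": {
          "labels": {
            "0": "very_negative",
            "1": "negative",
            "2": "neutral",
            "3": "positive",
            "4": "very_positive"
          }
        }
      }
    }
  ]
}

POST _ingest/pipeline/sentiment_labels/_simulate
{
  "docs": [
    {
      "_source": {
        "nlp": {
          "sentiment": 1
        }
      }
    }
  ]
}
```

The _simulate call lets you check the result without indexing anything. To apply the pipeline to real data, you could set the pipeline option on the Logstash elasticsearch output to the pipeline name. Since the stored value would then already be the label, the custom analyzer on that field would no longer be needed.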
