Apply lowercase to indexed data


(Melvyn Peignon) #1

I have documents indexed in Elasticsearch. A sample document looks like this:

  {
    "_index": "processed_tweets",
    "_type": "processed",
    "_id": "830403820580663296",
    "_score": 1,
    "_source": {
      "at": [
        "@LouisDasch"
      ],
      "original_tweet_id": "830398288352403457",
      "id_str": "830403820580663296",
      "trigrams": [
        "blessed lourdes lady",
        "lourdes lady feast",
        "lady feast day",
        "feast day wishing"
      ],
      "hashtags": [
        "#Catholic"
      ],
      "id_tweet_creator": "487735029",
      "tokens": [
        "blessed",
        "lourdes",
        "lady",
        "feast",
        "day",
        "wishing"
      ],
      "bigrams": [
        "blessed lourdes",
        "lourdes lady",
        "lady feast",
        "feast day",
        "day wishing"
      ],
      "retweeted": true
    }
  }

I would like to lowercase all the hashtags in the "hashtags" field for every document I have indexed. For example: "hashtags": ["#Catholic"] -> "hashtags": ["#catholic"]. What is the best (least time-consuming) way to update every keyword to its lowercase equivalent while keeping the "#"?


(Mark Walkom) #2

You will need to use the reindex API and build your own analyser to lowercase things.
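
As a rough sketch of what that could look like, the Python below only builds the two request bodies involved: one to create a destination index with a custom analyser, and one for the reindex call. The index and type names come from the sample document above; the destination index name "processed_tweets_lowercase" and the analyser name "hashtag_lowercase" are made up for illustration. The mapping layout assumes an Elasticsearch version that still uses mapping types (as the "_type" field in the sample suggests), so adjust it for your version. The keyword tokenizer keeps each hashtag as a single token, which should preserve the "#". Note that an analyser changes the terms that are indexed and searched; the stored _source will still show the original casing.

```python
import json

SOURCE_INDEX = "processed_tweets"          # taken from the sample document
DOC_TYPE = "processed"                     # taken from the sample document
DEST_INDEX = "processed_tweets_lowercase"  # hypothetical name, pick your own


def build_index_body():
    """Settings and mapping for the destination index.

    The analyser name "hashtag_lowercase" is an assumption for this sketch.
    It keeps each value as one token (keyword tokenizer) and lowercases it.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "hashtag_lowercase": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            DOC_TYPE: {
                "properties": {
                    "hashtags": {
                        "type": "text",
                        "analyzer": "hashtag_lowercase",
                    }
                }
            }
        },
    }


def build_reindex_body():
    """Body for POST /_reindex: copy every document into the new index."""
    return {
        "source": {"index": SOURCE_INDEX},
        "dest": {"index": DEST_INDEX},
    }


if __name__ == "__main__":
    # Print the requests so they can be pasted into Kibana's console or curl.
    print("PUT /" + DEST_INDEX)
    print(json.dumps(build_index_body(), indent=2))
    print("POST /_reindex")
    print(json.dumps(build_reindex_body(), indent=2))
```

Create the destination index first, then run the reindex, so that the documents are analysed with the new mapping as they are copied across.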

