Using Pattern Replace to update field


(Matthew Bullock) #1

I am trying to use pattern replace to change a number reference to a domain:

curl -XPUT "http://localhost:9200/timeordered?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.mapping.ignore_malformed": false,
  "index": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "45941455 => domain.de",
            "45941671 => domain.es",
            "45941287 => domain.com",
            "45941554 => domain-domain.fr",
            "48031837 => domain1.com",
            "13042264 => domain-cloud.com",
            "13042207 => domain3.com",
            "13157590 => domain4.com",
            "13057180 => domain5.com",
            "15076396 => domain6.cl",
            "13133866 => domain7.com",
            "15076060 => domain8.com.ar",
            "15076411 => domain9.com.au",
            "15076303 => domain10.com.co",
            "15076393 => domain11.com.mx",
            "15076408 => domain12.es",
            "15076405 => domain13.fr",
            "15076402 => domain14.jp",
            "13040731 => domain15.com"
          ]
        }
      }
    }
  }
  },
  "mappings": {
    "logs": {
      "properties": {
        "EdgeStartTimestamp": {
          "type":   "date",
          "format": "epoch_millis"
        },
        "geoip.location": {
          "type": "geo_point"
        }
      }
    }
  }
}'

However, when I push data via the bulk API, the field is not being converted.

I am guessing I either have to add something to my bulk API request or reference the analyzer in the mapping of the field I need converted, which is currently:

curl -s -XPUT 'http://localhost:9200/timeordered/_mapping/ZoneID?pretty' -d'
{
      "properties": {
        "ZoneID": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
}'

The bulk API action line I have (note: this is a jq expression that processes the file for bulk import):

{"index": {"_index": "timeordered", "_type": "logs", "_id": .ID, "pipeline": "geoip-timeordered"}},

Any advice would be great!

Thanks


(Abdon Pijpelink) #2

I'm assuming it's the ZoneID field you wish to apply this pattern replace filter to?

For starters, your ZoneID is mapped as not analyzed, so it will not get any analyzer applied to it. What you need to do is change "index": "not_analyzed" into "index": "analyzed" in the mapping for ZoneID (or omit the "index" line completely, as "analyzed" is the default).

Next, you need to tell Elasticsearch that you want to apply this my_analyzer analyzer to this field, instead of the default standard analyzer. So, in your mapping for ZoneID you need to add a line "analyzer": "my_analyzer".
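As a rough sketch, the two mapping changes above could look like the following. It assumes an Elasticsearch 5.x node, that ZoneID lives in the logs type used by your bulk request, and that the index is created fresh, since an existing field cannot be switched from not_analyzed to analyzed in place. ES_URL and APPLY are hypothetical helper variables, not Elasticsearch settings; the request is only sent when APPLY=1.

```shell
# Sketch only: ES_URL and APPLY are illustrative helper variables.
ES_URL="${ES_URL:-http://localhost:9200}"

# ZoneID mapping: analyzed, and using the custom my_analyzer from the index settings.
zoneid_mapping='{
  "properties": {
    "ZoneID": {
      "type": "string",
      "index": "analyzed",
      "analyzer": "my_analyzer"
    }
  }
}'

if [ "${APPLY:-0}" = "1" ]; then
  # Assumes the "logs" type; adjust if your documents use a different type.
  curl -s -XPUT "$ES_URL/timeordered/_mapping/logs?pretty" \
    -H 'Content-Type: application/json' -d "$zoneid_mapping"
else
  # Dry run: just show the body that would be sent.
  printf '%s\n' "$zoneid_mapping"
fi
```

The same "ZoneID" block could equally go straight into the "properties" section of the index creation request.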

Be aware though, by making ZoneID analyzed you may run into memory issues when you aggregate or sort on this field.

Also, be aware that any analysis will not change the _source of your documents. So, Elasticsearch will return you the original number references with the search results. Analysis will only influence the internal values that Elasticsearch will use for queries and aggregations.
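One way to see what the analyzer produces internally, as opposed to what is stored in _source, is the _analyze API. The sketch below assumes the timeordered index already exists with my_analyzer defined; ES_URL and APPLY are hypothetical helper variables, and the request is only sent when APPLY=1.

```shell
# Sketch only: ES_URL and APPLY are illustrative helper variables.
ES_URL="${ES_URL:-http://localhost:9200}"

# Ask the index to run one of the number references through my_analyzer.
analyze_body='{
  "analyzer": "my_analyzer",
  "text": "45941455"
}'

if [ "${APPLY:-0}" = "1" ]; then
  curl -s -XPOST "$ES_URL/timeordered/_analyze?pretty" \
    -H 'Content-Type: application/json' -d "$analyze_body"
else
  # Dry run: just show the body that would be sent.
  printf '%s\n' "$analyze_body"
fi
```

The response lists the tokens that end up in the index for that input, which is what queries and aggregations see; a search hit for the same document will still show the original number in _source.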

If you'd like to change the _source documents themselves, you'd have to use the ingest node instead of a character filter. It looks like you're already doing that, with the "pipeline": "geoip-timeordered" directive in your bulk request. You could do the transformation in that pipeline, if you'd like to change the ZoneID in the _source itself.
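A minimal sketch of that ingest approach, using gsub processors and the simulate API so nothing is stored or overwritten while you experiment. It assumes ZoneID arrives as a string (gsub does not accept numeric values), and it shows only two of your mappings; the rest would follow the same pattern. ES_URL and APPLY are hypothetical helper variables, and the request is only sent when APPLY=1.

```shell
# Sketch only: ES_URL and APPLY are illustrative helper variables.
ES_URL="${ES_URL:-http://localhost:9200}"

# An inline test pipeline plus one sample document for the simulate API.
simulate_body='{
  "pipeline": {
    "description": "Replace ZoneID number references with domains (sketch)",
    "processors": [
      { "gsub": { "field": "ZoneID", "pattern": "^45941455$", "replacement": "domain.de" } },
      { "gsub": { "field": "ZoneID", "pattern": "^45941671$", "replacement": "domain.es" } }
    ]
  },
  "docs": [
    { "_index": "timeordered", "_id": "1", "_source": { "ZoneID": "45941455" } }
  ]
}'

if [ "${APPLY:-0}" = "1" ]; then
  curl -s -XPOST "$ES_URL/_ingest/pipeline/_simulate?pretty" \
    -H 'Content-Type: application/json' -d "$simulate_body"
else
  # Dry run: just show the body that would be sent.
  printf '%s\n' "$simulate_body"
fi
```

Once the simulated output looks right, the gsub entries would need to be added to the processors of your existing geoip-timeordered pipeline, since each bulk action names a single pipeline.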

