How to find doc IDs based on field type rather than field value?

While developing a new Logstash filter, I initially got some fields mapped as string rather than number by mistake. Now I want to find all the doc IDs where those specific fields are of type string rather than number, so I can fix my docs and avoid this warning from Kibana:

Mapping conflict! 6 fields are defined as several types (string, integer, etc) across the indices that match this pattern. You may still be able to use these conflict fields in parts of Kibana, but they will be unavailable for functions that require Kibana to know their type. Correcting this issue will require reindexing your data

The question is just how to do this; hints are much appreciated!

Hi @stefws,

you don't need to find the individual doc IDs but rather the index names, as all documents in an index are indexed using the same mapping. Hence, you can use the mapping API to find out which indices are affected.

Here is an example: say your index pattern is "logstash-*", then you can find the mappings of all indices with

GET /logstash-*/_mapping 

Now, there is a nice little utility called gron which you can use to grep for the fields that you're interested in. Say the field is called foo; then you can issue:

gron http://localhost:9200/logstash-\*/_mapping | fgrep 'foo.type = "string"'

which produces something like:

json["logstash-2016"].mappings._default_.properties.foo.type = "string";
json["logstash-2016"].mappings.my_type.properties.foo.type = "string";

From this you can see that the affected index is logstash-2016. If it's just a couple of indices this is probably manageable; otherwise you can postprocess the result with awk or the like.
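For example, a minimal awk postprocessing sketch over the gron output (the sample lines below stand in for real output; your index names will differ) could extract just the affected index names:

```shell
# Stand-in for real gron output lines.
gron_output='json["logstash-2016"].mappings._default_.properties.foo.type = "string";
json["logstash-2016"].mappings.my_type.properties.foo.type = "string";'

# The index name is the first double-quoted token on each line:
# split on '"', print field 2, and deduplicate.
printf '%s\n' "$gron_output" | awk -F'"' '{ print $2 }' | sort -u
# prints: logstash-2016
```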

You can also try jq, but it was easier for me with gron in this case.

When you know the affected indices, you can then create a new index with a proper mapping and reindex all documents.

Daniel

Thanks, got jq, but it seems the mapping is primarily dynamic:

{
  "collectd-2016-08-26": {
    "mappings": {
      "_default_": {
        "_all": {
          "enabled": true,
          "omit_norms": true
        },
        "dynamic_templates": [
          {
            "template1": {
              "mapping": {
                "ignore_above": 64,
                "index": "not_analyzed",
                "type": "{dynamic_type}",
                "doc_values": true
              },
              "match": "*"
            }
          }
        ],
        "properties": {
          "@timestamp": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "collectd_type": {
            "type": "string",
            "index": "not_analyzed"
          },
          "host": {
            "type": "string",
            "index": "not_analyzed"
          },
          "plugin": {
            "type": "string",
            "index": "not_analyzed"
          },
          "plugin_instance": {
            "type": "string",
            "index": "not_analyzed"
          },
          "type_instance": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}

The reason for finding the docs was that I just wanted to remove the few initially created docs which have the fields wrongly mapped as strings. It also puzzles me that Logstash grok filters matching %{INT:field-name} end up as strings. I fixed this with a Logstash ruby plugin section doing v_to_i on such fields.
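As an aside, grok captures are strings by default; grok also supports an explicit cast suffix on the capture, which may avoid the ruby step. A sketch (the field name "duration" is hypothetical here):

```
filter {
  grok {
    # The :int suffix asks grok to cast the captured value to an integer
    # instead of leaving it as a string.
    match => { "message" => "%{INT:duration:int}" }
  }
}
```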

Any good pointers for reindexing an index?

Hi @stefws,

you have dynamic mapping enabled, yes. Under properties you can see which fields are explicitly defined in the mapping for the index collectd-2016-08-26.

All documents in an index will have the same mapping (with very few exceptions; see the user docs).

I cannot really say much about that but maybe somebody in the Logstash forum can clear up this confusion.

If you're on Elasticsearch 2.3 or above you can use the reindex API. You should also consider using aliases, so you can change the underlying index name transparently for the application (see the Definitive Guide).
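A minimal sketch of that workflow (index names, type name, and the field "foo" are placeholders; adjust the mapping to your actual fields):

```
# Create the new index with an explicit mapping first.
PUT /collectd-2016-08-26-v2
{
  "mappings": {
    "my_type": {
      "properties": {
        "foo": { "type": "integer" }
      }
    }
  }
}

# Copy all documents over with the reindex API (Elasticsearch 2.3+).
POST /_reindex
{
  "source": { "index": "collectd-2016-08-26" },
  "dest":   { "index": "collectd-2016-08-26-v2" }
}

# Once verified, delete the old index and point an alias
# with the old name at the new index.
DELETE /collectd-2016-08-26

POST /_aliases
{
  "actions": [
    { "add": { "index": "collectd-2016-08-26-v2", "alias": "collectd-2016-08-26" } }
  ]
}
```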

Daniel

I only get the few properties above when asking for _mapping (all output shown), though I have many more fields in the index. How could I change the mapping for an existing index? Pointers?

I'm on 2.3.5

 # rpm -q elasticsearch
 elasticsearch-2.3.5-1.noarch

Thanks, will look into the reindex API and aliases...

Will take the grok question to the Logstash forum of course :slight_smile:

I managed to create a new index with the correct mapping and reindex the data into it, then removed the old index and aliased its name to the newly created index. Thanks.

It also turned out there was a bad date plugin format specifier in my new Logstash filter, so @timestamp came out wrong and docs were put into the wrong indices :slight_smile: