How can I make the string field not_analyzed?


#1

I am using Logstash 1.5.1 and Elasticsearch 1.7.3.0. I used the Logstash elasticsearch output to index the records in a bunch of CSV files, with my own mapping document in which strings are set to not_analyzed. I also set the Logstash default template's match "*" to string not_analyzed. In Kibana I verified that the string fields are not_analyzed. However, when I use a Kibana bar chart to create buckets on a string field, the string is broken down into tokens.

Take the "path" field, for example: the Kibana mapping details show it as not_analyzed, as I set it.

The value of the "path" field is the path of the CSV file.

When I bucket a bar chart on the "path" field, the legend shows the "path" values broken down into tokens. Instead of "/.../testcsvimport/record220k-100_3.csv", I get "testcsvimport", "record220k", "100", "csv"...

I don't want it to be analyzed; I want to keep the whole path field as one string. How can I do that?
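
To make the symptom concrete, here is a rough Python sketch of why a terms bucket on an analyzed field shows fragments instead of whole values. The regex split is only a crude stand-in for an Elasticsearch analyzer (the real standard tokenizer follows different rules), and the sample path and the function names are hypothetical.

```python
import re

def crude_analyze(value):
    """Very rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())

def bucket_keys(value, analyzed):
    """Terms that a terms aggregation would see for one field value."""
    return crude_analyze(value) if analyzed else [value]

# Hypothetical sample value, not taken from the original data.
path = "/data/testcsvimport/record220k-100_3.csv"

print(bucket_keys(path, analyzed=True))   # several fragments
print(bucket_keys(path, analyzed=False))  # the whole path as one term
```

With an analyzed field each fragment becomes its own bucket key; with a not_analyzed field the full string is the only key.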

I have attached the Logstash .conf file that I used to export to the Elasticsearch index. Please help me.

input {
  file {
    path => "/home/myfolder/testcsvimport/record*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    columns => ["some_column_names"] # the "path" field is added by the file input, not by the csv filter
  }

  grok {
    match => { "record_IP" => "%{IP:clientip}" }
  }

  geoip {
    source => "clientip"
  }

  mutate {
    remove_field => [ "message", "host" ]
  }
}

output {
  elasticsearch {
    host => "dev-elkstack:9200"
    protocol => "http"
    index => "mt_joined_record_index"
    template_name => "mt_joined_record_type"
    manage_template => false
  }
}


(Mark Walkom) #2

Can you paste/link to your mapping?


#3

PUT /mt_joined_record_index
{
  "mappings": {
    "mt_joined_record_type": {
      "_all": {
        "enabled": false,
        "omit_norms": true
      },
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "@version": {
          "type": "string",
          "index": "not_analyzed"
        },
        "path": {
          "type": "string",
          "index": "not_analyzed"
        },
        "record_time": {
          "type": "date",
          "format": "yyyy-mm-dd HH:mm:ss || yyyy-mm-dd hh:mm:ss Z"
        },
        "tags": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}


#4

Kibana already shows the "path" field with analyzed set to "false", as in the picture I posted here.

So why is the field still analyzed in the bar chart? It is not consistent.


(Tin Le) #5

Did you reload your field list after making the change? It could be cached.

Tin


#6

Every time I ran a new experiment, I changed the index name and the mapping name to new ones, so I believe each experiment was clean and not affected by the previous ones. I also know the mapping I created is in effect, because the date type I defined in it is correctly recognized by Kibana.


(Tin Le) #7

Sure. Just for grins, would you mind trying the reload fields anyway?

Let's eliminate that.


#8

Yes, I did that; it still doesn't work.


#9

Can you give me an example or a link showing how other people succeeded in making string fields not_analyzed? Thank you very much!


(Tin Le) #10

The index mapping you showed above seems to come from a PUT. Could you please post the mapping of the current index you are having the problem with, taken from a GET?

Something like the output of a command similar to this one:

curl localhost:9200/logstash-YYYY.MM.DD/_mapping?pretty
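
Once the GET output is saved, a small script can list how a field is indexed under every index and mapping type it appears in. This is only a sketch: it assumes the Elasticsearch 1.x/2.x response shape (index, then "mappings", then type, then "properties"), and the function name and the sample response below are hypothetical.

```python
import json

def string_index_settings(mapping_response, field):
    """Return {(index, mapping_type): index_setting} for a top-level field.

    Assumes the 1.x/2.x response shape:
    {index: {"mappings": {type: {"properties": {field: {...}}}}}}
    """
    found = {}
    for index_name, index_body in mapping_response.items():
        for type_name, type_body in index_body.get("mappings", {}).items():
            props = type_body.get("properties", {})
            if field in props:
                # For "string" fields, a missing "index" key means "analyzed".
                found[(index_name, type_name)] = props[field].get("index", "analyzed")
    return found

# Hypothetical response, for illustration only.
sample = json.loads("""
{
  "my_index": {
    "mappings": {
      "type_a": {"properties": {"path": {"type": "string", "index": "not_analyzed"}}},
      "type_b": {"properties": {"path": {"type": "string"}}}
    }
  }
}
""")

for key, setting in sorted(string_index_settings(sample, "path").items()):
    print(key, setting)
```

If the field comes back as analyzed under any index or type that actually holds the documents, that may explain tokenized buckets even though another mapping says not_analyzed.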


(Marcin Kubica) #11

Sorry, I can't tell what's wrong in your case, @sharon.c. However, I use not_analyzed fields a lot and have never had this issue.

Deploy your ELK from scratch and try again?


#12

I think I solved it by using the raw field. Because I need to do aggregations on that field, I think it is better to just use the raw field instead of not_analyzed.
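
For readers wondering what "using the raw field" looks like in a request, here is a minimal sketch of a terms aggregation body built in Python. It assumes the mapping defines a not_analyzed "raw" sub-field for "path" (as the default Logstash template does for string fields); the aggregation name `by_value` and the helper name are hypothetical.

```python
import json

def terms_agg_body(field, size=10):
    """Search body that returns only terms buckets for `field` (no hits)."""
    return {
        "size": 0,
        "aggs": {
            "by_value": {"terms": {"field": field, "size": size}}
        },
    }

# "path.raw" assumes the mapping defines a not_analyzed "raw" sub-field.
body = terms_agg_body("path.raw")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the index's _search endpoint; in Kibana the equivalent is choosing "path.raw" rather than "path" as the bucket field.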


(Chrisribe) #13

Could you post your solution?
I'm having the same issue...
Thanks


(Chrisribe) #14

Found it. Here is my mapping solution for mysql-* indices.
It creates a raw sub-field for every string field, indexing values up to 256 characters.

{
  "template" : "mysql-*",
  "settings" : {
    "index.refresh_interval" : "5s",
    "analysis" : {
      "analyzer" : {
        "default" : {
          "type" : "standard",
          "stopwords" : "_none_"
        }
      }
    }
  },
  "mappings" : {
    "_default_" : {
       "_all" : {"enabled" : true},
       "dynamic_templates" : [ {
         "string_fields" : {
           "match" : "*",
           "match_mapping_type" : "string",
           "mapping" : {
             "type" : "multi_field",
               "fields" : {
                 "{name}" : {"type": "string", "index" : "analyzed", "omit_norms" : true, "index_options" : "docs"},
                 "raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
               }
           }
         }
       } ],
       "properties" : {
         "@version": { "type": "string", "index": "not_analyzed" }
       }
    }
  }
}

Reference: see the Logstash index template.


(Sami Bensmida) #15

Hello,

I'm using the Talend telasticsearch component for ETL, converting CSV data to JSON in order to load it into Elasticsearch, which means the mapping is generated automatically in the background and I can't see the code.

By default, all string fields are ANALYZED; I want to change them to NOT ANALYZED.

Any ideas, please?

Thanks in advance,
Sami BENSMIDA


(Ravi Teja N) #16

Here is the solution from a Stack Overflow link for a similar question:

curl -XPUT localhost:9200/_template/global -d '{
  "template": "*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}'
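
Keep in mind that an index template is applied only when an index is created, so it does not change the mapping of an index that already exists, and it applies only to indices whose names match its "template" pattern. The sketch below approximates that name matching with Python's fnmatch. This is an assumption for illustration: Elasticsearch uses its own simple wildcard matching, while fnmatch also interprets `?` and `[]`. The helper name and the index name "mysql-2016.01.01" are hypothetical.

```python
from fnmatch import fnmatchcase

def template_applies(template_pattern, index_name):
    """Approximate check: does an index name match a template's wildcard pattern?"""
    return fnmatchcase(index_name, template_pattern)

print(template_applies("*", "mt_joined_record_index"))        # True
print(template_applies("mysql-*", "mysql-2016.01.01"))        # True
print(template_applies("mysql-*", "mt_joined_record_index"))  # False
```

After storing a template like the one above, the fields only become not_analyzed in indices created afterwards, so existing data has to be reindexed into a new index.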

(Venkat Venkat) #17

Hi ,

I am facing the same issue. I am trying to change the hostname string field from analyzed to not analyzed, but every time it still comes out analyzed. Please help me make hostname not analyzed.

Please find below the template I am using.

curl -XPUT 'http://localhost:9200/_template/elasticsearchstats' -d '{
"template": "elasticsearchstats",
"order": 10,
"settings": {
"index.refresh_interval": "5s"
},
"mappings": {
"_default_": {
"_all": {
"enabled": true,
"omit_norms": true
},
"dynamic_templates": [
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "keyword"
}
}
},
{
"float_fields": {
"match": "*",
"match_mapping_type": "float",
"mapping": {
"type": "float",
"doc_values": true
}
}
},
{
"double_fields": {
"match": "*",
"match_mapping_type": "double",
"mapping": {
"type": "double",
"doc_values": true
}
}
},
{
"byte_fields": {
"match": "*",
"match_mapping_type": "byte",
"mapping": {
"type": "byte",
"doc_values": true
}
}
},
{
"short_fields": {
"match": "*",
"match_mapping_type": "short",
"mapping": {
"type": "short",
"doc_values": true
}
}
},
{
"integer_fields": {
"match": "*",
"match_mapping_type": "integer",
"mapping": {
"type": "integer",
"doc_values": true
}
}
},
{
"long_fields": {
"match": "*",
"match_mapping_type": "long",
"mapping": {
"type": "long",
"doc_values": true
}
}
},
{
"date_fields": {
"match": "*",
"match_mapping_type": "date",
"mapping": {
"type": "date",
"doc_values": true
}
}
},
{
"geo_point_fields": {
"match": "*",
"match_mapping_type": "geo_point",
"mapping": {
"type": "geo_point",
"doc_values": true
}
}
}
],
"properties": {
"@timestamp": {
"type": "date",
"doc_values": true
},
"@version": {
"type": "string",
"index": "not_analyzed",
"doc_values": true
},
"clusterstatus" : {
"type" : "long"
},
"cpupercent" : {
"type" : "long"
},
"fielddataestimated" : {
"type" : "long"
},
"fielddatalimit" : {
"type" : "long"
},
"freedisk" : {
"type" : "long"
},
"currentstatus" : {
"type" : "string",
"index" : "not_analyzed"
},
"hostname" : {
"type": "keyword",
"index" : "no"
},
"testname" : {
"type": "keyword"
},
"freemem" : {
"type" : "long"
},
"heapold" : {
"type" : "long"
},
"heapsurvior" : {
"type" : "long"
},
"heapused" : {
"type" : "long"
},
"heapyoung" : {
"type" : "long"
},
"loadaverage" : {
"type" : "long"
},
"hostname" : {
"type" : "string"
},
"openfiles" : {
"type" : "float"
},
"threadcount" : {
"type" : "float"
},
"type" : {
"type" : "string"
},
"geoip": {
"type": "object",
"dynamic": true,
"properties": {
"ip": {
"type": "ip",
"doc_values": true
},
"location": {
"type": "geo_point",
"doc_values": true
},
"latitude": {
"type": "float",
"doc_values": true
},
"longitude": {
"type": "float",
"doc_values": true
}
}
}
}
}
}
}'
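
One thing worth checking in the template above: "hostname" appears twice under "properties", once as "keyword" and later as a plain "string". The template also mixes the "keyword" type, which belongs to Elasticsearch 5.x and later, with the older "string" plus "not_analyzed" syntax. The sketch below shows how to spot repeated keys with Python's json module. It only demonstrates what Python does with a repeated key (the last value wins); how Elasticsearch resolves it depends on its parser and version, so treat that part as an assumption. The helper name is hypothetical and the fragment is a trimmed-down copy of the shape above.

```python
import json
from collections import Counter

def find_duplicate_keys(json_text):
    """Return sorted key names that occur more than once within a single JSON object."""
    duplicates = set()

    def check_pairs(pairs):
        counts = Counter(key for key, _ in pairs)
        duplicates.update(key for key, n in counts.items() if n > 1)
        return dict(pairs)  # later values overwrite earlier ones, as json.loads does

    json.loads(json_text, object_pairs_hook=check_pairs)
    return sorted(duplicates)

# Trimmed-down fragment with the same shape as the "properties" block above.
fragment = """
{
  "properties": {
    "hostname": {"type": "keyword", "index": "no"},
    "testname": {"type": "keyword"},
    "hostname": {"type": "string"}
  }
}
"""

print(find_duplicate_keys(fragment))                    # ['hostname']
print(json.loads(fragment)["properties"]["hostname"])   # {'type': 'string'}
```

If the later definition is the one that takes effect, hostname ends up as a plain string, which is analyzed by default in versions that still support the "string" type. Keeping a single "hostname" entry removes the ambiguity.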

