Terms splits up my data


(Niraj Kumar) #1

When I use terms, Kibana 3 splits my data into chunks. How can I get rid of this?

Example:

I have a key field with the value "drdvd/win10/x64/829243-b21.wim". When I use terms to filter on the key and its related value, the value is divided into chunks like "drdvd", "win10", "x64". How do I fix this? I have already ingested the data.


(Shaunak Kashyap) #2

This is almost certainly happening because Elasticsearch, by default, analyzes the text in string fields. You can learn more about analysis here: https://www.elastic.co/guide/en/elasticsearch/guide/current/analysis-intro.html.
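To make the effect of analysis concrete, here is a rough Python approximation of what happens to the value from the first post: the text is lowercased and broken on non-alphanumeric characters. This is only a sketch. The real default analyzer follows Unicode word-boundary rules and can behave differently on other inputs, and the function name approximate_analyze is made up for illustration.

```python
import re

def approximate_analyze(text):
    """Rough stand-in for default analysis: lowercase the text, then split
    on anything that is not an ASCII letter or digit.
    This is NOT the real Elasticsearch analyzer, only an illustration."""
    return re.findall(r"[a-z0-9]+", text.lower())

tokens = approximate_analyze("drdvd/win10/x64/829243-b21.wim")
print(tokens)
```

Each of those pieces is indexed as a separate term, which is why a terms panel shows "drdvd", "win10" and "x64" instead of the full path.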

If you used Logstash to ingest your data, there should be a "raw" version of your field. So if your field was named "foo", you would have a "foo.raw" field. This "raw" field would contain the unanalyzed string. If you perform the terms aggregation on it, you will get the desired effect.
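As an illustration, a terms aggregation on the raw field could be built like this. It is a minimal sketch: the field name key.raw assumes the field is called key (as in the first post) and that the raw sub-field exists, and the aggregation name by_value is arbitrary.

```python
import json

def terms_agg_body(field, size=10):
    """Build a search body that buckets documents by the exact field value."""
    return {
        "size": 0,  # return only the aggregation buckets, no hits
        "aggs": {
            "by_value": {  # arbitrary aggregation name (assumption)
                "terms": {"field": field, "size": size}
            }
        },
    }

body = terms_agg_body("key.raw")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a search request against the index.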

If you are not using Logstash, you will need to reindex your data. Before reindexing, make sure to create a mapping for your (new) index and specify "index": "not_analyzed" for that field. You can read more about mappings here: https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html
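A mapping of that kind might look like the following sketch, which builds the body for creating the new index. The type name s3-access-log and the field name key are taken from elsewhere in this thread, and the helper name is hypothetical. It uses the string field type that the mappings in this thread use.

```python
import json

def not_analyzed_mapping(doc_type, field):
    """Body for creating an index where `field` is stored as one exact term."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    # "not_analyzed" keeps the whole value as a single term
                    field: {"type": "string", "index": "not_analyzed"}
                }
            }
        }
    }

create_body = not_analyzed_mapping("s3-access-log", "key")
print(json.dumps(create_body, indent=2))
```

The mapping has to be in place before any documents are indexed, which is why it is created together with the new index.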


(Niraj Kumar) #3

Thanks for the answer, Shaunak. I ingested using Logstash but changed the index name to a custom one instead of using the default logstash-YYYY one. I do not see any .raw fields.

So if I use the mapping API, do I need to re-index the data?


(Shaunak Kashyap) #4

Ahh, if you changed the index name you won't get the raw fields (by default). So at this point it looks like reindexing (after defining a mapping for the new index as I explained earlier) is your only option.


(Niraj Kumar) #5

Excuse me, I am a newbie at re-indexing and mapping. Below is what my mapping looks like. How do I change it to add "index": "not_analyzed" and then re-index? Any example would help a lot.

{
  "niraj-log-s3-2016.06.28": {
    "mappings": {
      "s3-access-log": {
        "properties": {
          "@timestamp": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "@version": {
            "type": "string"
          },
          "agent": {
            "type": "string"
          },
          "bucket": {
            "type": "string"
          },
          "bytes": {
            "type": "long"
          },
          "clientip": {
            "type": "string"
          },
          "error_code": {
            "type": "string"
          },
          "host": {
            "type": "string"
          },
          "httpversion": {
            "type": "string"
          },


(Shaunak Kashyap) #6

As mentioned in the mapping documentation I linked to earlier, you cannot update a mapping once it is defined:

Although you can add to an existing mapping, you can’t change existing field mappings. If a mapping already exists for a field, data from that field has probably been indexed. If you were to change the field mapping, the indexed data would be wrong and would not be properly searchable.

So you will need to delete your existing index and create a new mapping for it. Then you reindex your data into this index the same way you indexed it before. This time around, the new mapping will be applied.

However, there appears to be an additional consideration. Based on the name of your index, niraj-log-s3-2016.06.28, it looks like you are using time-based indices. The index name indicates that a new index will be created every day. Since you will want the mapping to apply to these new indices as they are created, your first step should be to define an index template. The template property of the index template contains a pattern that determines which indices the template will be applied to. In your case the template property would be something like niraj-log-s3-*.
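To show how such a pattern selects indices, here is a small Python sketch that uses shell-style wildcard matching as a stand-in. It is an approximation for illustration only (Python's fnmatch also understands ? and [...], which are not part of this discussion), and the function name template_applies is hypothetical.

```python
from fnmatch import fnmatchcase

def template_applies(pattern, index_name):
    """Approximate check of whether a template pattern covers an index name."""
    return fnmatchcase(index_name, pattern)

pattern = "niraj-log-s3-*"
for name in ["niraj-log-s3-2016.06.28", "niraj-log-s3-2016.06.29", "logstash-2016.06.28"]:
    print(name, template_applies(pattern, name))
```

Every daily index whose name starts with niraj-log-s3- is covered, so the mapping in the template is applied each time a new one is created.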

The index template can contain (amongst other things) the mapping to use for the indices to which the template will be applied. In this mapping, you will need to define "index": "not_analyzed" for the specific string fields you want Elasticsearch to not analyze. Alternatively you can let these specific fields be analyzed by Elasticsearch but create a secondary field that is copied from the original field, but its contents are not analyzed. For this you can use Elasticsearch's copy_to feature. A third option is to define a template where all string fields automatically get a secondary field that is not analyzed. For this you will need to use Elasticsearch's dynamic templates functionality.
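The first two options could be sketched as template bodies like the ones below. This is only an illustration under assumptions: the field name key comes from the first post, the type name s3-access-log from the mapping posted earlier, and key_raw is a made-up name for the secondary field. The third option (dynamic templates) is what the default Logstash template does.

```python
import json

PATTERN = "niraj-log-s3-*"     # indices the template applies to
DOC_TYPE = "s3-access-log"     # type name from the mapping posted in this thread

def template_not_analyzed(field):
    """Option 1: the field itself is indexed as a single unanalyzed term."""
    return {
        "template": PATTERN,
        "mappings": {DOC_TYPE: {"properties": {
            field: {"type": "string", "index": "not_analyzed"},
        }}},
    }

def template_copy_to(field, raw_field):
    """Option 2: keep the field analyzed and copy its value into an
    unanalyzed secondary field."""
    return {
        "template": PATTERN,
        "mappings": {DOC_TYPE: {"properties": {
            field: {"type": "string", "copy_to": raw_field},
            raw_field: {"type": "string", "index": "not_analyzed"},
        }}},
    }

option1 = template_not_analyzed("key")
option2 = template_copy_to("key", "key_raw")  # "key_raw" is a hypothetical name
print(json.dumps(option2, indent=2))
```

With option 1 full-text search on the field is lost, while option 2 keeps both behaviours at the cost of indexing the value twice.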

Once you've created the template in Elasticsearch (using one of the three options mentioned in the previous paragraph), you will need to delete your existing niraj-log-* indices in Elasticsearch and reindex your data from the original source using Logstash, just like you did the first time around. This time around, Elasticsearch will find the index template matching your index names and apply the mapping in it to your indices. The result will be that you will have not analyzed string fields (or both analyzed and not analyzed string fields, if you go with the 2nd or 3rd option from above).

BTW, Logstash by default implements option 3 if you simply use the default Logstash index names (logstash-*). But because you are using custom index names you are having to implement it (or one of the other two options) manually.


(Niraj Kumar) #7

I created the template below, and I see an error in the Elasticsearch log that prevents the data from being ingested.

{
  "template" : "niraj-log-s3-new-*",
  "settings" : {
    "index.refresh_interval" : "5s"
  },
  "mappings" : {
    "_default_" : {
      "_all" : {"enabled" : true, "omit_norms" : true},
      "dynamic_templates" : [ {
        "message_field" : {
          "match" : "message",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fielddata" : { "format" : "disabled" }
          }
        }
      }, {
        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fielddata" : { "format" : "disabled" },
            "fields" : {
              "raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
            }
          }
        }
      } ],
      "properties" : {
        "@timestamp": { "type": "date" },
        "@version": { "type": "string", "index": "not_analyzed",
        "geoip"  : {
          "dynamic": true,
          "properties" : {
            "ip": { "type": "ip" },
            "location" : { "type" : "geo_point" },
            "latitude" : { "type" : "float" },
            "longitude" : { "type" : "float" }
              }
            }
          }
        }
      }
    }
  }
}

Elasticsearch error:

Caused by: MapperParsingException[Mapping definition for [@version] has unsupported parameters: [geoip : {dynamic=true, properties={location={type=geo_point}, longitude={type=float}, latitude={type=float}, ip={type=ip}}}]]
at org.elasticsearch.index.mapper.DocumentMapperParser.checkNoRemainingFields(DocumentMapperParser.java:267)
at org.elasticsearch.index.mapper.DocumentMapperParser.checkNoRemainingFields(DocumentMapperParser.java:261)
at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:317)
at org.elasticsearch.index.mapper.object.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:228)
at org.elasticsearch.index.mapper.object.RootObjectMapper$TypeParser.parse(RootObjectMapper.java:137)
at org.elasticsearch.index.mapper.DocumentMapperParser.parse(DocumentMapperParser.java:211)
at org.elasticsearch.index.mapper.DocumentMapperParser.parseCompressed(DocumentMapperParser.java:192)
at org.elasticsearch.index.mapper.DocumentMapperParser.parseCompressed(DocumentMapperParser.java:177)
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:229)
at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:356)
... 6 more


(Shaunak Kashyap) #8

You have a typo on this line:

    "@version": { "type": "string", "index": "not_analyzed",

That object is not complete. You need to close it using }.
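For reference, here is a sketch of how the properties section reads once that object is closed, wrapped in a small Python check. The check parses the fragment and confirms that geoip is a sibling of @version rather than nested inside it, which is what the error message complained about. Only the properties section is shown, not the whole template.

```python
import json

# The "properties" section with the @version object closed properly.
fixed_properties = """
{
  "@timestamp": { "type": "date" },
  "@version": { "type": "string", "index": "not_analyzed" },
  "geoip": {
    "dynamic": true,
    "properties": {
      "ip": { "type": "ip" },
      "location": { "type": "geo_point" },
      "latitude": { "type": "float" },
      "longitude": { "type": "float" }
    }
  }
}
"""

props = json.loads(fixed_properties)  # raises ValueError if the JSON is malformed
print(sorted(props))                  # top-level field names
print("geoip" in props["@version"])   # geoip must not be inside @version
```

Running a template through a JSON parser like this before sending it to Elasticsearch is a quick way to catch unbalanced braces.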


(Niraj Kumar) #9

Okay, I fixed the template issue and I see the raw fields being created. But I see another issue when sending some of the logs to ES. A few of the logs are parsed successfully, but a few of them fail to ingest. The directory has full permissions as of now. Again, thanks so much for helping me with such detailed info.

failed to open /home/ubuntu/s3logs/logs/2016-07-10-11-39-37-9FCF416C82339B53: Permission denied - /home/ubuntu/s3logs/logs/2016-07-10-11-39-37-9FCF416C82339B53 {:level=>:warn}
failed to open /home/ubuntu/s3logs/logs/2016-06-22-17-28-17-984154E42BE62343: Permission denied - /home/ubuntu/s3logs/logs/2016-06-22-17-28-17-984154E42BE62343 {:level=>:warn}
failed to open /home/ubuntu/s3logs/logs/2016-06-24-20-26-14-9ECFD1F286FDEA1C: Permission denied - /home/ubuntu/s3logs/logs/2016-06-24-20-26-14-9ECFD1F286FDEA1C {:level=>:warn}
failed to open /home/ubuntu/s3logs/logs/2016-07-14-19-31-16-0428B673B0842C8E: Permission denied - /home/ubuntu/s3logs/logs/2016-07-14-19-31-16-0428B673B0842C8E {:level=>:warn}
failed to open /home/ubuntu/s3logs/logs/2016-06-11-23-25-01-55A14E6B81E3FD7E: Permission denied - /home/ubuntu/s3logs/logs/2016-06-11-23-25-01-55A14E6B81E3FD7E {:level=>:warn}
failed to open /home/ubuntu/s3logs/logs/2016-06-30-11-22-23-04E6407089E57AC4: Permission denied - /home/ubuntu/s3logs/logs/2016-06-30-11-22-23-04E6407089E57AC4 {:level=>:warn}
failed to open /home/ubuntu/s3logs/logs/2016-06-26-04-46-05-17C629D237DAFB36: Permission denied - /home/ubuntu/s3logs/logs/2016-06-26-04-46-05-17C629D237DAFB36 {:level=>:warn}
failed to open /home/ubuntu/s3logs/logs/2016-06-23-01-34-49-928605F20BE0534E: Permission denied - /home/ubuntu/s3logs/logs/2016-06-23-01-34-49-928605F20BE0534E {:level=>:warn}
failed to open /home/ubuntu/s3logs/logs/2016-07-08-20-45-24-0F4F0837E9CCBE95: Permission denied - /home/ubuntu/s3logs/logs/2016-07-08-20-45-24-0F4F0837E9CCBE95 {:level=>:warn}

My Permissions:

ubuntu@elk-prod-02-app01:~/s3logs$ ls -lrt
total 1724
drwxrwxrwx 2 ubuntu logstash 1761280 Jul 19 18:21 logs


(Shaunak Kashyap) #10

Hmmm... I'm not sure why this wouldn't work. Can you run ls on a file that does get parsed and ingested successfully and one that doesn't? Seeing them "side by side" might reveal something.
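If it is easier than comparing ls output by eye, a small Python sketch like the following could report which attributes differ between two files. The helper names describe and differing_keys are made up for illustration, and the readable flag reflects the user running the script, so it would need to run as the same user as Logstash to be meaningful.

```python
import os
import stat

def describe(path):
    """Collect the details `ls -l` shows that matter for a read failure."""
    info = os.stat(path)
    return {
        "mode": stat.filemode(info.st_mode),   # permission string, like ls -l
        "uid": info.st_uid,                    # numeric owner
        "gid": info.st_gid,                    # numeric group
        "readable": os.access(path, os.R_OK),  # for the user running this script
    }

def differing_keys(path_a, path_b):
    """Return the names of the attributes that differ between two files."""
    a, b = describe(path_a), describe(path_b)
    return sorted(k for k in a if a[k] != b[k])
```

Calling differing_keys with the path of one file that was ingested and one that failed would list the attributes worth looking at, for example the mode or the owner.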

