What is the correct way to use a dynamic object?


(amit) #1

Hi,
My index looks like

indexDoc = {
    "mappings": {
        "test": {
            "properties": {
                "timestamp": {
                    "type": "date",
                    "doc_values": True,
                    "format": "strict_date_optional_time||epoch_millis||dd/MMM/YYYY:HH:mm:ss Z"
                },
                "created_on": {
                    "type": "date",
                    "index": "not_analyzed",
                    "doc_values": True,
                    "format": "date_optional_time"
                },
                "logmsg": {
                    "type": "object",
                    "dynamic": True
                },
                "logtype": {
                    "type": "string",
                    "index": "not_analyzed",
                    "doc_values": True
                }
            }
        },
        "_default_": {
            "_all": {
                "enabled": True,
                "norms": {
                    "enabled": False
                }
            },
            "properties": {
                "timestamp": {
                    "type": "date",
                    "doc_values": True,
                    "format": "strict_date_optional_time||epoch_millis||dd/MMM/YYYY:HH:mm:ss Z"
                }
            },
            "dynamic_templates": [{
                "string_fields": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "string",
                        "norms": {
                            "enabled": False
                        },
                        "fielddata": {
                            "format": "disabled"
                        },
                        "fields": {
                            "raw": {
                                "type": "string",
                                "index": "not_analyzed",
                                "doc_values": True
                            }
                        }
                    }
                }
            }, {
                "other_fields": {
                    "match": "*",
                    "match_mapping_type": "*",
                    "mapping": {
                        "doc_values": True
                    }
                }
            }]
        }
    },
    "settings": {
        "refresh_interval": "5s",
        "number_of_shards": 5,
        "number_of_replicas": 2
    }
}

I have Elasticsearch and Kibana only; I am not using Logstash.

Basically I need to capture the full log file in logmsg.
In Python I can only use 32000 characters in a string, so I am breaking my log file message into chunks of 32000.

My final list looks like

output = [{'created_on': 'sometime', 'logtype': 'sometype', 'logmsg': {'_msg1': 'aaa', '_msg2': 'bbb', '_msg3': 'ccc'}}]

I am using the bulk API to store the values.

Is this the correct way of doing it, or can I achieve this result in some other way?
The problem is that in the Kibana Discover tab, under selected fields, I get an endless list of fields:

logmsg._msg1, logmsg._msg2, logmsg._msg3, ...
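
For reference, a minimal sketch of the chunking described above (the helper name and field numbering are illustrative, not the exact code from the question):

```python
# Sketch: split a long log message into 32000-character chunks, one
# dynamic sub-field per chunk -- this is what produces logmsg._msg0,
# logmsg._msg1, ... as separate mapped fields.
def chunk_logmsg(text, size=32000):
    return {
        "_msg%d" % i: text[start:start + size]
        for i, start in enumerate(range(0, len(text), size))
    }

doc = {
    "created_on": "sometime",
    "logtype": "sometype",
    "logmsg": chunk_logmsg("x" * 70000),  # 3 chunks: 32000 + 32000 + 6000
}
```

Because logmsg is a dynamic object, each chunk becomes its own field in the mapping, so the field list in Kibana grows with the size of the log file.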


(Mike Simos) #2

Hi,

What result are you trying to achieve? You can either index the field as one long message field, or potentially break the message up into separate fields, depending on what your log message looks like.
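
For the single long-field option, a hedged sketch of what the logmsg mapping could look like instead of a dynamic object (2.x-style string syntax, matching the mapping in the question; note that a not_analyzed string runs into Lucene's 32766-byte term limit, so a very large value is better left analyzed or not indexed at all):

```python
# Sketch: logmsg as one large string field instead of a dynamic object.
# "index": "no" stores the value without indexing it, which avoids both
# the dynamic-field explosion and the 32766-byte not_analyzed term limit.
logmsg_mapping = {
    "logmsg": {
        "type": "string",
        "index": "no"  # retrievable via _source, but not searchable
    }
}
```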


(amit) #3

I would like to index the field as one long message field.


(amit) #4

How can I have one field, logmsg, instead of having logmsg._msg_0, logmsg._msg_1, etc. as multiple fields?


(amit) #5

Please help. I guess I am creating my index wrong. Basically I need to capture the full log file as a field; there is no need to analyze that field. Normally my log file contains some 80k lines, so I was breaking it up and creating dynamic fields.
As a result, Kibana was timing out when fetching values for more than one day.

How do I create my index correctly? I am using Python and doing bulk operations.


(Christian Dahlqvist) #6

Why do you need to have the entire log file as a single event? What are you trying to achieve?

The usual way to index log files is to do so line by line as separate documents. If there are lines that belong together, these can be grouped together as a single event through multiline processing.
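
The line-by-line approach with multiline grouping can be sketched in plain Python (the continuation rule here, that indented lines belong to the previous event, is an assumption; adapt it to your log format):

```python
def group_multiline(lines):
    """Group raw log lines into events: a line starting with whitespace
    is treated as a continuation of the previous event."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if events and line[:1].isspace():
            events[-1] += "\n" + line   # continuation of previous event
        else:
            events.append(line)         # start of a new event
    return events

log = [
    "2017-01-01 10:00:00 ERROR something failed\n",
    "  Traceback (most recent call last):\n",
    '    File "app.py", line 1, in <module>\n',
    "2017-01-01 10:00:05 INFO recovered\n",
]
events = group_multiline(log)  # two events: the error with its
                               # traceback, and the info line
```

Each grouped event would then be indexed as its own document.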


(amit) #7

I was doing that before, but the team wants to view the full log as a single event.


(Mike Simos) #8

Hi,

Logstash already does this, for example with the message field. Also, Elasticsearch has the _all field, which is effectively a concatenation of all your fields. I am not sure what the issue is with Python; I would check your mapping, since you should be able to index a single value longer than 32k. Here is an example indexing a single field of 50,000 characters:

$ cat bulk.py
#!/usr/bin/python

from elasticsearch import Elasticsearch

value = ""
for i in range(0, 50000):
    value += "a"

client = Elasticsearch(hosts=['http://elastic:changeme@localhost:9200'])
body = '{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }\n'
body += '{ "field1" : "'+value+'" }\n'
response = client.bulk(body=body)
print response

$ python bulk.py                                                                                                                                                                          
{u'items': [{u'index': {u'status': 200, u'_type': u'type1', u'_shards': {u'successful': 2, u'failed': 0, u'total': 2}, u'_index': u'test', u'_version': 7, u'created': False, u'result': u'updated', u'_id': u'1'}}], u'errors': False, u'took': 22}

(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.