Hello,
I've been looking for an answer for a couple days and hopefully someone here can help.
I am new to Elasticsearch and love the concept and function. I am going to use it for a project that will be collecting some system status information for analysis.
Most of the JSON coming in will be fairly fixed in terms of key names for the headers. However, the results data will be very different at times. I ran into my first problem when I started sending in test data to try out mapping: I got an indexing error once the index reached 1000 fields, which as far as I can tell is the default field limit.
I had a mapping explosion. I know what went wrong. What I'm looking to do is re-structure the incoming data to avoid this situation.
The problem is outlined below. Basically I have a results key, and under that key I return JSON whose keys vary from one result to the next. So system_logs, ... will be different.
In one case, for another kind of metric, I returned system PIDs and the path to each process ("pid":"process_path"). For example:
"444":"/sbin/system_daemon",
"6583":"/usr/local/bin/user_daemon_1",
"14567":"/usr/local/bin/user_daemon_2"
etc.
That produced a lot of PIDs as keys, and of course a Unix host can have 65k+ of them, so the number of fields in the index was growing far too large. It also makes it impossible to define a mapping up front that covers all of these possibilities.
What I'm looking to do is come up with a clean way to make the results available without losing the ability to have unique key: value pairs.
I know it is not really practical in Elasticsearch to have unique keys constantly flowing in. So I'm looking for a workaround that will let me flatten this kind of result into something more Elasticsearch-friendly.
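To make the idea concrete, here is a minimal Python sketch of the kind of flattening I have in mind for the PID case. The helper name to_kv_pairs and the field names "key" and "value" are just placeholders I made up, not anything Elasticsearch requires. The point is only that the mapping would then see the same two field names no matter how many PIDs come in. My understanding is that the array would also need to be mapped as a nested type for each key to stay paired with its value in queries, but I have not tested that.

```python
def to_kv_pairs(mapping):
    """Turn a dict with ever-changing keys into a list of fixed-key objects.

    "key" and "value" are hypothetical field names chosen for illustration.
    """
    return [{"key": str(k), "value": v} for k, v in mapping.items()]


# The PID example from above, as it is returned today.
pids = {
    "444": "/sbin/system_daemon",
    "6583": "/usr/local/bin/user_daemon_1",
    "14567": "/usr/local/bin/user_daemon_2",
}

# Every element now has the same two field names, regardless of the PID.
pairs = to_kv_pairs(pids)
```

With this shape each PID becomes a value to search on instead of a new field name.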
A sample JSON document is below to show what is coming in now. Any recommendations on how I can change the input JSON for efficiency and searchability are appreciated. Keep in mind that under the "logs" key the list of log files can be large and ever-changing, which is not going to work well the way I have it now.
Thank you for all the work on Elastic.
{
  "header": {
    "status": "ok",
    "status_msg": "ok",
    "ip_addr": "192.168.1.1"
  },
  "data": {
    "status": "ok",
    "status_msg": "ok",
    "results": {
      "logs": {
        "system_logs": [
          "/log/syslog"
        ],
        "www_logs": [
          "/log/www/www_log",
          "/log/www/www_log.2",
          "/log/www/www_log.3",
          "/log/www/www_log.4"
        ]
      }
    },
    "name": "log_grabber",
    "start_time": "2017-05-24T21:40:18.455588Z",
    "end_time": "2017-05-24T21:40:18.481798Z"
  }
}
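
For the "logs" part of the sample above, one restructuring I am considering is to turn each log group name into a value instead of a key. This is only a rough Python sketch under my own assumptions: flatten_logs, "log_type", and "path" are names I invented for illustration, and I don't know yet whether this is the recommended shape.

```python
def flatten_logs(logs):
    """Convert {"group": [paths...]} into a list of {"log_type", "path"} objects.

    "log_type" and "path" are hypothetical field names chosen for illustration.
    """
    entries = []
    for log_type, paths in logs.items():
        for path in paths:
            entries.append({"log_type": log_type, "path": path})
    return entries


# The "logs" object from the sample document (www_logs shortened here).
logs = {
    "system_logs": ["/log/syslog"],
    "www_logs": ["/log/www/www_log", "/log/www/www_log.2"],
}

# One entry per log file, always with the same two field names.
entries = flatten_logs(logs)
```

New log groups would then add documents or array entries rather than new fields in the mapping.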