I don't see any new index being created in the cloud. If I remove the index line from the output, my data lands in the default logstash-YYYY.MM.DD index, but that doesn't let me split my indices on a per-server basis (not to mention that we don't want a new index for each day).
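For reference, my elasticsearch output looks roughly like this (endpoint, credentials, and index name changed; a sketch, not my exact config):

output {
  elasticsearch {
    hosts => ["https://my-cluster.example.cloud:9243"]
    user => "elastic"
    password => "REDACTED"
    index => "server_a"
  }
}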
I cannot see any reason why the defined index would not be created. Are there any errors in your Logstash logs? Technically speaking, Logstash does not create indices. It sends a batch of documents to Elasticsearch and says, "these should go in an index named (whatever is defined in index =>)," and then Elasticsearch handles the bulk request and creates the index.
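For illustration, the bulk request Logstash sends is NDJSON along these lines (index name invented here):

POST _bulk
{ "index": { "_index": "server_a-2017.10.12", "_type": "BRO_httplog" } }
{ "message": "first event", "@timestamp": "2017-10-12T20:54:10.178Z" }
{ "index": { "_index": "server_a-2017.10.12", "_type": "BRO_httplog" } }
{ "message": "second event", "@timestamp": "2017-10-12T20:54:11.002Z" }

If server_a-2017.10.12 does not exist yet, Elasticsearch creates it while handling the request.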
On a parallel note, though:
How are you planning on handling data retention? A delete_by_query is very taxing on Elasticsearch for time-series data. You're much better off using rollover indices of some kind, whether date-named or via the Rollover API.
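As a sketch of the Rollover API approach (the alias name and conditions here are just examples):

PUT logs-000001
{
  "aliases": { "logs_write": {} }
}

POST logs_write/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 50000000
  }
}

When a condition is met, Elasticsearch creates logs-000002 and moves the logs_write alias to it, so retention becomes deleting whole old indices rather than delete_by_query.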
Also, what is the benefit of splitting indices by server name? Is the data completely different, such that the mappings will also differ significantly? It's simple to filter by server name in a query, so unless there is a mapping difference due to differing data, there's no other compelling reason to do this. More shards and indices just mean more overhead for the cluster to manage. The cluster will be more performant if you can reduce this to the minimum required.
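For example, assuming the server name lives in a field like host, filtering is a one-liner:

GET logstash-*/_search
{
  "query": {
    "term": { "host": "server_a" }
  }
}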
I have turned on full debug logging in my Logstash and see no errors. I do know that the outputs are getting invoked, because I added a file output to make sure Logstash was pulling from the files. Is there another way (you can probably guess I'm pretty new to ELK) that I can create the indices?
I can't go into the reasons why we need to split the indices; there actually is a good reason (I'm using "per server" in an abstract sense). But yes, there is a reason why we can't just filter with a simple query on "server name".
Are new entries continually being added to these files? If so, then the data is not being rejected by Elasticsearch. Logstash, at least in the current release, writes to all outputs in a given pipeline at the same time. If one of them puts up back-pressure (e.g. Elasticsearch is not accepting the output for any reason), then all of the outputs in that pipeline will also cease.
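In other words, with a pipeline sketched like this (hosts value invented), the file output only keeps writing as long as the elasticsearch output is also accepting events:

output {
  elasticsearch {
    hosts => ["https://my-cluster.example.cloud:9243"]
    index => "server_a"
  }
  file {
    path => "/tmp/logstash-debug.log"
  }
}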
Fair enough. But that doesn't rule out data retention planning by dropping old indices based on the age of their contents (naming the indices server_name-YYYY.MM.dd, for example, or using the creation_date of the index). It's still not best practice to use delete_by_query to empty out indices for time-series data.
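In Logstash terms, that naming is just sprintf in the index setting (field name assumed here):

index => "%{server_name}-%{+YYYY.MM.dd}"

That yields one index per server per day, and retention becomes dropping indices older than N days.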
Yes, I continually add data to the files. I've got the methodology for how we will handle retention; I'm simplifying the situation to focus on the one issue: I can't seem to get new indices created based on what I believe should be a correct configuration.
I mean, is Logstash continually outputting to the file outputs you defined? I wasn't able to infer the answer from your response. If the data is continually streaming into those output files, then the data must be in Elasticsearch somewhere, otherwise Logstash would have stopped sending data, and would have log entries about retries (429 code) and such.
Yes, the file output that I defined continues to receive data as the files I'm using as inputs get data added to them. And yes, I'm using the default elastic user.
This is odd. Could you provide the output of the cluster stats API? I just want to check if there are any issues with the cluster that could be causing this.
I will have to do that tomorrow, as I'm not in the environment to reproduce this for the rest of the day. I'll have to figure out how to get to that API, so I'll work on that today.
I'm not sure how to use the cluster stats API. I looked at the Paramedic console and it says my cluster is "yellow", but I have no idea what that means. Since I'm on the evaluation version of elastic.co cloud, maybe I should just delete my cluster and start over.
Log in to Kibana and go to Dev Tools. This will open up Console, which allows you to run queries. There you can enter and run GET _cluster/stats on the left side; results will show up on the right. (A yellow status usually just means some replica shards are unassigned, which is common on small clusters, so it isn't by itself a reason to start over.)
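If the full stats output is overwhelming, cluster health is a much smaller response that includes the status color:

GET _cluster/health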
So I went in and deleted the logstash-* indices, ran my generator that puts records out, and the data is getting put into the logstash-* index again. I wonder if this is related to templating or some other thing that I don't fully understand.
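I guess I can also check what index templates exist with something like this in Console (still figuring this out):

GET _template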
I took the "accounts.json" example in this https://www.elastic.co/guide/en/kibana/current/tutorial-load-dataset.html and ran the appropriate curl command as documented at the page, and it created the index no problem (i'm relying on logstash to convert my csv into json, so i need to tweak my config to get my actual data, which i'm going to try next
OK, I'm not sure I'm doing this right. I ran a query on the logstash-2017.10.12 index and got all the records, copied one record, and did:
PUT /BluVector_Bro
{
  "_index": "logstash-2017.10.12",
  "_type": "BRO_httplog",
  "_id": "AV8SXb1_UJ-u3V4Cn7CH",
  "_score": 1,
  "_source": {
    "path": "/var/log/bro/current/http.log",
    "@timestamp": "2017-10-12T20:54:10.178Z",
    "@version": "1",
    "host": "bvesx25.vm",
    "message": """1507841650.178409 Cd4IHe2auw3y1WU2Vi 227.173.242.206 1056 159.154.119.78 80 1 GET diggstatistics.com /flash/dialog_header_red.jpg http://diggstatistics.com/ Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1) 0 7197 200 OK - - - (empty) - - - - - F7vXEf3FIgfyod7DU3 image/jpeg""",
    "type": "BRO_httplog",
    "ts": """1507841650.178409 Cd4IHe2auw3y1WU2Vi 227.173.242.206 1056 159.154.119.78 80 1 GET diggstatistics.com /flash/dialog_header_red.jpg http://diggstatistics.com/ Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1) 0 7197 200 OK - - - (empty) - - - - - F7vXEf3FIgfyod7DU3 image/jpeg""",
    "tags": [
      "_geoip_lookup_failure"
    ]
  }
}
The data is lines from Bro logs that are filtered in Logstash to create a CSV-formatted file. I get:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "unknown setting [index._id] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "unknown setting [index._id] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
  },
  "status": 400
}
I started removing the items that caused this exception, but pretty much everything ended up having to be removed. I'm wondering if this is because I don't have a template/mapping for this index (although I would expect to find something in the Logstash logs if Elasticsearch threw an exception, but maybe I'm wrong).
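Looking at it again, maybe the problem is that I pasted the whole search hit. If I'm reading the docs right, only the _source body belongs in the request, with the type and id in the URL (and the index name lowercased), something like:

PUT bluvector_bro/BRO_httplog/AV8SXb1_UJ-u3V4Cn7CH
{
  "path": "/var/log/bro/current/http.log",
  "@timestamp": "2017-10-12T20:54:10.178Z",
  "@version": "1",
  "host": "bvesx25.vm",
  "type": "BRO_httplog"
}

(message, ts, and tags trimmed here for space)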
So tomorrow, I'm going to figure out how to do a mapping and see if that improves anything.