I have been playing with ELK and set up the Elastic Stack to collect my personal web server access logs. There will be more to come, but I thought this would be a good starting point.
I have installed Filebeat to monitor the access logs and output to Logstash on another Ubuntu server – Working.
I have installed Logstash to input, filter and enrich the weblogs (a grok apache pattern and geoip filter, I think), and output to Elasticsearch on another Ubuntu server – Working.
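For context, a minimal version of such a pipeline might look like this (the port, host and pattern here are placeholders rather than my exact config):

```
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip {
    source => "clientip"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
}
```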
I have installed Elasticsearch to index the weblogs. This is where it gets strange. It appears to be working, as there are no errors (the connection is established and the pipeline is working). When I search for the indices I can see the index has been created and the “docs count” increases as people visit the site. However, when I run a “match all” search I can only see a handful of events. It is as if it ingested logs at some point but stopped, or it is still working and the docs are there but not “extracted” – I have no idea.
I have installed Kibana on the same server that is running Elasticsearch. I can access the web interface, but there is no data to discover. The strange thing is I can import one of the sample datasets provided with Kibana and search/visualise that data, and that data is now available in an index in Elasticsearch, so it appears communication is working. The only difference I can see is that the sample data index is “green” while the indices I have been sending data into are “yellow”. Something is wrong.
Don't be alarmed by the "yellow" state of some indices. That simply means you are using the default setting of one replica (two copies of the data), but as long as you only have a single Elasticsearch node you'll only have a single copy, so the replica shards stay unassigned.
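If the yellow health bothers you on a single-node setup, you could drop the replica count to zero. A minimal sketch, assuming your indices match `logstash-*`:

```
PUT /logstash-*/_settings
{
  "index": { "number_of_replicas": 0 }
}
```

With no replica shards left to assign, those indices will turn green.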
In Kibana, if you go to Management -> Index Management, do you see all the data that you would be expecting?
Also in Management, add your indices under Index Patterns so you can access them in the Discover view.
I'm not sure what you mean by "extracted". If your documents are there but the fields are not correct, then it's a problem with your grok pattern.
PS: For Apache2 logs, there is also a Filebeat module that might be easier to get started with. Then you won't even need Logstash (for this use case).
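A rough sketch of that approach (the module name and commands are from Filebeat 6.x, so check the docs for your version):

```
filebeat modules enable apache2
filebeat setup
```

`filebeat setup` loads the bundled index templates, ingest pipelines and Kibana dashboards, so the parsing happens in Elasticsearch instead of Logstash.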
So in Index Management in Kibana I do see the index, and I was also able to add the index pattern and then discover the data. Success!
The next questions are:
How and where do I define an index name for the weblogs being sent from Filebeat? Currently they are arriving as logstash-[thedate]; and
Filebeat and Logstash seem to run in the foreground and stop when I log out of an interactive terminal. The Elasticsearch and Kibana services are running OK.
I also notice that there are two timestamp fields. One appears to be the timestamp of the event, the other the timestamp of the event being indexed: Timestamp (13/Feb/2019:22:24:47 +0000) versus @timestamp (February 14th 2019, 14:13:14.914). I can only set @timestamp as the time field, which is not a true representation of the events. Is it possible to use Timestamp instead of @timestamp?
logstash-YYYY.MM.DD is the default pattern for Logstash, but you could change that in the Logstash output to Elasticsearch.
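For example, something along these lines in your pipeline (the host and index name are placeholders):

```
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "web_logs-%{+YYYY.MM.dd}"
  }
}
```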
Both Logstash and the Beats will also run as a service. How did you install them? On Linux I'd use the DEB or RPM packages.
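Assuming you used the DEB/RPM packages on a systemd-based Ubuntu, enabling them as services is just:

```
sudo systemctl enable filebeat logstash
sudo systemctl start filebeat logstash
```

They will then keep running after you log out and come back up after a reboot.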
Probably the timestamp you cannot pick is not a proper timestamp from Kibana's point of view. Check in the index pattern what data type it has, and from there you can work back to where the mapping or parsing is wrong.
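If the Apache timestamp is being indexed as plain text, a date filter in Logstash will parse it into @timestamp. A minimal sketch, assuming grok has put the raw value into a field called `timestamp` (which `%{COMBINEDAPACHELOG}` does):

```
filter {
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
```

This overwrites @timestamp with the event time from the log line, so Kibana's time field then reflects when the request actually happened rather than when it was indexed.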
Just a word of caution: Having the time in the index name makes deleting data trivial (once you hit the limit of how much data you can or want to keep around). If you don't have that much traffic, you could go with a monthly index instead, web_logs-YYYY.MM for example. Having one massive index would make it very expensive to expire old data.
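Expiring old data then becomes a single call, for example:

```
DELETE /web_logs-2019.01
```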