Using Logstash for parsing logs

Hello,

I am new to ELK. I have installed Logstash, Elasticsearch, and Kibana. My use case is as follows:
Logstash will read the application log, parse it, and send it to Elasticsearch. Can you please suggest whether I need to write the application log in a specific way so that it is readable by Logstash? Also, since Logstash parses the log to create appropriate indexes for Elasticsearch, how will Logstash get the tags against which it will parse and create indexes for Elasticsearch?

Also, I am looking for some documentation for a clustered deployment of ELK.

Regards,
Chinmoy Das

Logstash will read the application log, parse it, and send it to Elasticsearch. Can you please suggest whether I need to write the application log in a specific way so that it is readable by Logstash?

No, but logs can be more or less easy to parse. For example, multiline logs are annoying to parse while logs in JSON format are very easy.
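
For example, if every log event is a single line of JSON, a file input with the json codec can pick the fields up directly, with no grok needed. A minimal sketch, assuming a hypothetical log path (not from this thread):

input {
	file {
		path => "/var/log/myapp/app.json.log"   # hypothetical path
		codec => "json"                          # each line is one JSON object
	}
}
output {
	elasticsearch {
		host => "localhost"
	}
}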

Also, since Logstash parses the log to create appropriate indexes for Elasticsearch, how will Logstash get the tags against which it will parse and create indexes for Elasticsearch?

Not sure I understand exactly what you mean, but it's up to you to configure Logstash to parse the log correctly.
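
To illustrate: Logstash doesn't discover fields on its own; you tell it, typically with a grok filter, which patterns to match and which field names to assign, and those fields are what end up indexed in Elasticsearch. A minimal sketch (the field names are just examples, not something Logstash requires):

filter {
	grok {
		# Extract a timestamp, a log level, and the remainder of the line
		# into fields named "timestamp", "level", and "msg".
		match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
	}
}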

Also, I am looking for some documentation for a clustered deployment of ELK.

Elasticsearch is naturally clustered and Kibana is stateless except for the pieces that are stored in an Elasticsearch index, so it's Logstash you need to focus on. Logstash instances are (currently) unaware of each other, so there are a couple of options. I run Logstash on all ES nodes and use DNS-based load balancing to distribute traffic and drain traffic if a node is unavailable. Buffering messages with a message broker like Redis, RabbitMQ, or Kafka is also common; then all Logstash instances can pull from the same queue.
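
As an illustration of the broker approach, the Logstash instances doing the processing could all pull from the same Redis list. A rough sketch, where the broker host and list name are assumptions for the example:

input {
	redis {
		host => "redis.example.com"   # hypothetical broker host
		data_type => "list"
		key => "logstash"             # all indexers pop from this list
	}
}
output {
	elasticsearch {
		host => "localhost"
	}
}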

Thanks Magnus for the response. Let me try to validate my understanding. If I have 6 application server nodes (say Tomcat), each of which writes its own logs, it is better to post the appropriate log entries to a queue (maybe RabbitMQ). A listener on that queue will then write to a consolidated log file that will be used by Logstash.

Are you suggesting keeping the logs in JSON format to increase performance?

Our online dashboard needs to render near real-time data, so I need to continuously write to the log files that Logstash feeds into Elasticsearch. If I do this, will the Elasticsearch index get updated automatically (assuming I am using the "file" input plugin), or is there another standard mechanism to continuously update the Elasticsearch index?

Logging should never fail, so I wouldn't make the application servers dependent on anything but local files. Use Logstash or another log shipper to read the files produced by the application servers and send them on, either to a broker or directly to the Logstash hub nodes that process all messages (which you technically don't need, but I prefer centralizing the processing and configuration).
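
On the application servers themselves, the shipper side of that setup can be as simple as a file input feeding a Redis output, leaving the heavier parsing to the hub/indexer nodes. A rough sketch, where the log path and broker host are assumptions for the example:

input {
	file {
		path => "/var/log/tomcat/app.log"   # hypothetical local log file
	}
}
output {
	redis {
		host => "redis.example.com"         # hypothetical broker host
		data_type => "list"
		key => "logstash"                   # the same list the indexers read from
	}
}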

Are you suggesting keeping the logs in JSON format to increase performance?

Not so much for performance reasons (even though it's probably faster to deserialize JSON than to parse plain text with regular expressions) as to avoid having to do parsing at all, i.e. avoiding having to write grok expressions and deal with multiline messages. Plus, if you use log4j-jsonevent-layout (https://github.com/logstash/log4j-jsonevent-layout), a prefab PatternLayout for log4j that generates Logstash json_event formatted data, you get additional details from the log event that usually aren't included in regular text log files.

Our online dashboard needs to render near real-time data, so I need to continuously write to the log files that Logstash feeds into Elasticsearch. If I do this, will the Elasticsearch index get updated automatically (assuming I am using the "file" input plugin), or is there another standard mechanism to continuously update the Elasticsearch index?

Log events posted to Elasticsearch will be searchable as soon as the underlying Lucene index has been refreshed, which by default happens every five seconds so that's the kind of delay you can expect when the logging pipeline works well.

Hi Magnus, can Kibana be configured in a cluster? If yes, can you please share a pointer to the configuration parameters?

The only state that Kibana needs is kept in Elasticsearch, which is clustered. Thus, there's no need to cluster Kibana.

Can you please tell me whether installing and using the ELK stack in production requires any separate licence cost?

Where can I get a list of the support costs, if any?

Regards,
Chinmoy Das

Elasticsearch, Logstash, and Kibana are open source software that you can use freely at no cost, but there are support plans and companion add-on products that carry a cost. See https://www.elastic.co/subscriptions for details.

I am trying to parse a log file with following content:

0 127.0.1.1 9080  [11/Sep/2015:09:57:34 +0530] 8ENSVVR4QOS7X1UGPY7JGUV444PL9T2C3EP PAY RESPONSE_SUCCESS HDFC SUCCESS mobile 1
0 127.0.1.1 9080  [11/Sep/2015:09:57:35 +0530] 8ENSVVR4QOS7X1UGPY7JGUV444PL9T2C3EP PAY RESPONSE_SUCCESS HDFC SUCCESS mobile 1
0 127.0.1.1 9080  [11/Sep/2015:10:05:47 +0530] 8ENSVVR4QOS7X1UGPY7JGUV444PL9T2C3EQ PAY RESPONSE_SUCCESS HDFC CR_FAILED mobile 1
0 127.0.1.1 9080  [11/Sep/2015:10:11:34 +0530] 8ENSVVR4QOS7X1UGPY7JGUV444PL9T2C3ER PAY RESPONSE_SUCCESS HDFC CR_FAILED mobile 1
0 127.0.1.1 9080  [11/Sep/2015:10:16:36 +0530] 8ENSVVR4QOS7X1UGPY7JGUV444PL9T2C3ES PAY RESPONSE_SUCCESS HDFC CR_FAILED mobile 1
0 127.0.1.1 9080  [11/Sep/2015:10:24:57 +0530] 8ENSVVR4QOS7X1UGPY7JGUV444PL9T2C3EU PAY RESPONSE_SUCCESS HDFC SUCCESS mobile 1

I am using the following config file:

input{
	file {
		type => "syslog"
		path => "/home/rsdpp/configurationELK/apache.log"
		start_position => "beginning"
	}
}
filter{
	if [type] == "syslog" {
		grok{
			match => {
				"message" => '%{NUMBER:siteid} %{IP:client} %{NUMBER:port} \[%{HTTPDATE:timestamp}\] %{DATA:txnid} %{WORD:txntype} %{DATA:response} %{WORD:bank} %{WORD:respcd} %{WORD:channel} %{NUMBER:imps}'
			}
		}
		date {
			match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
			locale => en
		}
		geoip{
			source => "client"
		}
		
	}
}
output{
	elasticsearch { 
			host => "localhost"
		      }
}

It is giving _grokparsefailure. Can you please guide me on where it is going wrong?

For starters, there are two spaces between the port number and the timestamp's opening bracket, but your grok expression only contains a single space.
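
Concretely, the match line needs either a literal double space there or a flexible whitespace pattern. A sketch of just the affected filter, using %{SPACE} (which matches any run of whitespace):

grok {
	# %{SPACE} absorbs the double space before the timestamp, so the
	# extra space no longer causes a _grokparsefailure.
	match => {
		"message" => '%{NUMBER:siteid} %{IP:client} %{NUMBER:port}%{SPACE}\[%{HTTPDATE:timestamp}\] %{DATA:txnid} %{WORD:txntype} %{DATA:response} %{WORD:bank} %{WORD:respcd} %{WORD:channel} %{NUMBER:imps}'
	}
}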

Thanks. Now the fields are parsed correctly.
The log file in the last mail (path => "/home/rsdpp/configurationELK/apache.log") is being continuously updated by my Java application, and the purpose is to show an online transaction monitoring dashboard in Kibana. But whenever the application updates the log file, the Logstash console gives the following error and the new rows are not reflected in the search results in Kibana:

A plugin had an unrecoverable error. Will restart this plugin.
  Plugin: <LogStash::Inputs::File type=>"syslog", path=>["/home/rsdpp/configurationELK/apache.log"], start_position=>"beginning", delimiter=>"\n">
  Error: undefined method `+' for nil:NilClass {:level=>:error}

The new rows do appear if I restart ELK, but the existing rows in my log files get duplicated in the Kibana search. Please suggest a remedy so that inserts into the log file are automatically reflected in Kibana without creating duplicate copies.

Also, when I create a new index pattern in Kibana, the new fields (that I have added in the grok) often do not appear in the list of fields, so I cannot use the new index pattern to create a search.

I think the plugin crash you're seeing was fixed in Logstash 1.5.2. See github.com/logstash-plugins/logstash-input-file issue #60. Which version are you running?

Thanks. Upgrading to Logstash 1.5.4 solves the issue. Now I need to resolve the duplicate entries in Elasticsearch. When I save a log file, it adds the new entries but also re-adds the existing rows, which then show up in Kibana. It appears the log file somehow needs to be emptied after Logstash reads it. Please suggest the best practice for getting rid of these duplicate entries.

What's your configuration?

I am using elasticsearch-1.6.0, logstash-1.5.4, kibana-4.1.0-linux-x64 in Ubuntu. The logstash.conf is as below:

input{
	file {
		type => "tranlog"
		path => "/home/rsdpp/git/npci-upi/upi-gateway/gateway-web/logs/upilogs/upi-transaction.log"
		start_position => "beginning"
	}

	file {
		type => "apilog"
		path => "/home/rsdpp/git/npci-upi/upi-gateway/gateway-web/logs/upilogs/upi-api.log"
		start_position => "beginning"
	}
}
filter{
	if [type] == "tranlog" {
		grok{
			match => {
				"message" => '%{NUMBER:siteid} %{IP:client} %{NUMBER:port} \[%{HTTPDATE:timestamp}\] %{DATA:txnid} %{WORD:txntype} %{WORD:resptopsp} %{WORD:bank} %{WORD:respcd} %{WORD:channel} %{NUMBER:imps}'
			}
		}
		date {
			match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
			locale => en
		}
		geoip{
			source => "client"
		}
		
	}else if [type] == "apilog" {
		grok{
			match => {
				"message" => '%{NUMBER:siteid} %{IP:client} %{NUMBER:port} \[%{HTTPDATE:timestamp}\] %{WORD:apiname}'
			}
		}
		date {
			match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
			locale => en
		}
		geoip{
			source => "client"
		}
		
	}

}
output{
	elasticsearch { 
			host => "localhost"
		      }
}

When I save the transaction log file, it adds the existing rows as well as the new rows to Elasticsearch, and hence duplicate rows appear in Kibana. Please suggest how I can avoid the duplicate rows.

Also, my customer wants to suppress the Kibana logo in the dashboard and display their own logo instead. Is there any way to achieve that?

When I save the transaction log file, it adds the existing rows as well as the new rows to Elasticsearch, and hence duplicate rows appear in Kibana. Please suggest how I can avoid the duplicate rows.

What do you mean by "save the transaction log file"? Are you opening it in an editor and modifying it?

I'm pretty sure you can fix your problem by removing start_position => "beginning", but to understand why this helps I need your answer to the question above.

Also, my customer wants to suppress the Kibana logo in the dashboard and display their own logo instead. Is there any way to achieve that?

Not without modifying the source code and rebuilding, I believe.

Thanks Magnus. I was updating the log from an editor. When the Java application updates the logs, the rows are not duplicated.

Can you please guide me through the steps to replace the Kibana logo with a custom logo?

I was updating the log from an editor. When the Java application updates the logs, the rows are not duplicated.

Right. Your text editor rewrites the file so that it gets a new inode, and with start_position => beginning the file input will read the file from the top.
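
For background: the file input remembers how far it has read in each file, keyed by inode, in a "sincedb" file, so a rewritten file with a new inode looks brand new to it. If you need to control where that bookkeeping lives, the input has a sincedb_path option; a small sketch (the sincedb location is just an example):

input {
	file {
		path => "/home/rsdpp/git/npci-upi/upi-gateway/gateway-web/logs/upilogs/upi-transaction.log"
		# Where the file input keeps its per-file read offsets.
		# Leaving start_position at its default ("end") avoids re-reading
		# old lines when Logstash starts.
		sincedb_path => "/var/lib/logstash/sincedb-tranlog"   # example location
	}
}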

Can you please guide me through the steps to replace the Kibana logo with a custom logo?

Download the Kibana source code, replace the logo file (or add a new logo file and adjust the logo URL in the HTML template), and rebuild Kibana. I haven't built Kibana myself so I can't be more specific than that.

In the Kibana Discover section, while creating a search, can I filter with a condition like transactionId<>null? I am logging two different types of information: a) API usage, b) online transaction information. The API usage log entries will not have the transaction id. So, can you please tell me the syntax to filter out the entries where the transaction id is not null in Kibana searches?

That's a brand new question about Kibana rather than Logstash, so I suggest you start a new thread in the Kibana category.