The fields that I want in the elastic index are below:
client_ip => type must be compatible with what Kibana uses for IP mapping.
timestamp => datetime format => the time of the log
method => text => the method that was called, e.g. GET, POST
version => decimal number => e.g. 1.2 / 1.0 (in the sample logs as v1.2)
username => text => it's the text after the username= (in the sample log as pradeep.pgu)
location => geo_point type => the value has both latitude and longitude so that Kibana can plot these on the map.
search_query => text => the thing that was searched (in the sample, from either of the two fields "keyword=" or "query="). Only one of the two fields will be present, and the value of whichever one is present must be used.
response_code => number => the code of the response. (in the sample as 200)
data_transfered => number => the amount of data transferred (the last number in the sample).
Now I am using a log4j pattern to separate out the query and its parts:
Now when I try to pass a sample log to the Grok Debugger, it gives me no results. Where am I going wrong?
I have 2 questions:
Where am I going wrong with the pattern?
How do I use the KV filter chained to a grok filter to pull out the username, query, and location to push into the elastic index with the desired data types and field values?
What you call a "log4j pattern" (I don't understand how it's related to Log4j) looks fine for use in a grok filter. The output of the Pattern Translator is bogus.
How do I use the KV filter chained to a grok filter to pull out the username, query, and location to push into the elastic index with the desired data types and field values?
When your grok filter works and you have a request field that contains the query string, configuring the kv filter should be quite straightforward. You can use the include_keys option to choose what to extract (username, query, and location).
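For illustration, a minimal kv sketch along those lines, assuming the grok filter has already captured the query string into a field named request (the field name and the split characters here are assumptions, not taken from your logs):

```
filter {
  kv {
    # Parse key=value pairs out of the query string captured by grok.
    source       => "request"
    field_split  => "&"     # parameters separated by &
    value_split  => "="     # key and value separated by =
    include_keys => ["username", "query", "keyword", "location"]
  }
}
```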
Hi, thanks for the answer. If possible, would you be able to show a quick implementation of the filter section of the logstash config that maintains the data types and gets the data ready to be pushed into elastic, covering the points I've mentioned in the question? Especially the part where, if the query parameter is present, the search_string field in elastic holds the query parameter's value, and if the keyword parameter is present, search_string holds the keyword parameter's value. I did try to do the filters before, but they pushed all the fields as a text data type. Would you be able to help me out with a sample? It would be of great help.
I don't have time for exact configuration write-ups but I can answer specific questions, like where you show your configuration, what an example event looks like right now, and what you want it to look like instead.
Very well. If that is the case, can you please help me by checking whether the filter code below is right for what I want to achieve, as I have mentioned in my question:
The issue I am facing is that when I use the filter provided above, all the data is pushed exactly the way it should be. The only issue is that the data types of all fields are string, even for the number, client IP, and geo_point types. That's what I am trying to figure out. Could you please help?
Please note: the formatting might be off due to the formatter.
The index mappings are in a text file uploaded here, as the editor wasn't allowing me to add more lines.
You can observe that every field is text, whereas I want the response code, the data transferred, and the version to be numbers, location to be geo_point so that I can use a Kibana map visualization, and client_ip to be of type ip so that I can use Kibana to show the unique requests.
To get geo_point fields you need to adjust the index template. Logstash's default template, which applies to indexes with the default name, would work out of the box (depending on what name you give your geo_point fields), but you have switched to a custom index name.
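In rough terms, pointing the elasticsearch output at a custom template could look like this (the index name, template path, and template name are placeholders; the template JSON itself would be where location is mapped as geo_point and the numeric fields as numbers):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Custom index name, so the default Logstash template no longer applies.
    index => "weblogs-%{+YYYY.MM.dd}"
    # Upload a custom index template that maps "location" as geo_point
    # and the numeric fields as integers.
    template           => "/etc/logstash/templates/weblogs-template.json"
    template_name      => "weblogs"
    template_overwrite => true
  }
}
```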
Making sure fields are mapped as numbers can also be done in the index template, but you should also convert your fields from strings to numbers as necessary. You can use a mutate filter for this but you can also use the %{PATTERN:field:type} notation in your grok expression.
Here's the relevant part of the grok filter documentation:
Optionally you can add a data type conversion to your grok pattern. By default all semantics are saved as strings. If you wish to convert a semantic’s data type, for example change a string to an integer then suffix it with the target data type. For example %{NUMBER:num:int} which converts the num semantic from a string to an integer. Currently the only supported conversions are int and float.
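For example, something along these lines; the pattern fragment below is only a made-up placeholder, not your actual pattern, the field names are taken from your list, and you would pick one of the two approaches:

```
filter {
  # Option 1: cast inside the grok expression itself.
  grok {
    match => { "message" => "... %{NUMBER:response_code:int} %{NUMBER:data_transfered:int}" }
  }

  # Option 2: cast afterwards with a mutate filter.
  mutate {
    convert => {
      "response_code"   => "integer"
      "data_transfered" => "integer"
      "version"         => "float"
    }
  }
}
```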
If you're having problems with a particular configuration we need to know what configuration you tried. Example input is also useful but you've posted that earlier in the thread.
OK, looking at the entire thread, it looks to me like you have got the parsing of fields correct, except for the data types of the integer fields and the geo locations.
As @magnusbaeck also mentioned (I don't know whether you looked at it or not), you should use the mutate and geoip filters to convert the text fields to integers and geo locations.
The second step is to modify your index template so that the data type of the integer fields is changed from text to integer, which was also mentioned by @magnusbaeck.
I don't know whether you looked at it or not; if you have, then you already have the solution to your problem.
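Putting the pieces from this thread together, a rough sketch of what the filter section could look like. The grok pattern below is only a placeholder standing in for the working pattern from earlier in the thread, the field names come from the original list, and the location value is assumed to already arrive as "lat,lon" so the index template can map it as geo_point:

```
filter {
  grok {
    # Placeholder pattern: substitute the actual working pattern here.
    # The :int / :float suffixes cast the numeric captures during the match.
    match => { "message" => "%{IP:client_ip} %{TIMESTAMP_ISO8601:timestamp} %{WORD:method} v%{NUMBER:version:float} %{GREEDYDATA:request} %{NUMBER:response_code:int} %{NUMBER:data_transfered:int}" }
  }

  kv {
    source       => "request"
    field_split  => "&"
    include_keys => ["username", "query", "keyword", "location"]
  }

  # Use whichever of query/keyword is present as search_query.
  if [query] {
    mutate { rename => { "query" => "search_query" } }
  } else if [keyword] {
    mutate { rename => { "keyword" => "search_query" } }
  }
}
```

Combined with an index template that maps client_ip as ip and location as geo_point, the fields should land in Elasticsearch with the desired types.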