I am doing a school project where I am using your product for a log management system.
I have lots of data and I want to know which log lines are duplicate and how many duplicates there are for that particular log line.
I tried this query, which successfully extracted the duplicate counts.
But for some weird reason it doesn't work for log lines that contain more characters. I tried the very same query filtered on log_level : "CRITICAL"; that way I do get other log lines from the CRITICAL level, but somehow the bucket for the long line is empty.
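(The query itself isn't shown above. For context, a typical way to count duplicates is a terms aggregation with min_doc_count set to 2 — this is a sketch, assuming the log text is in a field called message with a message.keyword sub-field, which may differ from the actual mapping:)

```json
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "duplicate_lines": {
      "terms": {
        "field": "message.keyword",
        "min_doc_count": 2,
        "size": 100
      }
    }
  }
}
```

Each bucket in the response then holds one distinct log line as its key and the number of duplicates as its doc_count.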
I hope someone can help me with this weird problem.
Well, I have a feeling it doesn't work when the log line has more than x characters. For example, this log line: """Uncaught PHP Exception ErrorException: "Warning: include(/data/httpd/api/xxx/var/cache/dev/overblog/graphql-bundle/__definitions__/QueryType.php): failed to open stream: No such file or directory" at /data/httpd/api/xxx/vendor/composer/ClassLoader.php line 444"""
I have multiple log lines that are exactly like this one, but for some reason the query mentioned above doesn't give me a bucket for them.
Could it be that the .keyword field messes it up? Or is my query incorrect?
You are absolutely right: by default, a .keyword field will only contain values up to 256 characters. You can see that by looking at your index's mappings:
GET my_index/_mapping
You will see that the .keyword fields in the mapping have an ignore_above parameter with a value of 256.
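The relevant part of the response looks roughly like this (the field name message is just an assumption here; look for whichever field holds your log line):

```json
{
  "my_index": {
    "mappings": {
      "properties": {
        "message": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
```

Any value longer than ignore_above is simply not indexed into the .keyword field, which is why your long log lines never show up as aggregation buckets.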
You can change the value of ignore_above. You would typically do that when creating the index, by providing an explicit mapping. You can also raise it on an existing index, but be aware that the new limit only applies to documents indexed afterwards; picking up the existing documents again means rewriting them, which is quite an expensive operation.
To update existing indexes, first update the mapping:
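A sketch of both steps, assuming the field is called message and a new limit of 1024 (adjust both to your data):

```json
// Step 1: raise ignore_above on the existing index
PUT my_index/_mapping
{
  "properties": {
    "message": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 1024
        }
      }
    }
  }
}

// Step 2: rewrite existing documents so they are re-indexed
// with the new limit (this is the expensive part)
POST my_index/_update_by_query?conflicts=proceed
```

After the _update_by_query finishes, the long log lines will be present in message.keyword and your terms aggregation should return buckets for them.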