Hello, I am indexing documents that have an error message as a field. An error message can of course be a phrase. I am using a terms aggregation to find the unique error messages that have occurred, but the aggregation returns the unique terms within the field instead. For example, for 'system error' I get 'system' as one term and 'error' as another. How can I get Elasticsearch to return whole phrases as unique terms?
Look at shingles. They can combine multiple tokens into a single token. You could combine this with a keep-words list (the keep token filter) if you only want to hunt for specific phrases and ignore all others. You would probably want to copy the text to a different field, so that this separate analysis runs there while regular full-text search over the original field values is preserved.
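To make the shingle suggestion concrete, here is a rough sketch in Python. The first part is an analysis settings body using the shingle and keep token filters; the names `phrase_shingles`, `known_phrases` and `error_phrases` are made up for illustration, and the kept phrase is just the example from the question. The second part is not Elasticsearch code at all, only a simplified imitation of what the two filters do to a token stream, so you can see the output.

```python
# Index analysis settings, written as a Python dict that could be sent as
# the "settings" part of an index-creation request.
# All three names below are hypothetical; pick your own.
ANALYSIS_SETTINGS = {
    "analysis": {
        "filter": {
            # Emit only two-word shingles, no single words.
            "phrase_shingles": {
                "type": "shingle",
                "min_shingle_size": 2,
                "max_shingle_size": 2,
                "output_unigrams": False,
            },
            # Drop every token that is not in this list.
            "known_phrases": {
                "type": "keep",
                "keep_words": ["system error"],
            },
        },
        "analyzer": {
            "error_phrases": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "phrase_shingles", "known_phrases"],
            }
        },
    }
}


def shingles(tokens, min_size=2, max_size=2):
    """Simplified imitation of a shingle filter: join adjacent tokens."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


def keep_only(tokens, keep_words):
    """Simplified imitation of a keep filter: drop tokens not in the list."""
    allowed = set(keep_words)
    return [t for t in tokens if t in allowed]


tokens = "system error while saving".split()
all_shingles = shingles(tokens)
kept = keep_only(all_shingles, ["system error"])
```

The analyzer would then be referenced from a separate field or sub-field that receives a copy of the error text. How that field has to be mapped so that it can be aggregated on depends on your Elasticsearch version, so it is left out here.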
Do you just want to not tokenize? You can simply mark the field as not analyzed. If you are instead trying to extract something from within the field (i.e. parse the error messages out), then you'll probably need to do that outside of Elasticsearch.
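For the not-analyzed route, something along these lines should work. It is only a sketch: the field name `error_message` and the aggregation name `unique_errors` are assumptions, and the mapping syntax depends on the version. Older releases use a `string` field with `"index": "not_analyzed"`, while newer ones use the `keyword` type for the same purpose.

```python
def error_message_mapping(legacy=True, field="error_message"):
    """Build a mapping that stores the whole message as a single term.

    legacy=True  -> older syntax: string field marked not_analyzed
    legacy=False -> newer syntax: keyword field
    The field name is a hypothetical example.
    """
    if legacy:
        field_def = {"type": "string", "index": "not_analyzed"}
    else:
        field_def = {"type": "keyword"}
    return {"properties": {field: field_def}}


def unique_errors_request(field="error_message", max_buckets=100):
    """Build a search body with a terms aggregation over the untokenized field."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {
            "unique_errors": {
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


legacy_mapping = error_message_mapping(legacy=True)
keyword_mapping = error_message_mapping(legacy=False)
request_body = unique_errors_request()
```

With the field mapped this way, 'system error' comes back as one bucket instead of separate 'system' and 'error' buckets. Note that existing documents have to be reindexed after the mapping change.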
I ended up doing this outside of Elasticsearch. I retrieve all the error messages and go through the retrieved values to get the unique ones.
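In case it helps someone else, the client-side approach can be as small as the sketch below. It assumes the documents have already been fetched into a list of dicts and that the field is called `error_message`; both are placeholders for illustration, not the code actually used here.

```python
from collections import Counter


def unique_error_messages(docs, field="error_message"):
    """Count each distinct, non-empty error message in the retrieved docs.

    Returns (message, count) pairs, most frequent first.
    """
    counts = Counter(doc[field] for doc in docs if doc.get(field))
    return counts.most_common()


# Made-up sample data standing in for the retrieved documents.
docs = [
    {"error_message": "system error"},
    {"error_message": "disk full"},
    {"error_message": "system error"},
    {"other_field": "no error message here"},
]

result = unique_error_messages(docs)
```

This keeps each message as a whole phrase because nothing tokenizes it, at the cost of pulling every document over the network.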