Distinct Values for Phrases Help

Hello, I am indexing documents that have an error message field. The error message can of course be a phrase. I am using a terms aggregation to find the unique error messages that have occurred, but the aggregation returns the unique terms within the field rather than whole messages. For example, for 'system error', I get 'system' as one term and 'error' as another. How can I get Elasticsearch to return phrases as unique terms?

Thanks in advance for your help.

Look at shingles. The shingle token filter combines adjacent tokens into a single token, so 'system error' can be indexed as one term. You could combine this with a keep-words filter and a word list if you only want to hunt for specific phrases and ignore all others. You would probably want to copy the text to a separate field, so that this analysis runs there while regular full-text search over the original field values is preserved.
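
As a rough sketch of what that analysis could look like, here are index settings built as a plain Python dict. The analyzer name (error_phrases), the filter names, and the shingle sizes are made up for illustration. The "shingle" and "keep" token filters and their parameters are standard Elasticsearch ones, but check them against the version you run. The field mapping is left out because its syntax differs between versions.

```python
def build_shingle_index_body(phrases_to_keep=None):
    """Build index settings for a shingle-based analyzer.

    All names here (error_phrases, error_shingles, known_errors) are
    hypothetical; only the filter types and parameters come from
    Elasticsearch itself.
    """
    filters = {
        # Emit 2- and 3-word shingles only, no single-word tokens.
        "error_shingles": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 3,
            "output_unigrams": False,
        }
    }
    chain = ["lowercase", "error_shingles"]

    if phrases_to_keep:
        # Optional: drop every shingle that is not in the given list.
        filters["known_errors"] = {
            "type": "keep",
            "keep_words": list(phrases_to_keep),
        }
        chain.append("known_errors")

    return {
        "settings": {
            "analysis": {
                "filter": filters,
                "analyzer": {
                    "error_phrases": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": chain,
                    }
                },
            }
        }
    }
```

The returned dict is meant to be sent as the body of an index-creation request. Because the filter chain lowercases before shingling, the entries in the keep list should be lowercase too.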

What I am trying to do is find all unique error messages, along with the number of times each has occurred on a given day. I am not sure whether shingles would help with that.

Thanks.

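Do you just want to avoid tokenizing the field? If so, you can simply mark the field as not analyzed (index: not_analyzed in the mapping), and a terms aggregation will then treat each whole message as a single term. If you are instead trying to extract something from within the field (i.e., parse the error messages out), you'll probably need to do that outside of Elasticsearch.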
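
A minimal sketch of the not-analyzed approach, written as Python dicts you could send as request bodies. The field names (error_message, @timestamp), the aggregation name, and the size limit are assumptions for illustration. On older versions the mapping is a string field with index set to not_analyzed; on newer versions (5.x and later, as far as I know) the equivalent is the keyword type.

```python
def build_mapping_properties(field="error_message", legacy=True):
    """Map the error field so the whole message is indexed as one term.

    legacy=True uses the older string/not_analyzed form; legacy=False
    uses the keyword type found in newer versions.
    """
    if legacy:
        field_mapping = {"type": "string", "index": "not_analyzed"}
    else:
        field_mapping = {"type": "keyword"}
    return {"properties": {field: field_mapping}}


def build_daily_error_counts_query(day_start, day_end,
                                   field="error_message",
                                   timestamp_field="@timestamp",
                                   max_messages=100):
    """Terms aggregation over one day: unique messages with doc counts.

    The field names and the max_messages default are hypothetical.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, no hits
        "query": {
            "range": {timestamp_field: {"gte": day_start, "lt": day_end}}
        },
        "aggs": {
            "unique_errors": {
                "terms": {"field": field, "size": max_messages}
            }
        },
    }
```

Each bucket in the response would then carry one full error message as its key and the number of matching documents as its doc_count. Note that a terms aggregation only returns the top buckets up to the requested size, so a very large number of distinct messages needs a larger size or another approach.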

Hope that helps,

I ended up doing this outside of Elasticsearch. I retrieve all the error messages for the day and go through the retrieved values to get the unique ones.
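
For anyone doing the same, the client-side counting can be sketched roughly like this, assuming the messages for the day have already been fetched into a list of strings (the function name is just illustrative, and stripping whitespace is my own choice):

```python
from collections import Counter


def count_error_messages(messages):
    """Count how often each distinct error message occurs.

    Surrounding whitespace is stripped and empty values are skipped;
    otherwise messages are compared exactly as retrieved.
    """
    return Counter(m.strip() for m in messages if m and m.strip())


retrieved = ["system error", "disk full", "system error"]
counts = count_error_messages(retrieved)
print(counts.most_common(1))  # [('system error', 2)]
```

This works for modest volumes, but it pulls every message over the network, so it scales worse than letting the aggregation do the counting.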