I think you are right about ignore_above. According to its documentation, a string longer than the ignore_above value is not indexed at all in that keyword field. I ran a small test to confirm this:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword",
              "ignore_above": 20
            }
          }
        }
      }
    }
  }
}

POST my_index/my_type
{
  "message": "a short message"
}

POST my_index/my_type
{
  "message": "a very long message that might not be indexed because it is too long"
}

POST my_index/my_type/_search

POST my_index/my_type/_search
{
  "size": 0,
  "aggs": {
    "message": {
      "terms": {
        "field": "message.raw"
      }
    }
  }
}
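As an extra check on the test above, you could also look for documents that ended up with no value in message.raw. This is only a sketch that assumes the same my_index mapping as in the test. I would expect it to match just the long message, since its raw value was skipped, but the exact behaviour of the exists query may differ between Elasticsearch versions:

```
# Find documents whose message.raw sub-field has no indexed value
POST my_index/my_type/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "message.raw"
        }
      }
    }
  }
}
```

Running the same kind of query against your real indices (with your own index and field names) should give you a rough idea of how many messages were affected.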
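So it seems your solution would be to increase the value of ignore_above in the Logstash Elasticsearch template, or perhaps even remove that option altogether. If you do remove it, keep in mind that Lucene rejects any single term longer than 32766 bytes, so extremely long messages could then fail to index. After the template change, any new messages you index using Logstash should have their full value indexed in the raw field. The change only applies to newly created indices, so messages you have already indexed will need to be reindexed.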
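For the reindexing part, here is a minimal sketch based on the test index above, not on your actual Logstash indices. The target index name my_index_v2 and the value 1024 are placeholders I made up for illustration, so pick whatever fits your data. The idea is to create a new index with the higher limit and then copy the existing documents into it with the _reindex API:

```
# Hypothetical target index: the name and the ignore_above value are placeholders
PUT my_index_v2
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword",
              "ignore_above": 1024
            }
          }
        }
      }
    }
  }
}

# Copy the existing documents so they are indexed again under the new mapping
POST _reindex
{
  "source": {
    "index": "my_index"
  },
  "dest": {
    "index": "my_index_v2"
  }
}
```

Once the copy finishes, running the terms aggregation on message.raw against the new index should show a bucket for every message that fits under the new limit.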