Terms Aggregation lost result

abcfy2 · August 4, 2016, 2:28am

Hi all.

I want to aggregate the log of postgresql slow query. Here is my log in kibana:

You can see there are 13 hits.

I want to execute as this SQL:

SELECT slow_sql.raw, avg(duration), count(1) FROM 'logstash-*' GROUP BY slow_sql.raw;

Here is my table:

Only one result. What's wrong with my operation?

shaunak · August 4, 2016, 4:22am

Everything looks right at first glance. Can you paste the entire output of the Elasticsearch response body here please? Thanks!

abcfy2 · August 4, 2016, 5:13am

Request:

{
  "size": 0,
  "aggs": {
    "2": {
      "terms": {
        "field": "slow_sql.raw",
        "size": 5,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "3": {
          "avg": {
            "field": "duration"
          }
        }
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "@kibana-highlighted-field@"
    ],
    "post_tags": [
      "@/kibana-highlighted-field@"
    ],
    "fields": {
      "*": {}
    },
    "require_field_match": false,
    "fragment_size": 2147483647
  },
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*",
          "analyze_wildcard": true
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "match": {
                  "type": {
                    "query": "postgresql",
                    "type": "phrase"
                  }
                }
              },
              "$state": {
                "store": "appState"
              }
            },
            {
              "query": {
                "match": {
                  "tags": {
                    "query": "slow_query",
                    "type": "phrase"
                  }
                }
              },
              "$state": {
                "store": "appState"
              }
            },
            {
              "range": {
                "@timestamp": {
                  "gte": 1470286624761,
                  "lte": 1470287524761,
                  "format": "epoch_millis"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  }
}

Response:

{
  "took": 1039,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 14,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "2": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "3": {
            "value": 566.851
          },
          "key": "statement: BEGIN;select distinct(data_id) from sym_data_event where data_id >=102711 and data_id <= 50102710 order by data_id asc ",
          "doc_count": 1
        },
        {
          "3": {
            "value": 1320.955
          },
          "key": "statement: SELECT DISTINCT(equipment_id) FROM moni_equipment_item;",
          "doc_count": 1
        }
      ]
    }
  }
}

abcfy2 · August 4, 2016, 8:13am

It seems that the slow_sql.raw is too long. I find the logstash default es template contains this config:

        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fielddata" : { "format" : "disabled" },
            "fields" : {
              "raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
            }
          }
        }
      }

Maybe if the slow_sql.raw is more than 256 bytes will be ignored by GROUP BY?

shaunak · August 9, 2016, 5:27pm

Hi,

I think you are right about the ignore_above. According to its documentation the field will not be indexed at all if its length is greater than the value specified in ignore_above. I was able to run a small test to prove this:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword",
              "ignore_above": 20
            }
          }
        }
      }
    }
  }
}

POST my_index/my_type
{
  "message": "a short message"
}

POST my_index/my_type
{
  "message": "a very long message that might not be indexed because it is too long"
}

POST my_index/my_type/_search

POST my_index/my_type/_search
{
  "size": 0,
  "aggs": {
    "message": {
      "terms": {
        "field": "message.raw"
      }
    }
  }
}

So it seems like your solution would be to increase the value of ignore_above in the Logstash Elasticsearch template OR perhaps even remove that option altogether. Then, any new messages you index using Logstash should be indexed completely as raw fields. For messages you have already indexed, you will need to reindex them.

Topic		Replies	Views
Help with aggregation query Elasticsearch	5	421	November 20, 2019
Kibana Keyword compare and collate based on the unique query list in string format Kibana	2	357	November 30, 2020
Indexing Aggregate Results Logstash	1	231	November 4, 2019
Help in visualizing custom query/aggregation Kibana	8	647	February 22, 2021
How to display only TOP 10 of aggregate count in kibana? Kibana	2	2006	June 21, 2019

Terms Aggregation lost result

Related topics