Terms Aggregation lost result


(Feng Yu (Abcfy2)) #1

Hi all.

I want to aggregate the log of postgresql slow query. Here is my log in kibana:

You can see there are 13 hits.

I want to execute as this SQL:

SELECT slow_sql.raw, avg(duration), count(1) FROM 'logstash-*' GROUP BY slow_sql.raw;

Here is my table:

Only one result. What's wrong with my operation?


(Shaunak Kashyap) #2

Everything looks right at first glance. Can you paste the entire output of the Elasticsearch response body here please? Thanks!


(Feng Yu (Abcfy2)) #3

Request:

{
  "size": 0,
  "aggs": {
    "2": {
      "terms": {
        "field": "slow_sql.raw",
        "size": 5,
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "3": {
          "avg": {
            "field": "duration"
          }
        }
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "@kibana-highlighted-field@"
    ],
    "post_tags": [
      "@/kibana-highlighted-field@"
    ],
    "fields": {
      "*": {}
    },
    "require_field_match": false,
    "fragment_size": 2147483647
  },
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*",
          "analyze_wildcard": true
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "query": {
                "match": {
                  "type": {
                    "query": "postgresql",
                    "type": "phrase"
                  }
                }
              },
              "$state": {
                "store": "appState"
              }
            },
            {
              "query": {
                "match": {
                  "tags": {
                    "query": "slow_query",
                    "type": "phrase"
                  }
                }
              },
              "$state": {
                "store": "appState"
              }
            },
            {
              "range": {
                "@timestamp": {
                  "gte": 1470286624761,
                  "lte": 1470287524761,
                  "format": "epoch_millis"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  }
}

Response:

{
  "took": 1039,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 14,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "2": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "3": {
            "value": 566.851
          },
          "key": "statement: BEGIN;select distinct(data_id) from sym_data_event where data_id >=102711 and data_id <= 50102710 order by data_id asc ",
          "doc_count": 1
        },
        {
          "3": {
            "value": 1320.955
          },
          "key": "statement: SELECT DISTINCT(equipment_id) FROM moni_equipment_item;",
          "doc_count": 1
        }
      ]
    }
  }
}

(Feng Yu (Abcfy2)) #4

It seems that the slow_sql.raw is too long. I find the logstash default es template contains this config:

        "string_fields" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : {
            "type" : "string", "index" : "analyzed", "omit_norms" : true,
            "fielddata" : { "format" : "disabled" },
            "fields" : {
              "raw" : {"type": "string", "index" : "not_analyzed", "ignore_above" : 256}
            }
          }
        }
      }

Maybe if the slow_sql.raw is more than 256 bytes will be ignored by GROUP BY?


(Shaunak Kashyap) #5

Hi,

I think you are right about the ignore_above. According to its documentation the field will not be indexed at all if its length is greater than the value specified in ignore_above. I was able to run a small test to prove this:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "message": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword",
              "ignore_above": 20
            }
          }
        }
      }
    }
  }
}

POST my_index/my_type
{
  "message": "a short message"
}

POST my_index/my_type
{
  "message": "a very long message that might not be indexed because it is too long"
}

POST my_index/my_type/_search

POST my_index/my_type/_search
{
  "size": 0,
  "aggs": {
    "message": {
      "terms": {
        "field": "message.raw"
      }
    }
  }
}

So it seems like your solution would be to increase the value of ignore_above in the Logstash Elasticsearch template OR perhaps even remove that option altogether. Then, any new messages you index using Logstash should be indexed completely as raw fields. For messages you have already indexed, you will need to reindex them.


(system) #6