Elasticsearch SQL query - Where not working

Guys, i'm trying to make this query in SQL canvas, if where condition, but my result is like the pic below. The correct result should be the sum between 2017 + 2018 to 2018 and just one bar. Am I doing anything wrong? Can someone help me?

SELECT YEAR(data_de_registro) as d, COUNT(*) as t FROM "table_mercado" WHERE YEAR(data_de_registro)=2018 AND "status.keyword" != 'Removido' GROUP BY d

Guys, I solve it using Histogram in the query. But, I want to know why using just YEAR() function this wasn't working. Can anyone help me?

SELECT HISTOGRAM(year(data_de_registro), 1) as d, COUNT(*) as t FROM "mercado_157_processos" WHERE "data_de_registro" > NOW() - INTERVAL 11 YEAR AND "status_do_processo.keyword" != 'Removido' GROUP BY d

Following should give you one year.
histogram (data_re_registro, interval 1 year)

@JacquelineGrecco can you run the initial query outside Canvas, for example run it in Kibana's Dev Tools, and provide the output here?

@Andrei_Stefan, the result is in json and in a pic. Screenshot%20from%202019-08-28%2016-46-02

{
  "size" : 0,
  "query" : {
    "bool" : {
      "must" : [
        {
          "range" : {
            "data_de_registro" : {
              "from" : "2008-08-28T19:43:47.891Z",
              "to" : null,
              "include_lower" : false,
              "include_upper" : false,
              "boost" : 1.0
            }
          }
        },
        {
          "bool" : {
            "must_not" : [
              {
                "term" : {
                  "status_do_processo.keyword" : {
                    "value" : "Removido",
                    "boost" : 1.0
                  }
                }
              }
            ],
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 1000,
        "sources" : [
          {
            "119851" : {
              "histogram" : {
                "script" : {
                  "source" : "InternalSqlScriptUtils.dateTimeChrono(InternalSqlScriptUtils.docValue(doc,params.v0), params.v1, params.v2)",
                  "lang" : "painless",
                  "params" : {
                    "v0" : "data_de_registro",
                    "v1" : "Z",
                    "v2" : "MONTH_OF_YEAR"
                  }
                },
                "missing_bucket" : true,
                "value_type" : "long",
                "order" : "asc",
                "interval" : 1.0
              }
            }
          },
          {
            "119854" : {
              "histogram" : {
                "script" : {
                  "source" : "InternalSqlScriptUtils.dateTimeChrono(InternalSqlScriptUtils.docValue(doc,params.v0), params.v1, params.v2)",
                  "lang" : "painless",
                  "params" : {
                    "v0" : "data_de_registro",
                    "v1" : "Z",
                    "v2" : "YEAR"
                  }
                },
                "missing_bucket" : true,
                "value_type" : "long",
                "order" : "asc",
                "interval" : 1.0
              }
            }
          }
        ]
      }
    }
  }
}

@JacquelineGrecco that's not the initial query.
I meant the sql query you posted in your first post: SELECT YEAR(data_de_registro) as d, COUNT(*) as t FROM "table_mercado" WHERE YEAR(data_de_registro)=2018 AND "status.keyword" != 'Removido' GROUP BY d

@Andrei_Stefan, Ah... okay. The result of this query is here.

Screenshot%20from%202019-08-28%2017-56-39

{
  "size" : 0,
  "query" : {
    "bool" : {
      "must" : [
        {
          "script" : {
            "script" : {
              "source" : "InternalSqlScriptUtils.nullSafeFilter(InternalSqlScriptUtils.eq(InternalSqlScriptUtils.dateTimeChrono(InternalSqlScriptUtils.docValue(doc,params.v0), params.v1, params.v2),params.v3))",
              "lang" : "painless",
              "params" : {
                "v0" : "data_de_registro",
                "v1" : "Z",
                "v2" : "YEAR",
                "v3" : 2018
              }
            },
            "boost" : 1.0
          }
        },
        {
          "bool" : {
            "must_not" : [
              {
                "term" : {
                  "status_do_processo.keyword" : {
                    "value" : "Removido",
                    "boost" : 1.0
                  }
                }
              }
            ],
            "adjust_pure_negative" : true,
            "boost" : 1.0
          }
        }
      ],
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 1000,
        "sources" : [
          {
            "122895" : {
              "date_histogram" : {
                "field" : "data_de_registro",
                "missing_bucket" : true,
                "value_type" : "date",
                "order" : "asc",
                "interval" : 31536000000,
                "time_zone" : "Z"
              }
            }
          }
        ]
      }
    }
  }
}

@JacquelineGrecco thank you for providing the results.
This was to confirm the behavior and below I'll try to explain why you see there 2017 even if the condition is for 2018.

When grouping by YEAR() function, ES SQL uses a date_histogram aggregation with a fixed_interval setting with 31536000000ms as its value.

The way it works is, imagine time starting on January 1st, 1970 and start adding to this date the number of milliseconds in a 365 days year - 31536000000ms . And every time you add that number of millis a bucket is created. 49th bucket starts on Dec 20th, 2018 and all the dates from Dec 20th, 2018 until Dec 20th, 2019 fall in the 2018 bucket. Same goes for all the dates from Dec 20th, 2017 to Dec 20th, 2018 - they will fall in the 2017 bucket.

To keep things consistent across all YEAR and dates functionality in ES-SQL, we chose this fixed interval, but there were other reports of strange behavior and we'll look into changing this. The actual fix is to switch from using a 31536000000ms interval to a 1y calendar_interval for YEAR function when using an aggregation on it. The initial report is here https://github.com/elastic/elasticsearch/issues/40162.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.