Filtering by date and time as different fields


(Gustav Gonzalez) #1

Hi!

I am trying to create a filter for my query using a date range and a time range. In my case, date and time are different fields. So I want to run queries like: Show me all the events happening this month (15-11-01/15-11-30) between 10:00:00 am and 12:00:00

This is my query, but it is not working. Note: If I only use the "date part" it works, but with the combination of the time field, the results are wrong.

Any suggestion?

{
   "query":{
      "filtered":{
         "query":{
            "match":{
               "message":"error"
            }
         },
         "filter":{
            "bool":{
               "must":{
                  "range":{
                     "date":{
                        "gte":"15-11-01",
                        "lte":"15-11-30"
                     }
                  }
               },
               "must":{
                  "range":{
                     "time":{
                        "gte":"10:00:00",
                        "lte":"12:00:00"
                     }
                  }
               }
            }
         }
      }
   }
}

(Zachary Tong) #2

Two potential issues:

  • First, your bool syntax is a bit wrong. Both of the range filters should go inside a single must clause, instead of two different must clauses. Try this:
{
   "query":{
      "filtered":{
         "query":{
            "match":{
               "message":"error"
            }
         },
         "filter":{
            "bool":{
               "must":[
                  {
                     "range":{
                        "date":{
                           "gte":"15-11-01",
                           "lte":"15-11-30"
                        }
                     }
                  },
                  {
                     "range":{
                        "time":{
                           "gte":"10:00:00",
                           "lte":"12:00:00"
                        }
                     }
                  }
               ]
            }
         }
      }
   }
}
  • Second, are your "date" and "time" fields actual Date types? If they are just strings, ES is doing an alphanumerical range, which isn't identical to a real date-time range. Also, if these are string fields that have been analyzed, you're doing alphanumerical ranges on small fragments of your date (e.g. ["15", "11", "30"] rather than 15-11-30) because the analysis likely tokenized and split on special characters, like hypens and colons.

I'd recommend rolling these together into a single Date type and doing the complete range on that. If you can't do that, I'd use a Date for the "date" field and a not_analyzed string for the "time" field, then ensure all times that you request never "wrap around" since that won't work properly with the alphanumeric range.


(Gustav Gonzalez) #3

Thank you for your answer. Following your suggestion I have a new question:

How should I write the range filter for this query: Get all the events happening this month (15-11-01/15-11-30) but only between 10:00:00 am and 12:00:00 of every day?
Understanding that I only would have one field (of DATE type) containing date and time together

I appreciate any hint :smile:


(Zachary Tong) #4

Ahh, right, missed that part. :slight_smile: Also, I forgot that ES does support just times, you just have to specify the format in your mappings. Try something like this:

PUT /test/
{
   "mappings": {
      "test": {
         "properties": {
            "date": {
               "type": "date",
               "format": "yyyy-MM-dd"
            },
            "time": {
               "type": "date",
               "format": "HH:mm:ss"
            }
         }
      }
   }
}
PUT /test/test/include1
{
    "date": "2015-11-04",
    "time": "10:00:00"
}

PUT /test/test/include2
{
    "date": "2015-11-04",
    "time": "11:00:00"
}

PUT /test/test/exclude1
{
    "date": "2015-11-04",
    "time": "12:00:00"
}

PUT /test/test/exclude2
{
    "date": "2015-08-04",
    "time": "10:00:00"
}

And then following query will only find include1 and include2:

GET /test/_search
{
   "query": {
      "filtered": {
         "filter": {
            "bool": {
               "must": [
                  {
                     "range": {
                        "date": {
                           "gte": "2015-11-01",
                           "lte": "2015-12-01"
                        }
                     }
                  },
                  {
                     "range": {
                        "time": {
                           "gte": "10:00:00",
                           "lte": "11:00:00"
                        }
                     }
                  }
               ]
            }
         }
      }
   }
}

(Gustav Gonzalez) #5

That would be great if I just could define the map of my documents in ES, but this is my situation:
I have a logstash service with this filter:

match => { "message" => "%{DATE:date} %{TIME:time} %{GREEDYDATA:message}" }

It is working and sending all the data to the ES service which is running in the same host. All the indexes are automatically created, one per day, following this map structure:

"date":{"type":"date","format":"dateOptionalTime"}, "time":{"type":"string"}, message":{"type":"string"}

In every record, ES recognized patterns like this "00:00:00" as strings, that makes impossible to run the filter in the way you suggested because the range condition (gte/lte) can't be applied to strings.

Is there any way to ensure the time field as a TIME variable for real from the logstash filter for every new ES index? or do you have any other solution in mind?


(Gustav Gonzalez) #6

Finally I could join date and time fields using this type in my logstash filter:

%{TIMESTAMP_ISO8601:date}

Now, I can make queries to records like this:

2015-11-04T22:35:01

But even in this new scenario, I can not resolve my initial problem:
How to get all the events happening this month (15-11-01/15-11-30) but only between 10:00:00 am and 12:00:00 of every day?

Note: My indexes are generated dynamically, so I can't edit their structure manually. Nevertheless, I can change the logstash filter settings if it can help. I was hoping to make the trick just using ES query syntax.


(Zachary Tong) #7

Yeah, you'll have to tell Logstash to expect a "date" and a "time" with the appropriate mappings. E.g. edit your elasticsearch-template.json to include the fields explicitly.

Otherwise, I'm not sure there is a good way to accomplish this directly with the query DSL (currently). Every other solution I could think of was a hack that would only sorta work, like using strings instead of real times.

Perhaps open a ticket in the ES repo, requesting that you can query dates by just the time portion? I think that would be possible without serious technical problems, but I'm not quite sure if there is something preventing that.


(Gustav Gonzalez) #8

I found this solution around other forum, it works for hours but not for minutes. Too extreme?

{
  "query": {
   "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "date": {
                  "gte": "2015-11-01",
                  "lte": "2015-11-30"
                }
              }
            },
            {
              "script": {
                "script": "doc.date.date.getHourOfDay() >= min && doc.date.date.getHourOfDay() <= max",
                "params": {
                  "min": 8,
                  "max": 10
                }
              }
            }
          ]
        }
      }
}

}
}


(Zachary Tong) #9

That'd work...but definitely not ideal :smile:

The problem with using a script in your search is that it scales linearly to the number of documents that need to be evaluated. So if you are matching 1bn docs...it has to run that script 1bn times. And scripts are a fair amount slower than native functionality, since it has to boot up a Groovy interpreter for each execution.

Sooo...if the slowness is acceptable to you, go for it. Your data may be small enough that the overhead of the script is minimal. But if it becomes too slow, I'd encourage you to "fix" the issue at the mapping level for your new indices (Elasticsearch will automatically apply the new template to newly created indices, so your data will be "fixed" starting that point forward)


(Gustav Gonzalez) #10

How and where can I define a template to apply for every (dynamically created) ES index that match this name structure: syslog-yyyy-mm-dd ?
That's the point where I get lost, because it seems the logstash filter config file is not the place to handle the way ES defines its map structure.

When I do this from logstash:

grok {
      match => { "message" => "%{DATE:date} %{TIME:time} %{DATA:message}" }
}

I always get the "time" field as string in the ES mapping, so I can not use the time filter :unamused:


(system) #11