Index name date math with wildcard?

alerting

#1

When using the index date math, is there a way to be able to add a wildcard into the index name reference?

As some background, I'm setting up a watch to look for errors in the logs of multiple dynamically created clusters. We index each cluster's logs separately, so the index names end up looking like this:

cluster-UUID-1-YYYY.MM.DD
cluster-UUID-2-YYYY.MM.DD
cluster-UUID-12-YYYY.MM.DD

and so on.

Right now I have the watch set to query indices matching cluster-*, but it seems rather wasteful to send the error query to all of those indices every 5 minutes when I'm only searching from now-5m to now.

However, when I try to search against a date math index name of <cluster-*-{now/d}>, I get an index not found exception. Adding each cluster UUID manually (e.g. <cluster-UUID-1-{now/d}>, ...) isn't really an option, as the clusters are created and destroyed dynamically. That's also why we index them separately: so a destroyed cluster's log indices can be removed before they'd age out naturally.
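For reference, a date math expression like <cluster-UUID-1-{now/d}> resolves server-side to a concrete daily index name, and the special characters have to be percent-encoded when the expression is used in a request path. A minimal Python sketch of the equivalent client-side resolution and encoding (function names are just for illustration):

```python
from datetime import datetime, timezone
from urllib.parse import quote

def resolve_date_math(prefix: str) -> str:
    """Mimic what {now/d} does server-side: append today's UTC date
    in YYYY.MM.DD form to the index name prefix."""
    return prefix + datetime.now(timezone.utc).strftime("%Y.%m.%d")

def encode_for_path(expr: str) -> str:
    """Percent-encode a date math expression for a request path;
    <, >, {, } and / all need encoding, e.g. {now/d} -> %7Bnow%2Fd%7D."""
    return quote(expr, safe="*-")

# resolve_date_math("cluster-UUID-1-") -> e.g. "cluster-UUID-1-2016.10.24"
print(encode_for_path("<cluster-*-{now/d}>"))  # %3Ccluster-*-%7Bnow%2Fd%7D%3E
```

Whether the wildcard inside the expression is honored is exactly the question here; the encoding itself is the same either way.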

Is there a better way for me to restrict the range of indices queried by Watcher, so that I'm not sending a query with a 5-minute timestamp range to 30+ days' worth of logs for potentially dozens or more clusters?

Thanks in advance.

-Joseph


(Alexander Reelsen) #2

Hey,

just to make sure I understand the problem: does this fail with a regular search as well, or only within a watch? If only within a watch, can you share it?

--Alex


#3

This does fail in a regular search as well. I was testing the watch's query via a regular search to validate that I got the results I expected, and I could only get the date math index names to work if I specified the full index name, including the UUID, up to the date portion of the name.

Here's a sanitized version of the query I tested with that got me results:

curl -u readonly -XGET 'https://localhost:9243/<cluster-UUID-12-\{now%2fd\}>/_search?pretty' -d'
    {
        "size" : 0,
        "query" : {
            "bool" : {
                "must" : {
                    "match" : { "level": "WARN" }
                },
                "filter" : {
                    "range": { "@timestamp" : { "gte" : "now-1d" } }
                }
            }
        },
        "aggs" : {
            "environment" : {
                "terms" : {
                    "field" : "environment"
                },
                "aggs" : {
                    "hosts" : {
                        "terms" : {
                            "field" : "host"
                        }
                    }
                }
            }
        }
    }'

And here is a sanitized version of the watch I'm currently using (that works). The only change made after getting it to work as desired was the list of indices:

{
    "trigger" : {
        "schedule" : { "interval" : "5m" }
    },
    "input" : {
        "search" : {
            "request" : {
                "indices" : [ "cluster-*" ],
                "body" : {
                    "size" : 0,
                    "query" : {
                        "bool" : {
                            "must" : {
                                "match" : { "level": "WARN" }
                            },
                            "filter" : {
                                "range": { "@timestamp" : { "gte" : "now-5m" } }
                            }
                        }
                    },
                    "aggs" : {
                        "environment" : {
                            "terms" : {
                                "field" : "environment"
                            },
                            "aggs" : {
                                "hosts" : {
                                    "terms" : {
                                        "field" : "host"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "condition" : {
        "compare" : { "ctx.payload.hits.total" : { "gt" : 0 } }
    },
    "actions" : {
        "notify-slack" : {
            "throttle_period" : "5m",
            "transform"  : {
                "script" : {
                    "id" : "returnHosts",
                    "lang" : "groovy"
                }
            },
            "slack" : {
                "message" : {
                    "to" : [ "#ops-alerts" ] ,
                    "text" : "Warning watcher\nFound WARN events for the following hosts:",
                    "dynamic_attachments" : {
                        "list_path" : "ctx.payload.items",
                        "attachment_template" : {
                            "title" : "Environment: {{env}}",
                            "text" : "{{#hosts}}Host: {{key}} Count: {{doc_count}}\n{{/hosts}}"
                        }
                    }
                }
            }
        }
    }
}

Thank you.

-Joseph


(Alexander Reelsen) #4

Hey,

so the following works for me without watcher:

# jot is the BSD/macOS equivalent of seq
for i in $(jot 12) ; do curl -XPUT localhost:9200/cluster-$i-2016.10.24  ; done
/usr/bin/curl 'localhost:9200/%3Ccluster-*%7Bnow%2Fd%7D%3E/_search'

and the following works inside a watch

PUT cluster-1-2016.10.24
PUT cluster-2-2016.10.24
PUT cluster-3-2016.10.24
PUT cluster-4-2016.10.24
PUT cluster-5-2016.10.24

PUT /_watcher/watch/cluster_health_watch
{
  "trigger" : {
    "schedule" : { "interval" : "10s" }
  },
  "input" : {
    "http" : {
      "request" : {
       "host" : "localhost",
       "port" : 9200,
       "path" : "/<cluster-*{now%2fd}>/_search"
      }
    }
  },
  "actions" : {
    "log" : {
      "logging" : {
        "text" : "{{ctx.payload}}"
      }
    }
  }
}

Did I miss anything from your example?

--Alex


#5

Thank you Alex.

I tried your example again, and I still get index_not_found_exception, so this may be a difference in versions (I realize now I should have stated which ES version we're using). We're currently running 2.3.5 on Elastic Cloud; is this something that works in 2.4?

-Joseph


(Alexander Reelsen) #6

Can you provide full examples of what does not work and which request (watcher or curl) returns the not found exception?


#7

I upgraded the cluster to 2.4.1 and I still get index not found.

Here is the curl I used (URL obfuscated) with the result (I also tried escaping the * with %2A and got the same error):

curl -u readonly -XGET 'https://XXXXXX.aws.found.io:9243/%3Ccassandra-*%7Bnow%2Fd%7D%3E/_search?pretty' -d'{
    "size" : 0,
    "query" : {
        "bool" : {
            "must" : {
                "match" : { "message": "Exception" }
            },
            "filter" : {
                "range": { "@timestamp" : { "gte" : "now-1d" } }
            }
        }
    },
    "aggs" : {
        "environment" : {
            "terms" : {
                "field" : "environment"
            },
            "aggs" : {
                "hosts" : {
                    "terms" : {
                        "field" : "host"
                    }
                }
            }
        }
    }
}'
Enter host password for user 'readonly':
{
  "error" : {
    "root_cause" : [ {
      "type" : "index_not_found_exception",
      "reason" : "no such index",
      "index" : "[<cassandra-*{now/d}>]"
    } ],
    "type" : "index_not_found_exception",
    "reason" : "no such index",
    "index" : "[<cassandra-*{now/d}>]"
  },
  "status" : 404
}

However, I know there are exceptions in the logs. Here is the identical query sent to all the indices (there's an older index that hasn't aged out yet and lacks a mapping for the "environment" field, so that error is expected until Curator deletes it):

curl -u readonly -XGET 'https://XXXXX.aws.found.io:9243/cassandra-*/_search?pretty' -d'{
    "size" : 0,
    "query" : {
        "bool" : {
            "must" : {
                "match" : { "message": "Exception" }
            },
            "filter" : {
                "range": { "@timestamp" : { "gte" : "now-1d" } }
            }
        }
    },
    "aggs" : {
        "environment" : {
            "terms" : {
                "field" : "environment"
            },
            "aggs" : {
                "hosts" : {
                    "terms" : {
                        "field" : "host"
                    }
                }
            }
        }
    }
}'
Enter host password for user 'readonly':
{
  "took" : 81,
  "timed_out" : false,
  "_shards" : {
    "total" : 153,
    "successful" : 110,
    "failed" : 43,
    "failures" : [ {
      "shard" : 0,
      "index" : "cassandra-XXXXX-prod-2016.10.12",
      "node" : "XXXXX",
      "reason" : {
        "type" : "illegal_state_exception",
        "reason" : "Field data loading is forbidden on [environment]"
      }
    } ]
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "environment" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "prod",
        "doc_count" : 3,
        "hosts" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [ {
            "key" : "XXXXX-0001",
            "doc_count" : 3
          } ]
        }
      } ]
    }
  }
}

Thank you for your help.

-Joseph


#8

I ran an additional test against a different set of indices to rule out the UUIDs with multiple "-"s in the index names contributing to the problem, but I still get the same errors. Since you can get it to work against a direct instance, I wonder if this is an Elastic Cloud issue.

Here's a failed search:

curl -u readonly -XGET 'https://XXXXXX.aws.found.io:9243/%3Csyslog-*%7Bnow%2Fd%7D%3E/_search?pretty' -d'
{
    "size": 0,
    "aggs": {
        "byindex": {
            "terms": {    
                "field": "_index"     
            }                               
        }       
    }               
}'
Enter host password for user 'readonly':
{
  "error" : {
    "root_cause" : [ {
      "type" : "index_not_found_exception",
      "reason" : "no such index",
      "index" : "[<syslog-*{now/d}>]"
    } ],
    "type" : "index_not_found_exception",
    "reason" : "no such index",
    "index" : "[<syslog-*{now/d}>]"
  },
  "status" : 404
}

And successful:

curl -u readonly -XGET 'https://XXXXXX.aws.found.io:9243/%3Csyslog-prod-%7Bnow%2Fd%7D%3E,%3Csyslog-dev-%7Bnow%2Fd%7D%3E/_search?pretty' -d'
{
    "size": 0,
    "aggs": {
        "byindex": {
            "terms": {
                "field": "_index"
            }
        }
    }
}'
Enter host password for user 'readonly':
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "hits" : {
    "total" : 3336,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "byindex" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "syslog-dev-2016.10.24",
        "doc_count" : 1946
      }, {
        "key" : "syslog-prod-2016.10.24",
        "doc_count" : 1390
      } ]
    }
  }
}

Again, thanks for your help in looking into this issue thus far.

-Joseph


(Alexander Reelsen) #9

Hey,

Since everything works for me locally, I currently assume this is an issue with Elastic Cloud in combination with date math indices. I will set up a Cloud cluster myself, test this as soon as I can, and ping you once I know more!

Thanks for your patience!

--Alex


(Alexander Reelsen) #10

Hey,

it turns out there is indeed a problem on Cloud when using index names that contain an asterisk. The awesome folks at Cloud will take a look at it.

In the meantime I found a minor workaround. Unfortunately you cannot use aliases here, as date math is resolved when the alias is created, so you would need to create a new alias every midnight. However, you can make use of the lesser-known indices query.

It looks like this:

curl localhost:9200/syslog-*/_search -d '{
  "query" : {
    "bool" : {
      "must" : [
        { "match_all" : {} },
        {
            "indices" : {
                "indices" : ["<issue-{now/d}>"],
                "query" : { "match_all" : {} },
                "no_match_query" : "none"
            }
        }        
      ]
    }
  }
}'

This looks a bit confusing at first. What it does is rewrite the query on every index you hit that does not match the specified one into a so-called MatchNoneQuery, which matches no documents. However, you are still hitting all the possible shards, so this is only a partial optimization.

Hope this helps for now. I will let you know, once I know more!

--Alex


#11

Thank you Alex.

I actually ended up going down the alias route yesterday. I updated the index template so that every newly created index matching the pattern is added to a static alias, then set up Curator to remove the older indices from the alias.

I hadn't gone that route sooner because I didn't have anywhere to run Curator from, but I got that resolved so I can also use it for deleting old data.
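For anyone following along, here's roughly what that template change looks like; the template name, pattern, and alias below are placeholders, and `template`/`aliases` are the ES 2.x index template fields:

```python
import json

# Hypothetical template body: every new index matching "cluster-*" is
# attached to the static alias "cluster-current" at creation time.
# Curator then prunes old indices from the alias on its own schedule.
template_body = {
    "template": "cluster-*",
    "aliases": {
        "cluster-current": {},
    },
}

# PUT this body to /_template/cluster-logs (e.g. via curl) to install it.
print(json.dumps(template_body, indent=2))
```

The watch then queries the alias instead of the wildcard pattern, which keeps the searched index set small without relying on date math at all.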

I'll keep an eye out for a response from the Cloud team on the issue, as we may have other needs for a wildcard in the date math index reference in the future.

Thanks again for your help in digging to the bottom of the issue.

-Joseph

(system) #12