Curator throws exception when trying to delete indices based on timestamp


(Kay B) #1

I have an actions YAML file that tries to delete ES indices that are more than 5 minutes old. My ES docs contain a field called timestamp which is used to identify the records. But when Curator runs, I get an exception:

"Failed to complete action: delete_indices. <type 'exceptions.TypeError'>: int() argument must be a string or a number, not 'NoneType'"

Here is what my actions file looks like:

actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 5 minutes.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: age
      source: field_stats
      field: '@timestamp'
      stats_result: min_value 
      direction: older
      unit: minutes
      unit_count: 5

Any idea what I might be doing wrong?

Thanks
Kay


(Aaron Mildenstein) #2

Without seeing more context from the logs, I can't be sure. More information would be helpful. But looking at your filters, you are only filtering by age, which means that all of your indices are being checked for that @timestamp field. This includes indices like .kibana, .security, and .watch.... I am guessing that the filter cannot find an @timestamp field in indices like those, and that is causing your error. You should use a pattern filter before the age filter to ensure that only indices which contain that field are selected for the field_stats operation. Curator doesn't do this for you automatically. As the documentation for the field setting says:

The value of this setting must be a timestamp field name. This field must be present in the indices being filtered or an exception will be raised, and execution will halt.


(Kay B) #3

Appreciate the quick response. Your reply makes sense. So I changed my filters to this:

filters:
- filtertype: pattern
  kind: prefix
  value: myindex
  exclude:
- filtertype: age
  source: field_stats
  field: '@timestamp'
  stats_result: min_value
  direction: older
  unit: minutes
  unit_count: 5

Now Curator is unable to match any documents, which makes me question my understanding of filters. Can I read those filters as "Find all docs whose index name starts with 'myindex' and, if their timestamp field is older than 5 minutes, delete them"?

A different question: do I have to write '@timestamp' in field, or can I leave out the @?

Thanks
K


(Aaron Mildenstein) #4

Curator doesn't match documents. It matches indices. In the case of the field_stats source, it matches indices which have documents with fields whose timestamp values meet the specified criteria. It sounds like you were expecting Curator to delete documents, but that is not how it works. Curator doesn't delete documents. It deletes whole indices.

As an explanation, the pattern filter you specified will look for indices which start with myindex (prefix). Do you have indices that start with that? e.g. myindex-1, myindex-2017.12.26, myindex-something-here?


(Kay B) #5

Thanks for the clarification. Yes, I do have myindex as the index prefix for all my documents. After I replaced field: '@timestamp' with field: timestamp, my indices are getting deleted as expected.
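For reference, combining the pattern filter from #3 with the field name change described above, the working action is presumably something like the following. This is a reconstruction, not Kay's posted file; the prefix myindex and the field name timestamp are the only specifics taken from the thread.

```yaml
actions:
  1:
    action: delete_indices
    description: >-
      Delete myindex* indices whose oldest timestamp is more than 5 minutes old.
    options:
      ignore_empty_list: True
      continue_if_exception: False
      disable_action: False
    filters:
    # Narrow the list first so only indices that actually contain the
    # 'timestamp' field reach the field_stats-based age filter.
    - filtertype: pattern
      kind: prefix
      value: myindex
    # The field name must match the mapping exactly: 'timestamp', not '@timestamp'.
    - filtertype: age
      source: field_stats
      field: timestamp
      stats_result: min_value
      direction: older
      unit: minutes
      unit_count: 5
```

With stats_result: min_value, an index is selected as soon as its oldest document passes the threshold; max_value would instead wait until its newest document is older than 5 minutes.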

Regarding deleting the actual documents (I understand that Curator does not do this), what would be a good approach? In my situation, I get documents every five minutes all day, and I would like to purge (delete) all the documents and indices within Elasticsearch at the end of the day so that I can keep my memory usage under control.

Thanks again


(Aaron Mildenstein) #6

If you have a constant stream of documents coming in, you should not be deleting documents. Instead, use time-series indices; they could be hourly, for example. You would then use Curator to delete indices when they are no longer needed.
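A minimal sketch of that approach, assuming the writer names indices by hour, e.g. myindex-2017.12.26.14 (this naming scheme and the 24-hour retention are hypothetical; any consistent timestring works). Because the age is parsed from the index name, Curator does not need to look at any timestamp field inside the documents.

```yaml
actions:
  1:
    action: delete_indices
    description: >-
      Delete hourly myindex-YYYY.MM.DD.HH indices older than 24 hours.
    options:
      ignore_empty_list: True
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: myindex-
    # Age comes from the index name via timestring (hypothetical hourly
    # naming scheme), so empty indices or missing fields do not matter.
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d.%H'
      unit: hours
      unit_count: 24
```

Run on a schedule (for example from cron), each pass drops whole hourly indices rather than individual documents.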


(Kay B) #7

Currently, all my documents share the same index, and I use the timestamp field to match what I want to delete. Your suggestion would mean that all the documents from a given hour would share the same index. When I delete that index based on age, what happens to the documents in it? Do they get purged automatically?

Thanks
K


(Aaron Mildenstein) #8

If an index gets deleted, all documents in the index go with it. It's roughly analogous to a SQL DROP TABLE versus a SQL DELETE statement. Dropping a table (or an index, in the case of Elasticsearch) is always going to be more performant than doing a bunch of individual deletes.


(andy_zhou) #9

Same issue here: the output says no indices were found.


(Kim Kruse Hansen) #10

Hi

I have also seen this message when running Curator. In my case, the reason was that the index was empty. Is that intended behaviour?

I would recommend ignoring empty indices so that the pipeline does not stop.

Best regards
Kim


(Aaron Mildenstein) #11

Yes, it is, and it is explicitly documented:

This field must be present in the indices being filtered or an exception will be raised, and execution will halt.

Why do you have empty indices around eating up valuable JVM resources?


(Tim Arp) #12

Hi,
I followed this thread because I have the same issue. For whatever reason, I may have an index where @timestamp does not exist. I think if Curator doesn't find it, it should ignore that index.

I'm transitioning my indexing to active indices using aliases. I have pre-created the initial empty index and alias for the development teams. When they move their code away from index-yyyy-mm-dd to using the active alias, they will write docs with @timestamp. But for a few days the index will be empty.
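One way to keep a freshly created, still-empty write index out of the field_stats age check is to exclude it by alias before the age filter runs. A sketch, assuming the write alias is literally named active and the indices share the prefix index- (both hypothetical), with a placeholder 30-day retention; Curator's alias filter takes aliases and exclude settings.

```yaml
filters:
- filtertype: pattern
  kind: prefix
  value: index-
# Drop whatever index currently holds the (hypothetical) 'active' alias,
# so an empty write index never reaches the age filter below.
- filtertype: alias
  aliases:
  - active
  exclude: True
- filtertype: age
  source: field_stats
  field: '@timestamp'
  stats_result: min_value
  direction: older
  unit: days
  unit_count: 30
```

This only protects the current write index; an older index written with 'timestamp' instead of '@timestamp' would still raise the exception.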

Also, a development team might write their time field as 'timestamp'. Until I figure out that they are doing this wrong, the Curator job will fail whenever it hits that index.

One more question: in version 6, field_stats is deprecated. What is the alternative for Curator?

thanks,
Tim


(Aaron Mildenstein) #13

Curator still calls the source field_stats, but it now uses a query to accomplish the same thing.

It is non-trivial to add this functionality (skipping indices that lack the field) to Curator. But I'm getting a lot of requests for it. It may make it into 6.0. You'll have to wait and see.


(Aaron Mildenstein) #14

You all understand the danger inherent in sending indices that won't match a query for a timestamp to a tool that deletes indices based on the timestamps it finds? Having the tool ignore something that shouldn't be happening in the first place is not a recommended practice.


(Tim Arp) #15

I don't really understand the danger. I have some old ES clusters at AWS for which I had to write my own delete logic in this way, because Curator doesn't work with the early 5.x ES instances at AWS. Anyway, I would iterate through the list and query each index. If it passed my "check the @timestamp", it got added to the delete list. If it got no results, that was treated the same as failing the check.
In a perfect world, ES could enforce this so that a stray index couldn't throw off this job. But as it is, ES is more than happy to let you not configure @timestamp or call it 'timestamp'.


(Aaron Mildenstein) #16

ES is more than happy to let you not configure @timestamp or call it 'timestamp'

Well, yes. It's meant to be configurable. So is the field directive for Curator, so you can pick whatever timestamp field you've chosen. Consistency in naming is something the end user has to enforce, not Elasticsearch.


(Aaron Mildenstein) #17

And this is where things broke for Curator. Versions before 5.4 used the field_stats API, which allowed the queries to be sent in bulk rather than iterating over each index. Since iteration is what Elasticsearch mandated as the replacement for field_stats, Curator switched to doing that in 5.4. Adding the code to remove such indices from the actionable list is easier now than it was with field_stats, but it's still not something I'm eager to dive into. I'm trying to write big changes for 6.0, not handle edge cases for 5.x.


(Kim Kruse Hansen) #18

Did you consider the possibility that a rollover occurred just seconds ago and no data has been pushed to the new index yet? The pipeline would then halt simply because there is no data yet.

No valuable JVM resources are being eaten up.
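If an index can legitimately be empty right after a rollover, one option that avoids querying documents at all is to age indices by creation_date, which Curator reads from index metadata and which exists whether or not any documents have been written. A hedged sketch; the myindex- prefix and the 7-day retention are placeholders.

```yaml
filters:
- filtertype: pattern
  kind: prefix
  value: myindex-
# creation_date comes from index settings, not from documents, so a
# just-rolled-over empty index is simply "young" rather than an error.
- filtertype: age
  source: creation_date
  direction: older
  unit: days
  unit_count: 7
```

The trade-off is that creation time is not data time: an index backfilled with old documents is kept until the index itself reaches the threshold.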

Regards
Kim


(system) #19

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.