[ Issue ] Multiple Rollup Jobs for the same index

Hello fellow Elasticsearch-ers!

This is my first post on the forums, and I'm reaching out for some guidance.

I'm currently trying to apply a Rollup-based solution to my indices.

The reason for our Rollup Job is to relieve Elasticsearch of old data that needs to stay available but doesn't need to be accurate to the minute. The plan was to let the Jobs roll up the data, then close the old, already rolled-up indices and keep them closed until we deem them no longer necessary (at which point we delete them for good).

The data is organized by customer and by month.

To keep the wheel spinning, I created a Rollup Job that should run at the beginning of each month ("monthly"). To be able to close old indices right away, we also create a second Rollup Job called "oneshot", which runs within the first few minutes after it is created (I don't like waiting until next month for something that can be done right now).

So,
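
For reference, the two-job setup can be sketched roughly as below. This is only an illustration, not the actual script: the source index pattern, the `timestamp` field, the `value` metric, the `page_size`, and both cron strings are assumptions made up for the example. The `_xpack/rollup/job/<id>` path is the 6.x form of the create-job endpoint.

```python
from typing import Any, Dict, List, Tuple


def build_rollup_job(customer: str, kind: str, cron: str,
                     metrics: List[Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Return (request path, request body) for the PUT that creates one job."""
    job_id = f"{kind}_{customer}"
    body = {
        # Assumed source pattern; it must not also match the rollup index.
        "index_pattern": f"{customer}.index_2019.*",
        # Both jobs of a customer write into the same rollup index.
        "rollup_index": f"{customer}.index_rollup",
        "cron": cron,
        "page_size": 1000,  # assumed value
        "groups": {
            # "timestamp" is a hypothetical field name; "1h" is from the post.
            "date_histogram": {"field": "timestamp", "interval": "1h"},
        },
        "metrics": metrics,
    }
    return f"_xpack/rollup/job/{job_id}", body


# Hypothetical metric and cron strings, for illustration only.
metrics = [{"field": "value", "metrics": ["avg"]}]
oneshot_path, oneshot_body = build_rollup_job("A", "oneshot", "0 0/5 * * * ?", metrics)
monthly_path, monthly_body = build_rollup_job("A", "monthly", "0 0 0 1 * ?", metrics)
```

Apart from the job ID and the cron string, the two bodies are identical, which matches the situation described here: two jobs with the same grouping feeding one rollup index.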

A.index_2019.03
A.index_2019.04
A.index_2019.05

get rolled up by "oneshot" into

A.index_rollup

and each month "monthly" would add documents to

A.index_rollup

Now here is where the strange stuff kicks in:

I tested this plan for long enough to feel comfortable implementing it in production, but (obviously) something went wrong.

Some of our customers' indices were no longer searchable after closing the old indices (the queries ARE updated to use _rollup_search), while others had no issues at all.

Looking into it: the data in the two customers' indices (regular and rollup) is identical, the jobs are created through a script and can't be any different, the rollup indices are created in the same way and have the same mappings... everything I could think of checking gave exactly the same results.

The next step was:

  1. add another "oneshot" to the "broken customer" (naming the job something like "oneshot_2", otherwise it would not be created because of the Job naming rules)
  2. let it run
  3. check

No luck: the index was not fixed and still not searchable.

Then I tried:

  1. remove all currently existing Jobs for the "broken customer".
  2. delete the "_rollup" index
  3. create another "oneshot" and let it run.

At this point the data WAS searchable. Then:

  4. create the "monthly" job

And here the issue came back: the index was not searchable anymore.

The "fun" thing is that the rollup index actually HAS data in it (anywhere from thousands to a few million documents), but the _rollup_search just returns 0 hits.

Does anybody have any idea what could cause this, or where I could look for it?
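
A minimal sketch of the kind of check meant here, under some assumptions: the aggregation name `per_interval` and the `timestamp` field are hypothetical, and the sample response is hand-written for illustration, not real output. One thing worth keeping in mind is that, per the rollup search documentation, `size` must be 0 (or omitted) because no search hits can be returned from pre-aggregated data, so a check for "is there data" has to look at the aggregation buckets rather than at `hits`.

```python
from typing import Any, Dict, Tuple


def rollup_search_request(rollup_index: str, field: str = "timestamp",
                          interval: str = "1h") -> Tuple[str, Dict[str, Any]]:
    """Return (request path, request body) for a GET against _rollup_search."""
    body = {
        "size": 0,  # rollup search cannot return hits
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": field, "interval": interval}
            }
        },
    }
    return f"{rollup_index}/_rollup_search", body


def bucket_count(response: Dict[str, Any], agg_name: str = "per_interval") -> int:
    """Count the buckets of one aggregation in a search response."""
    aggs = response.get("aggregations", {})
    return len(aggs.get(agg_name, {}).get("buckets", []))


search_path, search_body = rollup_search_request("A.index_rollup")

# Hand-written stand-in for a response with no matching rolled-up data.
empty_response = {"hits": {"total": 0, "hits": []},
                  "aggregations": {"per_interval": {"buckets": []}}}
empty_buckets = bucket_count(empty_response)
```

With this kind of helper, a "broken" customer would be one where `bucket_count` is 0 even though the rollup index itself contains documents.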

(I have already compared the output of:
GET A.index_rollup
GET B.index_rollup

GET A.index_rollup/_rollup/data
GET B.index_rollup/_rollup/data

GET _xpack/rollup/job/oneshot_A
GET _xpack/rollup/job/oneshot_B

GET _xpack/rollup/job/monthly_A
GET _xpack/rollup/job/monthly_B
)
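
Comparing those responses by eye is error-prone, so a small recursive diff can help. The sketch below is a generic helper, not something from the original script, and the two dictionaries are hand-written stand-ins rather than real API output.

```python
from typing import Any, List


def diff_json(a: Any, b: Any, path: str = "") -> List[str]:
    """List the paths at which two parsed JSON documents differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        out: List[str] = []
        for key in sorted(set(a) | set(b)):
            sub = f"{path}.{key}" if path else key
            if key not in a:
                out.append(f"{sub}: only in second")
            elif key not in b:
                out.append(f"{sub}: only in first")
            else:
                out.extend(diff_json(a[key], b[key], sub))
        return out
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        out = []
        for i, (x, y) in enumerate(zip(a, b)):
            out.extend(diff_json(x, y, f"{path}[{i}]"))
        return out
    # Scalars, mismatched types, or lists of different length.
    return [] if a == b else [f"{path}: {a!r} != {b!r}"]


# Hand-written stand-ins for two job definitions.
job_a = {"job_id": "oneshot_A", "config": {"cron": "x", "groups": {"interval": "1h"}}}
job_b = {"job_id": "oneshot_B", "config": {"cron": "x", "groups": {"interval": "1h"}}}
differences = diff_json(job_a, job_b)
```

Differences that only reflect the customer name (job IDs, index names) are expected; anything else that shows up would be worth a closer look.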

EDIT: I forgot to specify a few things.
Currently running version 6.7.1 on Elastic Cloud.
The Job templates roll up the data with "interval": "1h".
When I say I "create" a Job, I also put it in a "started" state; this goes for both the "oneshot" and the "monthly".
In the last example, once the "oneshot" had been created, started, and had already run, the data was searchable. When I then created the "monthly", it was "started" but had not run yet (it's not the beginning of the month), and the data was no longer searchable.

After further testing I noticed that this seems to depend on what the Job name (Job ID) is.

For example, if the Job name is long, the issue arises.

  1. created and started "oneshot".
  2. index was searchable after "oneshot" ran
  3. created "monthly" with a shorter name than the one I used before
  4. index was still searchable

I also tried all of this on a local Elasticsearch environment, hoping to catch logs and/or exceptions that are harder to get at in the Elastic Cloud logs, but couldn't find anything relevant to this case.

Still trying stuff.

The name turned out to be unrelated: I was then able to create Jobs with very long names without breaking the search.

The next thing that sparked my imagination was the cron string, which could be somewhat troublesome. But after trying several "faulty" cron strings, I also ended up with cases where the same cron string did not break anything...

Simply running out of ideas...
