How to efficiently query date *and* type based indexes


(John T) #1

Our environment has one large monolithic index that holds 483 million docs and is 1TB in size. It is currently spread over 23 shards. As it grows, we keep having to scale out by reindexing into more shards. The index contains 23 types, some of which have only marginal overlap in their fields. We are currently on 2.3.2 and want to be able to upgrade, but ES is phasing out multiple types per index. We would also like to break our data down by month to make querying and data retention easier. To that end, I am looking at splitting up the index by both type and month. We would end up with, for example:

foo_type1_2017.10
foo_type1_2017.09
foo_type2_2017.10
foo_type2_2017.09

And so forth.

Note that this is not like logging data where we have an "open window" index to write to. Any document from any time can get updated. We can handle that in our indexing code.

Our UX has filters that let you specify (among many other things) a date range and a set of types. If I were dealing with just one of these (i.e. type or date), it would be simple, but the combined scenario is what I am trying to wrap my head around.

For example, given types "abc", "aab", "cba" and "aaa", I could query the two "aa" types with:

/foo_aab,foo_aaa/_search

Or

/foo_aa*/_search

I could also create an alias that points to these indexes and search:

/double_a/_search

For dates, if I want all items in September and October of this year, I could do:

/foo_2017.09,foo_2017.10/_search

Or again, I could create an alias (which I would keep updated) that points to the last 2 months:

/foo_2m/_search
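
To illustrate what "keeping it updated" would involve, here is a minimal sketch of the request body for the `_aliases` endpoint that swaps one month out and the next one in. The function name and the month values are hypothetical, and it assumes date-only index names like the ones above; it only builds the JSON and does not send anything.

```python
import json


def rolling_alias_actions(alias, prefix, old_month, new_month):
    """Build an _aliases body that drops old_month from the alias and adds new_month.

    Hypothetical helper: index names are assumed to look like "<prefix>_<yyyy.MM>".
    """
    return {
        "actions": [
            {"remove": {"index": "{}_{}".format(prefix, old_month), "alias": alias}},
            {"add": {"index": "{}_{}".format(prefix, new_month), "alias": alias}},
        ]
    }


# Example: when November starts, foo_2m should cover October and November.
body = rolling_alias_actions("foo_2m", "foo", "2017.09", "2017.11")
print(json.dumps(body, indent=2))
```

The printed body would be sent as a POST to `/_aliases`, so both actions are applied in a single request.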

The problem comes when I want to combine them. If I wanted a subset of types for a specific date range, I would have to explicitly list out all the indexes without the aid of aliases. So for three types over two months, it would need to be:

/foo_type1_2017.10,foo_type1_2017.09,foo_type2_2017.10,foo_type2_2017.09,foo_type3_2017.10,foo_type3_2017.09/_search

This easily gets unwieldy (and will probably exceed the character limit allowed for a request).
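
For reference, this is roughly how I would generate that list in code rather than by hand. It is only a sketch: the helper names are made up, and it assumes the `foo_<type>_<yyyy.MM>` naming shown above. It also prints the path length, since that is the part I am worried about.

```python
from datetime import date


def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def index_names(types, start, end, prefix="foo"):
    """Return one index name per (type, month) combination."""
    return [
        "{}_{}_{:04d}.{:02d}".format(prefix, t, y, m)
        for t in types
        for (y, m) in month_range(start, end)
    ]


names = index_names(["type1", "type2", "type3"], date(2017, 9, 1), date(2017, 10, 1))
path = "/" + ",".join(names) + "/_search"
print(path)
print(len(path))  # grows with (number of types) x (number of months)
```

With 23 types and a date range of a couple of years, that is several hundred index names in a single path.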

I had hoped that date math in the index name would solve part of it (i.e. that you could express gte/lte range logic), but it appears you can only use it to resolve a single specific date.

Is there a good way to do this or are we crazy to take this approach?

Thanks

