Change data from database

Hi ya,

I'm storing timestamped change data coming off a SQL database into Elasticsearch and would like some advice. It is over 100 tables and most will be fairly small (small number of bytes + not many changes over course of the day) with a few tables being much larger and more active.

I'm curious if you all would recommend one index per table with an alias or one mono-index.

Some points:

  • Every table will have similar context information (timestamp, user id, org id, etc.)
  • Queries will usually be over all tables by timestamp. I'll likely be sorting by timestamp and aggregating a lot across tables.
  • I'll likely be rolling over the index/indices every week or month.
  • There will be about 1,000 columns (so about 1,000 fields total).
  • I'll very rarely add columns (and therefore fields).
  • Quite a few of the tables are small enough that they wouldn't necessarily need their own shard space-wise (many would be less than 10GB per month).
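For reference, the rollover I have in mind would look roughly like this ILM policy (policy name and thresholds are just placeholders, not settled values):

```json
PUT _ilm/policy/change-data-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_primary_shard_size": "50gb"
          }
        }
      }
    }
  }
}
```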

I'll be writing into it with bulk actions, from one process at the moment. I'm mostly focused on query speed.

I realize there isn't always a straightforward answer here, but I'd love any thoughts you could provide.

Thanks much,

I would just use one index for them all, and then look at attaching a field+value that identifies the originating table so that you can filter on that if you want.
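Roughly what I mean — each document carries a field naming its originating table, so one index serves everything and you can still filter or aggregate per table. Field names here (`source_table`, `ts`, `user_id`, `org_id`) are illustrative, not a fixed schema:

```python
# Sketch: tag each change document with its originating SQL table so a
# single index can hold changes from all ~100 tables.

def to_bulk_actions(table_name, rows, index="changes-write"):
    """Turn change rows from one SQL table into Elasticsearch bulk actions."""
    actions = []
    for row in rows:
        doc = dict(row)
        doc["source_table"] = table_name  # filter/aggregate on this per table
        actions.append({"_index": index, "_source": doc})
    return actions

rows = [{"ts": "2024-01-01T00:00:00Z", "user_id": 7, "org_id": 1, "price": 9.5}]
actions = to_bulk_actions("orders", rows)
print(actions[0]["_source"]["source_table"])  # → orders
```

These dicts are roughly the action shape the Python client's bulk helpers accept, and `"changes-write"` could be the write alias for your rolled-over indices.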

Awesome thanks @warkolm , that was the direction I was leaning.

Is 1,000 fields within one index an issue? I know having "lots" of fields can be tough on a cluster's master nodes, but I've never heard what "lots" really means.


Check out for more on that.
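One concrete setting to be aware of: `index.mapping.total_fields.limit` defaults to 1000, so at ~1,000 fields you'd be right at the default boundary and would likely need to raise it (index name below is a placeholder):

```json
PUT changes-000001/_settings
{
  "index.mapping.total_fields.limit": 1500
}
```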
