Change data from database

Hi ya,

I'm storing timestamped change data coming off a SQL database into Elasticsearch and would like some advice. The data covers over 100 tables, and most will be fairly small (a small number of bytes and not many changes over the course of the day), with a few tables being much larger and more active.

I'm curious if you all would recommend one index per table with an alias or one mono-index.

Some points:

  • Every table will have similar context information (timestamp, user id, org id, etc.)
  • Queries will usually be over all tables by timestamp. I'll likely be sorting by timestamp and aggregating a lot across tables.
  • I'll likely be rolling over the index/indices every week or month (see the ILM sketch after this list).
  • There will be about 1000 columns in total across the tables (so about 1000 fields).
  • Columns (and therefore fields) will very rarely be added.
  • Quite a few of the tables are small enough that they wouldn't necessarily need their own shard space-wise (many would be less than 10GB per month).
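
A minimal sketch of the kind of ILM rollover policy I have in mind (the policy name and thresholds are placeholders, not decisions):

```
# hypothetical policy; "30d" / "50gb" are example thresholds only
PUT _ilm/policy/change-data-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_size": "50gb"
          }
        }
      }
    }
  }
}
```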

I'll be writing into it with bulk requests, from one process at the moment. I'm mostly focused on query speed.
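
The writes look roughly like this (the index and field names here are made up for illustration; I haven't settled on naming yet):

```
# hypothetical index and fields, just to show the shape of the writes
POST _bulk
{ "index": { "_index": "change-data" } }
{ "@timestamp": "2023-06-01T12:00:00Z", "user_id": "u123", "org_id": "o456", "order_status": "shipped" }
{ "index": { "_index": "change-data" } }
{ "@timestamp": "2023-06-01T12:00:01Z", "user_id": "u789", "org_id": "o456", "login_count": 4 }
```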

I realize there isn't always a straightforward answer here, but I'd love any thoughts you could provide.

Thanks much,
Patrick

I would just use one index for them all, and then look at attaching a field+value that identifies the originating table so that you can filter on that if you want.
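
As a rough sketch of what I mean (index and field names like `source_table` are just examples):

```
# example mapping with an explicit keyword field for the originating table
PUT change-data
{
  "mappings": {
    "properties": {
      "@timestamp":   { "type": "date" },
      "source_table": { "type": "keyword" },
      "user_id":      { "type": "keyword" },
      "org_id":       { "type": "keyword" }
    }
  }
}

# query across everything by time, optionally filtered to one table
GET change-data/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "source_table": "orders" } },
        { "range": { "@timestamp": { "gte": "now-7d" } } }
      ]
    }
  },
  "sort": [ { "@timestamp": "desc" } ]
}
```

Clauses in a bool `filter` don't contribute to scoring and can be cached, which suits this kind of time-based filtering well.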

Awesome, thanks @warkolm, that was the direction I was leaning.

Is 1000 fields within one index an issue? I know having "lots" of fields can be tough on a cluster's master nodes, but I've never heard what "lots" really means.

Thanks,
Patrick

Check out https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#mapping-limit-settings for more on that.
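
The relevant setting here is `index.mapping.total_fields.limit`, which defaults to 1000, so you'd be right at the edge. It's a dynamic setting, so you can raise it per index if you need to, e.g.:

```
# raise the total-fields limit on an existing index (2000 is just an example value)
PUT change-data/_settings
{
  "index.mapping.total_fields.limit": 2000
}
```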
