Environment: 4 nodes, 124GB RAM total, ~0.5TB of data, Elasticsearch v5.6.8
Indices: 1 main index, 1 parent doc type, 3 child doc types, ~300 index operations/s on the primary shards
Clients: Primarily the Java SDK
Our core index has a parent doc type (elements) and a secondary core type (metrics), which is a child doc type linked to its parent by a primary key. We have other child doc types, but one is enough to illustrate the issue. Metrics are also stored as embedded docs inside elements, but we duplicate them into their own doc type so that we can query and fetch the metrics that match given criteria, not just the elements that have metrics matching those criteria.
We've found that metrics in the child doc type go missing. I've done a lot of research to try to reproduce this. The two scenarios I've tried to reproduce are:
- Are we getting bulk index failures when saving metrics? I log them and can't find a single one. I can reproduce bulk errors in a test environment, but only by maxing out the bulk index thread pool.
- Are metrics getting indexed on a primary shard but not yet replicated to the replica shards when the primary fails over? I've tried reproducing this as well with a small Dockerized cluster, without success.
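To make "missing" concrete: a metric counts as missing when its ID appears in an element's embedded metrics but has no counterpart in the metrics child doc type. Below is a minimal sketch of that comparison, assuming both ID collections have already been fetched from the cluster by some other means. The class and method names are hypothetical and used only for illustration, and the code deliberately makes no Elasticsearch client calls.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of any Elasticsearch API.
final class MetricReconciler {

    private MetricReconciler() {
    }

    // Returns the IDs that appear as embedded metrics on elements but have
    // no matching document in the metrics child doc type.
    // Both inputs are assumed to have been collected beforehand.
    static Set<String> missingChildIds(Collection<String> embeddedMetricIds,
                                       Collection<String> childMetricIds) {
        // Copy into a hash set so each lookup is O(1).
        Set<String> present = new HashSet<>(childMetricIds);

        // LinkedHashSet keeps the order in which the embedded IDs were seen
        // and drops duplicates.
        Set<String> missing = new LinkedHashSet<>();
        for (String id : embeddedMetricIds) {
            if (!present.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }
}
```

Comparing the resulting IDs against the timestamps of bulk requests and node restarts would be one way to narrow down which of the two scenarios, if either, applies.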
I've got two main lines of questions:
- Does anyone have any advice on other things to try? Have you seen a scenario like this before? Is the Java SDK lying to me about bulk index errors? Is it possible that nothing is wrong with Elasticsearch and I need to dig into my app code more?
- I understand duplicating data like this is not great. I don't like it. Do you know of a better way to model the data so that I can pull back child documents from queries while storing them only as embedded documents?
Thanks in advance. I'm happy to provide more detail if needed.