# Migrating off Elasticsearch as sole primary database for a relational business domain — anyone done this?
I've inherited a ~6 year old production Django application where Elasticsearch is the only data store. Not "ES for search, Postgres for data" — ES *is* the database. ~20 document types, ~40GB of application data, serving a business domain that is deeply, deeply relational (think: organizations have members, members have groups of members, groups of members have an elected leader, each group of members has a point of contact with the company, each group meets monthly, when they meet monthly they meet with a teacher, invoices reference missions, etc.).
The application has been in production for several years with real users and real data. I'm an interim CTO now responsible for figuring out what to do with the platform, despite having never used Elasticsearch.
I want to share what I've found, explain the strategy I'm pursuing, and ask whether anyone has been in a similar situation.
## What we're dealing with
First, let me say that 95% of the queries in the code are basic lookups by document ID or by single-field match. There's one 'full text' search bar in the app; it's not a heavily used feature, and I'm 99% sure I can cover all the search cases with basic Postgres full-text search.
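To be concrete about what I mean by "basic Postgres full-text search": the search bar could be served by a plain `tsvector`/`websearch_to_tsquery` query. A minimal sketch — the table and column names (`documents`, `title`, `body`) are hypothetical, not from our schema:

```python
def build_search_sql(limit: int = 20) -> str:
    """Build a parameterized Postgres full-text query.

    websearch_to_tsquery accepts search-engine-style input
    ("quoted phrases", OR, -excluded terms), so the search bar's
    raw text can be bound directly as the single %s parameter.
    """
    return (
        "SELECT id, title, "
        "       ts_rank(to_tsvector('english', title || ' ' || body), q) AS rank "
        "FROM documents, websearch_to_tsquery('english', %s) AS q "
        "WHERE to_tsvector('english', title || ' ' || body) @@ q "
        "ORDER BY rank DESC "
        f"LIMIT {limit}"
    )
```

In practice you'd precompute the `tsvector` into a generated column with a GIN index rather than recomputing it per row, but the query shape stays the same.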
There are a number of indices being (ab)used as equivalents of collections/tables, i.e. one data/document type per index.
### Referential integrity
ES has no foreign keys, no unique constraints, no cross-document transactions. The application stores inter-entity relationships as string IDs embedded in documents, with zero enforcement at the database level. If entity A references entity B by ID, and entity B is deleted, nothing stops that. The reference just silently becomes an orphan.
We ran integrity checks across the dataset and found roughly 1% of all cross-document references point to entities that no longer exist.
We started to check some business-rule constraints and found that 14% of groups are missing a required field — something the application is supposed to enforce, but that nothing at the data layer actually prevents. The previous developers also left about 30 async cron scripts that scan an index and check for error conditions. Of the 10 indexes they cover, 6 have at least one failing check; in most of those, at least half the checks are failing on at least one document.
I could go on — there are many examples where the application code *tries* to maintain consistency, but without database-level enforcement, it's a losing game.
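The integrity check itself is simple once you have the documents out; here's roughly the shape of what we ran (the reference map below is a made-up schema for illustration, and the real version walks the Scroll API instead of in-memory lists):

```python
from typing import Dict, List, Tuple

# Which fields in which doc type point at which other doc type
# (hypothetical schema, not our real one).
REFERENCE_MAP = {
    "member": [("organization_id", "organization"), ("group_id", "group")],
    "group": [("leader_id", "member")],
}

def find_orphan_refs(docs_by_type: Dict[str, List[dict]]) -> List[Tuple[str, str, str, str]]:
    """Return (source_type, source_id, field, dangling_target_id) tuples
    for every cross-document reference whose target no longer exists."""
    ids_by_type = {t: {d["id"] for d in docs} for t, docs in docs_by_type.items()}
    orphans = []
    for source_type, fields in REFERENCE_MAP.items():
        for doc in docs_by_type.get(source_type, []):
            for field, target_type in fields:
                target_id = doc.get(field)
                if target_id is not None and target_id not in ids_by_type.get(target_type, set()):
                    orphans.append((source_type, doc["id"], field, target_id))
    return orphans
```

In Postgres, this entire class of check disappears: a foreign key with `ON DELETE RESTRICT` makes the orphan state unrepresentable.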
### Significant Denormalization
*(forgive me if you already know this, but since these problems are specific to how Elasticsearch works, I thought I'd be explicit)*
Since ES can't join across document types at query time, the original developers denormalized aggressively. The same data (e.g., a person's name, an organization's status) is copied into every document that needs it. As expected for a NoSQL datastore, an entity's data might live, partially duplicated, in 4-5 different document types.
When the source value changes, every copy needs to be updated independently. There's no transaction (in the database sense) spanning these updates... so some can succeed while others fail. I've personally seen a member shown as having participated in zero events, while their 'history' tab in the UI lists 9 past events — and a third screen shows 7, from a stale pre-computed aggregate.
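To make the failure mode concrete, here's a toy version with an in-memory dict standing in for ES. If any single write in the fan-out fails, the copies permanently disagree and nothing rolls back:

```python
class UpdateFailed(Exception):
    pass

def fan_out_rename(store: dict, copies: list, new_name: str, fail_at: int = -1):
    """Update every denormalized copy of a person's name, one doc at a time.

    `copies` is a list of (index, doc_id) locations holding the name.
    There is no transaction: a failure mid-way leaves earlier writes in place.
    """
    for i, (index, doc_id) in enumerate(copies):
        if i == fail_at:  # simulate a node/shard error on this write
            raise UpdateFailed(f"write to {index}/{doc_id} failed")
        store[index][doc_id]["name"] = new_name

# The same name copied into three document types (hypothetical layout).
store = {
    "members": {"m1": {"name": "Alice"}},
    "groups": {"g1": {"name": "Alice"}},    # leader name copied in
    "invoices": {"i1": {"name": "Alice"}},  # contact name copied in
}
copies = [("members", "m1"), ("groups", "g1"), ("invoices", "i1")]
try:
    fan_out_rename(store, copies, "Alicia", fail_at=2)
except UpdateFailed:
    pass
# members and groups now say "Alicia"; invoices still says "Alice".
```

In a normalized relational schema the name lives in exactly one row, so the "update" is a single statement inside a single transaction, and this failure mode simply doesn't exist.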
### A custom ORM/DAO layer to bridge the gap
Django's ORM expects a relational database, so the team built a bespoke DAO layer on top of `elasticsearch-py` to make Django talk to ES. This layer handles query construction, pseudo-relationship traversal, serialization/deserialization, and a hand-rolled migration system... along with some business logic.
It works, in the sense that the application runs. But it reimplements (with less testing, less documentation, and fewer eyes) what Django's ORM provides out of the box for relational databases. Every new feature, every bug fix, every onboarding of a new developer pays a tax for this layer's existence. It's maintained by a single contracted developer currently.
### Silent partial results
This one concerns me the most from a data confidence perspective. As I understand it, ES can return HTTP 200 with partial or empty results if a shard or node is unavailable... which can happen routinely because of re-indexing and other background jobs. The `_shards` metadata in the response tells you, but the application doesn't check it. So in theory, users could be seeing incomplete data without any error being raised — and we'd never know.
## The strategy: migrate to PostgreSQL
It should be obvious that I want to fix this, and I'm looking to move the primary data store to PostgreSQL.
This migration is a prerequisite regardless of the longer-term strategic direction (improve current app, full rebuild, or move to an off-the-shelf ERP). I.e. from my perspective, the data needs to be in a relational store and cleaned up no matter what.
## My questions for the community
1. Has anyone else inherited or worked with ES as the sole primary database, especially for a relational domain? I'd love to hear what you found and how you dealt with it. Am I right to think this is a fundamentally wrong use of ES, or are there patterns I'm not seeing that make it viable?
2. Is the snapshot → restore → extract approach the right one (snapshot the production cluster, restore into an isolated cluster, extract from there into Postgres)? My concern with running the Scroll API or `elasticdump` directly against the production cluster is the partial-results risk combined with concurrent writes. The snapshot gives me a consistent, point-in-time copy and an explicit failure if any shard is unavailable. Is there a better path I'm missing?
3. Anything you think I should say to the board of directors or the CEO about how insane this use case was?
4. Anything you think I should say to the previous developers that decided to do this? ES has become a swear word internally because of this...
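For context on question 2: the property I'm relying on is that snapshot creation with `partial: false` (the ES default, which I'd state explicitly) fails outright if any involved shard is unavailable, instead of quietly snapshotting what it can. A sketch of the body I'd send to `PUT _snapshot/<repo>/<snapshot>` — index names here are made up:

```python
def snapshot_request_body(indices: list) -> dict:
    """Body for the ES create-snapshot API.

    partial=False makes the snapshot fail if any primary shard is
    unavailable, which is exactly the loud failure we want before
    extracting to Postgres.
    """
    return {
        "indices": ",".join(indices),
        "ignore_unavailable": False,   # also error on missing indices
        "include_global_state": False, # app data only, no cluster state
        "partial": False,
    }

body = snapshot_request_body(["members", "groups", "invoices"])
```

The plan is then to restore into a throwaway cluster and run the extraction there, so production never sees the scroll load and concurrent writes can't smear the point-in-time view.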
Any experience, war stories, or corrections to my thinking would be appreciated. Happy to share more detail about the setup if it helps.