Migrating off ElasticSearch as sole primary database for a relational business domain — anyone done this?

# Migrating off Elasticsearch as sole primary database for a relational business domain — anyone done this?

I've inherited a ~6 year old production Django application where Elasticsearch is the only data store. Not "ES for search, Postgres for data" — ES *is* the database. ~20 document types, ~40GB of application data, serving a business domain that is deeply, deeply relational (think: organizations have members, members have groups of members, groups of members have an elected leader, each group of members has a point of contact with the company, each group meets monthly, when they meet monthly they meet with a teacher, invoices reference missions, etc.).

The application has been in production for several years with real users and real data. I'm an interim CTO now responsible for figuring out what to do with the platform, despite having never used Elasticsearch.

I want to share what I've found, explain the strategy I'm pursuing, and ask whether anyone has been in a similar situation.

## What we're dealing with

First, let me say that 95% of the queries in the code are basic lookups by document id or by single-field search. There's one 'full text' search bar in the app, it's not a massively used feature, and I'm 99% sure I can accomplish all search cases with some basic Postgres full-text search techniques.

There's a number of indices which are being (ab)used as equivalents to collections/tables, i.e. one data/document type per index.

### Referential integrity

ES has no foreign keys, no unique constraints, no cross-document transactions. The application stores inter-entity relationships as string IDs embedded in documents, with zero enforcement at the database level. If entity A references entity B by ID, and entity B is deleted, nothing stops that. The reference just silently becomes an orphan.

We ran integrity checks across the dataset and found roughly 1% of all cross-document references point to entities that no longer exist.

We started to check some business rule constraints and found that 14% of groups are missing a required field, something the application is supposed to enforce but that nothing at the data layer actually prevents. The previous developers also had about 30 async cron scripts that search within an index and check for error conditions. Of the 10 indexes, 6/10 have at least one failing data condition, most with 50% of the checks failing on at least one document.

I could go on, there's many examples where the application code *tries* to maintain consistency, but without database-level enforcement, it's a losing game.

### Significant Denormalization

*(forgive me if you already know this, but since these problems are so far from Elasticsearch I thought it'd be explicit)*

Since ES can't join across document types at query time, the original developers denormalized aggressively. The same data (e.g., a person's name, an organization's status) is copied into every document that needs it. As expected for a NoSQL datastore, an entity's data might live, partially duplicated, in 4-5 different document types.

When the source value changes, every copy needs to be updated independently. There's no transaction (in the database sense) that spans these updates... so some updates could succeed while others fail. I've personally seen instances where a member is shown as having participated in zero events, but when you click on his 'history' tab in the UI, I see 9 past events (and on a different screen, it shows 7 past events as (false) pre-computed aggregate)

### A custom ORM/DAO layer to bridge the gap

Django's ORM expects a relational database, so the team built a bespoke DAO layer on top of `elasticsearch-py` to make Django talk to ES. This layer handles query construction, pseudo-relationship traversal, serialization/deserialization, and a hand-rolled migration system... along with some business logic.

It works, in the sense that the application runs. But it reimplements (with less testing, less documentation, and fewer eyes) what Django's ORM provides out of the box for relational databases. Every new feature, every bug fix, every onboarding of a new developer pays a tax for this layer's existence. It's maintained by a single contracted developer currently.

### Silent partial results

This one concerns me the most from a data confidence perspective. As I understand it, ES can return HTTP 200 with partial or empty results if a shard or node is unavailable... which can happen routinely because of re-indexing and other background jobs. The `_shards` metadata in the response tells you, but the application doesn't check it. So in theory, users could be seeing incomplete data without any error being raised — and we'd never know.

## The strategy: migrate to PostgreSQL

It should be obvious that I want to fix this, and I'm looking to move the primary data store to PostgreSQL.

This migration is a prerequisite regardless of the longer-term strategic direction (improve current app, full rebuild, or move to an off-the-shelf ERP). I.e. from my perspective, the data needs to be in a relational store and cleaned up no matter what.

## My questions for the community

1. Has anyone else inherited or worked with ES as the sole primary database (+ for a relational domain?) I'd love to hear what you found and how you dealt with it. Am I right to think this is a fundamentally wrong use of ES, or are there patterns I'm not seeing that make it viable?

2. Is the snapshot → restore → extract approach the right one? My concern with using the Scroll API or `elasticdump` directly against the production cluster is the partial-results risk combined with concurrent writes. The snapshot gives me a consistent, point-in-time copy and an explicit failure if any shard is unavailable. Is there a better path I'm missing?

3. Anything you think I should say to the board of directors or the CEO about how insane this use case was?

4. Anything you think I should say to the previous developers that decided to do this? ES has become a swear word internally because of this...

Any experience, war stories, or corrections to my thinking would be appreciated. Happy to share more detail about the setup if it helps.

Here's my two cents. Hopefully others will chime in. Disclosure I work at Elastic on Elasticsearch itself. In past lives from an experience standpoint I have both been in similar situations and in the reverse situation migrating an ailing Postgres deployment to ES. I've used ES since ES 2 and various relational data stores including Postgres in my career. ES and Postgres are some of my favorite databases.

The biggest question I find myself asking after reading your post is: Why ES originally? If you can you should ask the devs who were around historically what the original hope or intent was with using ES. Specifically you mentioned full-text search. I have tried to use Postgres myself to do full-text search at scale and it will break down on you. Particularly if you expect to grow your user base or search capability in this AI age you should think hard before migrating completely. Full-text search is what ES does well. It also has state-of-the-art vector search capability that when you combine the two as a form of hybrid search you really hit on the value proposition difference between something like ES vs Postgres.

As the CTO if you don't foresee growing in this area or having this requirement around full-text search then the next question I would have is around scale. The whole reason for denormalizing the data is to deal with scale. If you re-normalize the data you should expect to deal with scaling issues or have a small enough / bounded datasets such that you don't need to worry about scaling in Postgres. This is literally what I've run into in the past. I built a priority-queue-esq app (as so many apps essentially are) and started it out on Postgres; it worked so long as users only needed access to about 3 months worth of data, and then when 3 months rolled around PM said we had to keep the data. 4 months went and past. At about 7 months the database became unstable. No one wanted to purge data. It was either shard Postgres or migrate to something more naturally distributed like ES. And text search was the primary bottleneck as we tried to scale up on Postgres. We switched entirely to ES; things were smooth.

If you don't have scaling concerns and don't have meaningful full-text search Postgres is awesome for all the reasons you mentioned. Getting full ACID compliance can greatly simplify data handling particularly as you mentioned for things like cascade on delete and Postgres is often easier for more junior devs to get comfortable with; it's just been around longer. Elasticsearch can do things like delete by query but it's not cascade on delete. The referential enforcement costs you something and ES squarely puts that on the application logic.

ES for search, Postgres for data

I've seen successful ES for search and Postgres for data approaches but you are maintaining referential integrity still at the application logic layer. I think folks often overlook that it's a lot of extra complexity to write to both and know that you have successfully written to both. Simplify if you can. It's perfectly reasonable to do just ES and I've built applications both ways. Using both is best when you don't have a lot of references between the two. To get ES to play more nicely things like ESQL make small indices that hold referential data much easier to interact with now as well. It's also perfectly reasonable to do just Postgres. Usually when someone made the decision to do ES for everything they were trying to keep it simple on the application logic side.

To hear you describe how the data is denormalized whether you go with ES or with Postgres I would caution that any poorly designed schema will run into problems. It sounds like you are searching across multiple indices in different use-cases and making updates across all of those indices. My immediate gut reaction without knowing anything else is why is this not one search index. I mention this because it may be a less aggressive step to simplify and streamline your application logic and ES setup before completely switching to Postgres; it essentially gives you time to learn whether you want to keep ES at all for your search use cases. And it's important to understand so you don't make the same mistake if you do go with Postgres here. For instance when I hear you have a custom connector between Django's ORM and ES I would wonder why you aren't using something like django-elasticsearch-dsl (caveat I've never used it myself; it's a legitimate question). This might feel counter productive but if you answer why ES feels painful now you might find even in migrating to Postgres you've solved organizational and architectural problems that would have come with the migration.

Silent partial results

You can disable this in ES: allow_partial_search_results

the data needs to be in a relational store and cleaned up no matter what

Pros / cons mentioned above aside, I think it's worth mentioning making any decision should be done in the context of something that energizes your team. I've seen too many CTOs make decisions that are "right for the business" but tank morale and make it hard to build something folks love to use. My advise is strive to build a technology roadmap your developers will love and you'll find they make sure the application is something customers will love too. Sometimes that's enough of a reason to switch to or stick with something.

Is the snapshot → restore → extract approach the right one?

I would think architecturally first. Setup your application logic to write to Postgres and ES both for a time so you can fall back if needs be. Once that is running use elasticdump or an ES search_after PIT query (it's what we recommend instead of the Scroll API) to only migrate over chunks of the historical data. I've used both; they both work great. If you are doing regular updates make the dump query relative to an update field and overwrite any existing data in Postgres in the interim. I wouldn't be concerned about partial results.

Anything you think I should say to the board of directors or the CEO about how insane this use case was?

This is a whole other can of worms. I find it's less valuable to focus on the negative and instead focus on the positive with folks at this level. Keep your strategy brief and clear. It's why I mentioned my original question of: Why Elasticsearch originally? If you can answer that question succinctly then you hopefully can easily form up the answer for the board as to why it is no longer the right option or why time needs to be spent on technical debt in your current ES setup.

Anything you think I should say to the previous developers that decided to do this?

Dialog is good. Don't dictate to them; ask them what they think is good and bad about ES and what it helps solve. Ask them if they would have done it differently. Help them convince themselves or you of the right thing to do. If you are right about Postgres a well reasoned data-driven or logical outcome should fall out of that discussion. Find a common ground or enemy that's not putting blame on the current architecture or engineers such as deferring blame to demanding customers that made the company rush into technical debt like skipping reasonable audit checks that everyone knew were needed to make an architecture with ES succeed in your case (I'm making this scenario up obviously).

I'm an interim CTO now responsible for figuring out what to do with the platform, despite having never used Elasticsearch.

One way or another happy to discuss more with you; I really just like solving problems and helping folks out in general. I'd like to think it's a shared trait amongst Elastic engineers. And specifically if you wind up having questions about ES feel free to ask. It's a weighty pair of shoes you've stepped into; so one way or another good luck.

Hi John,

I couldn't have hoped for a more engaged and thoughtful reply, hearing your perspective definitely moved the needle for me.

Why ES originally?

According to the owner of the software development firm who provided the team, it was because the original requirements for this application was to provide full-text search for ~30 years of OCR'd PDFs as part of the back office.

I think the business still wants this, but more in the "I'd love to be able to get insights from our past data" hand-waving wishful thinking, rather than a daily/operational need to quickly find historical PDFs.

To me, even if it had been built, this should have been a separate feature/service, as I don't see how the application data would need to live in the same DB (while suffering the aforementioned tax on maintaining relational data in ES).

Worth noting, the software company that originally built this uses ES for their 'main' SaaS business which is indexing and providing search features against large-scale video datasets that have gone through captioning and OCR for object recognition... so I think that played as a factor when they reached for a familiar tool.

Scale

Regarding scale, I'm under the impression (and I feel most of the industry is too...) that Postgres can easily handle data sets 15x our size in a single node/instance, and beyond that as well, as far as DB goes. Our business is stable, and hopes to expand by a whopping 50% between now and 2030.

Worth mentioning: we only have ~100-~150 active users of the platform in a day, because it's an ERP (not a shiny dopamine app :wink: nor a public search product), so the I/O demands are really reasonable.

ES for search, PG for data.

As far as search, I've installed instrumentation to properly log the application's use, I'm looking forward to see how (+ how many) users are actually using the search feature, but again, I think feature/requirement is so much lower priority than having 'good data' that it will be a secondary consideration (but my gut is search is not really used beyond basic text matching on columns)

I have experience with streaming PG data to a secondary database to keep it in sync incrementally, along with management mechanisms (wipe and repopulate) and it worked really well for my past use cases, I'd consider that again if search was a priority.

Indexing/schema

This might feel counter productive but if you answer why ES feels painful now you might find even in migrating to Postgres you've solved organizational and architectural problems that would have come with the migration.

I'm not sure I understand this sentence, but I think I understand the meaning. Yes, I'm sure I could make a systemic effort to use ES correctly, improve the schema, bolster the application-side logic, but the fact is that this is not a tech company, this is a 35 year old non-profit (that happens to have a very successful business model serving their own members).

What I will say is that I have zero prior experience with Elasticsearch... Before this conversation, I've been hearing things from peers like "Is the data on Elasticsearch even meant to survive restarts?" and "WHAT DO YOU MEAN THEY USED ES AS THE PRIMARY APPLICATION STORE" so your calm and grounded explanations are deeply appreciated.

PS: thanks for the tip re: allow_partial_search_results:slight_smile:

Energizing the team

It's a bit of a particular situation. Without too many details, they attempted (and failed) to migrate their entire business, including the ERP/planning side, to a hosted SaaS CRM. That failed predictably, which is where I was brought in to figure out what to do next.

What this means, however, is that the previous 4-6 person development team has been stripped down to 1 outsourced dev and the Product Owner. It's up to me to define the 'new' team needed to head in the right direction, and I think everyone is going to be excited to treat foundational problems instead of adding an Nth "can you send an automatic email when this condition is met" feature request.

Double-write migration

Thanks for confirming, yes, the double-write sounds good to me, and thank you for the link to the search_after PIT API, that was on my radar as a best practice vs. the Scroll API.

Communications to the board + devs

This question was perhaps more facetious than I intended it to be. I think I am looking for a clear, foundational perspective as to why ES is the wrong choice for their business type (relational memberships, ERP operations) in terms of a metaphor, or other 'clear' messaging (IMHO it's inappropriate, for example, to have invoicing/subscription logic on a non-transactional DB... which is the case).

I agree with keeping things brief, and clear. I'll mention that ES was intended for a feature that is neither in production, nor necessary to be baked-in with the application logic.

Note:
Last night I performed a meta-analysis of all tickets/issues in our repo since 2019. 24% of all work tickets analyzed could be tied to data issues that I think would not have been the case with a traditional relational DB.

John, you do a credit to the culture of Elastic's team. Thank you again for taking the time to engage with me.

I often ask as a candidate or if involved in a new project/problem: “What’s your worst technical problem right now?” The more important follow-up is: “Why isn’t it solved already?”

You’ve clearly formed a strong view on what the core issue is. Nobody here is really in a position to definitively confirm or refute that. What’s less clear - and more important - is the answer to the follow-up. Do your colleagues, with more experience in that specific environment, broadly agree with your diagnosis?

The recent failed migration to a hosted SaaS CRM would to me imply otherwise.

I am intrigued by this question:

"insane" is your word. It's a very strong word. Is that view shared widely? If yes, you are pushing against an open door, stop worrying about how insane it was. if no, then there's clearly a "case for the defense" that we're not seeing.

Well, yes, though noting "relational" is not a binary concept. But personally I try not to use the word database in Elasticsearch context, as it sort of implies all sorts of "database" terminology/concepts may also apply, and much of it does not. But yes, in lots of use cases, elasticsearch IS used as the primary, and often sole, data store.

I also dont wish to be unkind, so brace yourself, but I would be careful to ask someone to assess an elasticsearch-based solution, that has ran for many years, who writes:

who gave some credibility to:

and maybe hasn't fully grasped the difference between a HTTP Response code and the content in that response. I'd also quibble with "routinely" in that context, and

Well, yes, if you leave bugs like this unfixed you can expect issues. Fix it and maybe the 24% of issues you attributed to "no relational database" would also be reduced.

All that said, almost everything you write supports your diagnosis. It seems "wrong". Relational databases are really cool, with a lot of important features if implemented correctly/wisely. As written, your use case would seem to be more appropriately implemented differently.

But it's a bit like the jury just listening to the prosecution case, vividly presented by a determined prosecutor who has heard the accused is a "wrong un".

Hello! @dadoonet shared this internally, and I thought I would share my experience here. I'm also biased, as I currently work on benchmarking at Elastic, including Rally, and have worked on elasticsearch-py and the Elasticsearch specification for the past two years.

Before Elastic, I worked for 5 years at a startup in the HR industry. The CTO decided we would have only one data store and one programming language, given the issues he had faced with an explosion of tools at a previous company. He chose Python, Django, and Elasticsearch 1.x in 2014. Contrary to popular belief at the time, we never lost a single document, and Elasticsearch as a technology never really blocked us.

There was one big issue, though. Elasticsearch 6 removed mapping types, but we had tens of indices, hundreds of doc types, and thousands of queries, so we knew upgrading would be a major project. Around the same time, our company was acqui-hired and the product was put in maintenance mode. Our limited engineering resources after the acquisition prevented us from upgrading to Elasticsearch 6 or later. It was just too much work. We also experimented with PostgreSQL at the time, but, again, with the lower resources, the migration was too costly and never happened. (Our product was discontinued a few years later.) Thankfully, Elastic, as a company, has matured significantly since then, and upgrades are easier than ever. And, well, the removal of types really needed to happen.

What worked well:

  • Rolling upgrades without downtime
  • Scaling up or down was easy
  • Elasticsearch Python DSL helped our frontend developers to create queries quickly. (We did not use Django's ORM)
  • We had a lot of denormalization too, but also used the join type for our main data type. It helped with consistency, but not sure I'd recommend it given the performance implications.
  • Performance was good. PostgreSQL with indices and transactions wasn't faster.
  • Whenever we needed to store data, we knew where to put it: Elasticsearch!
  • Aggregations and full-text search features worked great.

What didn't work as well:

  • Upgrading past Elasticsearch 5, as mentioned above.
  • Get all new engineers to learn the Elasticsearch Query DSL. Using the Python DSL helped, but also led to wrong queries from time to time! Backend engineers preferred the verbosity of the full Query DSL for this reason.
  • For larger batch jobs, our initial choice of two shards for an important index became a bottleneck, which we did not take the time to address. (Elasticsearch Serverless now implements autosharding for this very reason.)

If I had to do this again today, I probably would. But I'd use Elasticsearch Serverless and rely on ES|QL since it's easy to use and supports LOOKUP JOIN, which is a real join, even though the lookup index can only have one primary shard. Elasticsearch is also great for machine learning and vector search. But an important parameter here is the version you're on. If you're using an older version, it will be more difficult to use all the new features.

Hi @RainTown,

Thanks for the constructive friction :slight_smile:

Our biggest technical problems are not related to ES, but the biggest technical problems related to ES are:

  • People don't trust the data (they've been burned enough times that they open support tickets to perform exports of data because the UI isn't showing them right/complete data)
  • Enormous amounts of resources spent / Yak shaving to 'stay on top' of data issues.
  • Managing dev/staging/preprod environments is hard from a data perspective, i.e. copying data or streaming changes is not built-in at the DB level. They haven't invested in proper seeding scripts, nor ETL + anonymization, so it's kind of a mess. There's a variety of tools/extensions I've used for this in the past with PG in particular.

The reasons those haven't been solved already are:

  • No technical employees on staff that is qualified to make executive-level technical decisions, so the architecture is made by external contractors that are each trying to improve their part of the puzzle, but nothing unified
  • They've been talking about moving away from this platform for years (a lot of frustration is due to it being poorly hosted in a colo), so they haven't invested in core problems.

Every technical colleague I've discussed with is flabbergasted that ES is used as the primary datastore for this kind of project.

The migration to the hosted SaaS CRM failed because the systems integrator promised ERP features they couldn't deliver on... I'm not surprised given the architecture of the destination platform, but it's unrelated to the technical aspects of the current system.

As I acknowledged in my response to @john-wagster, the use of the word 'insane' is inappropriate/unhelpful, sorry for polluting this discussion with my (strong) frustrations. I am anticipating a board that wants 'quick results' when there's a foundational problem (as I perceive it) requiring a significant investment (migrating to PG), which is why I'm looking for validation (or differing opinions) on this forum.

Also, the quote about "Elasticsearch data doesn't persist restarts" was absolutely not something I'm giving credence to, I promise :wink: but it is indicative that Elasticsearch's reputation is not "bulletproof data store", but instead "great search tool".

I have no doubts about Elasticsearch's underlying technology, teams, nor product. I've always heard good things about it's ability to deliver on things like logging + search, search ranking problems, etc.

I assure you I understand the difference between the HTTP response code and the content... but I also assure you I'm working with an application built by people who do not. I checked in the codebase, the _shards metadata is not currently used to verify query completeness. My position is that when you're developing an application that requires heavy business domain knowledge, from a management standpoint, I don't want to make choices that add a cognitive burden to the developers, i.e. they shouldn't need to understand the subtleties of the underlying infra when performing basic document fetching. We're not needing to optimize for blistering performance, so I want developers to work in an environment where they can assume a successful response is what it seems to be.

If I may push back gently on something you said, relational databases are more than just 'really cool'... they're a foundational technology for almost every modern service that requires authoritative records. Travel, Finance, Logistics, Healthcare, Telco, Governments, Insurance, and more are all built on relational databases. I'm at the helm of this platform that wants to provide invoicing, event registration + attendance verification, event management, some algorithmic stuff (combinatorial problems), some basic search, and tons and tons of CRUD/CMS backend work.

Even the fact that updating the same document causes a reindex... and all we do is manage member contact info, event attendance/registration, descriptions of events, etc. This seems like such a massive architectural mis-match between our use-case and how ES works fundamentally!

I'm open to staying on ES, especially if someone with an abundance of experience in a related project (relational business + ERP workflows) can show us what practices are creating the current problems, and what non-hack/non-custom solutions exist to solve them.

Currently, it seems we're running up against ES's weaknesses, without benefitting from its strengths.

Thanks @Quentin_Pradet for sharing your experiences. I do know they successfully migrated from v5 -> v8 about 18 months ago.

I can confirm they're not using the Elasticsearch Python DSL. Production is running on 3 nodes.

@all: here's an example of a function in our codebase, get_document_by_id , the equivalent of SELECT FROM my_table where id = ${id}

I'm seeing several head-scratching things in this code, not the least of which is manually refreshing the index before each query?

This function is called in 46 places in the codebase.

The best practice is to refresh after indexing, not searching. (If fresh results are needed immediately and can't wait a few seconds/minutes.) That said, in this case, es.get(index=index, id=key) is enough. Getting a document by its ID is real-time and does not need a refresh.

edit: Reading the docstring, why do you have an alias that points to multiple indices?

And, yes, Elasticsearch is not a good match here, but you're not starting from scratch, so you have to judge the difficulty of improving the current situation vs. migrating to PostgreSQL.

Thanks for taking in spirit it was intended.

Oh boy, you picked a winning ticket. :slight_smile:

I answer just a few specific points:

Well, one might say there was a clue in the name !

I am not even sure what that means, but developers are expected to understand the tools/tech they use. If they dont, it's pretty irrelevant which specific tools/tech they are using, something not very good will likely emerge. I certainly believe in KISS, but the "don't add a cognitive burden" language reads a tiny bit impolite. (NB: I am not a developer)

That's what makes them really cool. e.,g. ACID guarantees are often a really big deal.

But there's also a lot of noSQL data stores around, it's been a growth market, so clearly Oracle/Postgres/MySQL/MSQL/... have challenges too, and are not a panacea. Others have touched on aspects of that.

@Quentin_Pradet answered on index refresh calls, there appears to be maybe a misunderstanding there, the need (or otherwise) for a index refresh has come up a few times while I've been active on this forum.

Good luck with the migration !

I think the business still wants this, but more in the "I'd love to be able to get insights from our past data" hand-waving wishful thinking, rather than a daily/operational need to quickly find historical PDFs.

To me, even if it had been built, this should have been a separate feature/service, as I don't see how the application data would need to live in the same DB (while suffering the aforementioned tax on maintaining relational data in ES).

This rings true and makes a lot of sense. I can't even remember the number of times I've been previously asked if we could process PDFs throughs some OCR pipeline and get them searchable only to have it abandoned when product folks realized how much it would actually cost to solve the problem.

It also sounds like ya'll are a small shop at least for now. I completely agree then that it makes sense to consider splitting off the relational data from the search use-cases.

If / when you get there our team has been doing some interesting stuff with multi-modal models from a recent company Elastic acquired call Jina. Think easily embed and search all your PDFs without thinking too hard about OCR. We also have one of the Tika folks onboard now; super cool guy. Something tells me PDF processing is one of those things we'll be really good at in the near future.

Regarding scale, I'm under the impression (and I feel most of the industry is too...) that Postgres can easily handle data sets 15x our size in a single node/instance, and beyond that as well, as far as DB goes. Our business is stable, and hopes to expand by a whopping 50% between now and 2030.

Worth mentioning: we only have ~100-~150 active users of the platform in a day

I agree Postgres can scale really well on well formed structured data. It's the text search that gets you in my experience. I'd say give it a test run at a reasonable scale; I'll bet you'll be ok but probably a good idea to make sure. It's been a bit since I last tested but in the low millions of rows with small 2-3 sentence passages text search would will really tax Postgres even with only a few users querying at the same time.

I'm looking forward to see how (+ how many) users are actually using the search feature

I like this mentality; use data to drive your decisions. I've used Matomo a good bit on projects in the past. The key being self-hosted, free, and quick to setup which is always nice when you have a small shop.

I agree with keeping things brief, and clear. I'll mention that ES was intended for a feature that is neither in production, nor necessary to be baked-in with the application logic.

Yep I agree this is the right approach.

Great discussion. One thing I'd add — if you're restructuring your ES indices
as part of this migration (new mappings, splitting indices, etc.), managing
those schema changes across environments can get messy fast.

For the ES side of the migration, version-controlling your mapping changes
helps. The standard approach is:

  1. Define each schema change as a versioned file
  2. Apply them in order (dev → staging → prod)
  3. Use the alias swap pattern for zero-downtime cutover

You can do this manually with curl scripts, or there's an open source tool
called ScaledSearch that automates it (versioned YAML migrations, dry-run,
rollback, alias swaps):

Full disclosure: I'm the author. Happy to answer questions about the
migration patterns regardless of whether you use the tool.

Wow,

Very interesting thread. I have seen huge RDBMS database converted to mongodb(same as elastic) with millions of transaction per day with TB of data. took them year to do it with about 7-8 people team.

I started using elastic from v6 now on v8. now has billions of data points with 100s of indexes in multiple clusters.

in your case you have to get everything sorted like how many codes are writing to elastic and reading from it and convert that toward postgresql.

I have seen other way around case as well. One of my client was using postgresql ( mainly using world map) geo location. but it was getting slower and slower due to size. Ported everything to Elastic. Just like someone said in thread started double write (PostgreSQL + Elastic) once all ported and everything worked. they decommission PostgreSQL because Elastic was giving them way way better performance.

I can't let mongoDB is "same as elastic" pass by without comment, sorry.

To me, mongodb is a LOT closer to a traditional RDBMS than elastic has ever been, hence such a migration is more realistic/doable, albeit likely still difficult.

The way I think of it, mongoDB is trying to be the best scalable, near general-purpose database it can be, trading some traditional RDBMS strictness for horizontal scalability and operational flexibility. It was effectively the "source of truth" at my most recent big project, and had plenty of nuances/challenges.

Elasticsearch has never, AFAIK, tried to present itself as a general purpose database. That said, it has grown many arms and legs, sorry capabilities, from its' "You Know, for Search" roots.

The underlying technical details are also very different, which any search engine/AI tool could list.

es.get or es.mget api will give you 10 time faster search using id. we ran quite a few test. converted lot of search query which was using regular search with either get or mget. also we had some sql in play. converted that to es|ql or rest with search_after and PIT. We had issue on cluster where scroll was hitting it's limit.