I’m designing a search system where multiple SQL tables are synced into a single Elasticsearch index. The issue I’m running into is that the tables are not perfectly aligned in terms of contextual meaning.
For example:
In a change request table, the supporter employee code is not the person responsible for handling the request.
In a support ticket, the supporter field is the person responsible for answering the ticket.
So while some field names overlap, their actual meaning differs depending on the source table.
My current question is:
Should I keep these fields source specific, e.g. ticket.supporter and change_request.supporter?
Or should I try to normalize them into generic fields?
How should this be handled if I want to support faceted search/filtering across many different entity types?
I’m also concerned about mapping explosion, since this application will eventually index many different tables with partially overlapping schemas.
Has anyone dealt with a similar Elasticsearch modeling problem, and what approach worked best?
Why a single Elasticsearch index? Why have you introdiced this limitation?
When I have seen this discussed in the past it has sometimes been due to users wanting to use parent-child relationships to try to manage relationships, which is generally a very bad idea.
In my experience it is generally best to not replicate table by table but rather transform the data and store documents that aligns with how you want to search the data without reling on parent-child relationships. This is naturally more complex and time consuming but tend to give much better results.
I'm creating an enterprise search application where I intend to generalize various tables into a more generic search item. With fields such as title, description, created/mutated dates etc.
But I'm now running into an issue, while attempting to create a facetting system, where some filter options are context-dependent between schemas. I was led to believe using a single index would be the best route, as this would have better performance and reduce querying complexity.
I'm not using this to facilitate any sort of relations.
I am transforming the data, but in company context a status for one of the tables is not the same as the status of another. Which is why I think seperating these is better. Regardless of index count the UI would show these filters seperately.
Additionally the elastic index is only used for searching the relational database still serves as the source of truth for any real operations.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.