Merging search results across multiple related indexes

I'm working on a project where we have a core document (in one index), and each document can have multiple related files that we want to be searchable (in a couple of other indexes, for the different file types).

We want to implement a search that spans all of these indexes, but ideally we would receive results only from the core document index, with the core document scores also reflecting matches on the associated file content. We chose to put the files in separate indexes because the number of related files per document varies, and because that makes adding and updating them simpler.

Any suggestions?

Hi @someguy, can you clarify: are you using Elastic App Search (in which case you're talking about documents in multiple engines), or Elasticsearch, where you're writing documents directly into indexes?

If you're using App Search, have you already looked at Meta Engines? Alternatively (in either App Search or Elasticsearch), have you considered using one large document that represents your core document and also has fields for the supporting file contents, so that you can search over all the related content in a single index/engine?

My bad - we're using Elasticsearch, not App Search. May need to move this to a different product/category?

We originally were going with the single large document, and search-wise that absolutely makes the most sense. However, because of the large number of documents and supporting files, we decided to separate them to avoid the load of re-indexing the whole document each time one piece of data changes.

I've read a little about parent/child relationships and nested documents, but neither seemed likely to help much in this particular situation. Our plan was therefore to search across all the indexes and then do some post-search processing to merge the file results into their parent documents. We can do that, but it's not optimal!
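A minimal sketch of that post-search merge, assuming the hits come from one search across all the indexes and that each file document stores its parent's `_id` in a field called `parent_id` (a hypothetical name). Combining scores as "parent score plus best file score" is an arbitrary choice; scores from different indexes are computed with different term statistics, so they are not strictly comparable.

```python
from collections import defaultdict


def merge_hits(hits, core_index, parent_field="parent_id"):
    """Fold file hits from a multi-index search into their parent documents.

    Each hit is a dict shaped like an Elasticsearch search hit, with
    "_index", "_id", "_score" and "_source" keys. File documents are assumed
    (hypothetically) to store their parent's _id in `parent_field`.
    Returns (parent_id, combined_score, parent_hit_or_None) tuples, best first.
    """
    parent_hits = {}
    best_file_score = defaultdict(float)

    for hit in hits:
        if hit["_index"] == core_index:
            parent_hits[hit["_id"]] = hit
        else:
            pid = hit["_source"][parent_field]
            # Keep only the strongest matching file per parent.
            best_file_score[pid] = max(best_file_score[pid], hit["_score"])

    merged = []
    for pid in set(parent_hits) | set(best_file_score):
        parent = parent_hits.get(pid)
        parent_score = parent["_score"] if parent else 0.0
        # Arbitrary combination: parent score plus its best file score.
        merged.append((pid, parent_score + best_file_score.get(pid, 0.0), parent))

    # A None parent hit means only a file matched; the parent document would
    # still need to be fetched (e.g. with a multi-get) before display.
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged


# Usage with hypothetical index names and hand-written hits.
hits = [
    {"_index": "documents", "_id": "d1", "_score": 2.0, "_source": {"title": "Q3 report"}},
    {"_index": "pdf-files", "_id": "f1", "_score": 1.5, "_source": {"parent_id": "d1"}},
    {"_index": "pdf-files", "_id": "f2", "_score": 3.0, "_source": {"parent_id": "d2"}},
]
for pid, score, parent in merge_hits(hits, core_index="documents"):
    print(pid, score, parent is not None)
# prints:
# d1 3.5 True
# d2 3.0 False
```

The extra round trip for parents that only matched through a file is one of the reasons this approach is less than optimal.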

Happens all the time, no worries. I've re-categorized it.

Did you look at upserts at all? You can provide just the fields that are changing, and not have to re-index all the related content.
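For reference, a partial update in Elasticsearch goes through the Update API (`POST /<index>/_update/<id>`) with only the changed fields under `doc`; setting `doc_as_upsert` creates the document from those fields if it doesn't exist yet. The sketch below only builds that request, and the index name and fields are hypothetical. Note that Elasticsearch still retrieves the stored source, merges the change, and reindexes the whole document internally, so this avoids re-sending the file content but not the cost of indexing it again.

```python
import json


def partial_update_request(index, doc_id, changed_fields):
    """Build an Update API request that sends only the changed fields.

    Elasticsearch merges `doc` into the stored _source; with doc_as_upsert
    the same fields are used to create the document if it is missing.
    """
    path = f"/{index}/_update/{doc_id}"
    body = {"doc": changed_fields, "doc_as_upsert": True}
    return "POST", path, json.dumps(body)


method, path, body = partial_update_request("documents", "d1", {"status": "final"})
print(method, path, body)
# prints: POST /documents/_update/d1 {"doc": {"status": "final"}, "doc_as_upsert": true}
```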

We have looked at upserts, but (maybe I'm misunderstanding) the documentation seemed to say that you can't use an ingest pipeline for file indexing with them.

As described, we have a parent document with some data that changes fairly regularly. Each parent document has one or more files that we want indexed. The combination of the parent and each related child file document constitutes a single search item.

We have to be able to search all of these as a single unit, but we want to be able to index the files separately, and do not want to have to re-index the files when the parent document is modified.

Given those requirements, what would the optimal indexing strategy be? Would nested documents work, or is there a better approach?

Thanks for your assistance!

Well, we've reached the end of my wisdom in Elasticsearch (I'm an Enterprise Search team member), so I'm going to loop in @dadoonet (who in turn may loop in someone else on the Elasticsearch team) to get you some nuanced guidance. Good luck!

Sean - thank you for your assistance. It looks like we are probably going to move to a single index, and look for ways to optimize the indexing of the files into the single document. That will give us the search experience we need. It just changes the focus of our efforts from search to ingest/index time.
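One possible way to make the single-document approach cheaper at ingest time is to cache extracted file text, so that a change to the parent's metadata does not force the files to be re-extracted. This is a sketch under assumptions: `extract_text` is a hypothetical stand-in for whatever extraction step is actually used, the cache is a plain dict keyed by SHA-256 digest, and the `files` field layout is illustrative. Elasticsearch will still index the full combined document each time it is sent.

```python
import hashlib


def build_combined_document(parent_fields, files, extract_text, text_cache):
    """Build one document holding the parent's fields plus the extracted
    text of every related file.

    files: list of (file_name, raw_bytes) pairs.
    extract_text: hypothetical callable turning raw bytes into plain text.
    text_cache: dict mapping a SHA-256 digest to previously extracted text,
        so unchanged files are not re-extracted when only the parent changes.
    """
    doc = dict(parent_fields)
    doc["files"] = []
    for name, raw in files:
        digest = hashlib.sha256(raw).hexdigest()
        if digest not in text_cache:
            text_cache[digest] = extract_text(raw)
        doc["files"].append({"name": name, "sha256": digest, "content": text_cache[digest]})
    return doc


# Usage with a fake extractor that records how often it is called.
calls = []


def fake_extract(raw):
    calls.append(raw)
    return raw.decode("utf-8")


cache = {}
files = [("notes.txt", b"quarterly revenue figures")]
build_combined_document({"title": "Q3 report", "status": "draft"}, files, fake_extract, cache)
# The parent metadata changes; the file text now comes from the cache.
doc = build_combined_document({"title": "Q3 report", "status": "final"}, files, fake_extract, cache)
print(len(calls), doc["status"], doc["files"][0]["content"])
# prints: 1 final quarterly revenue figures
```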

Thanks @Sean_Story for the ping.

@someguy I think this could be a good use case for the parent/child feature. See the join datatype.

In that case, you could index and update the parent document at any time without needing to reindex the child documents.
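A minimal sketch of the request bodies involved, with a hypothetical index (`documents`) and hypothetical field names (`doc_relation`, `title`, `content`). The join field declares a relation from `document` to `file`; each child must be indexed with `routing` set to its parent's `_id`; and a `has_child` query with `score_mode` set to something other than the default `none` lets the best-matching file's score count toward the parent's score while returning parent documents.

```python
import json

# Hypothetical names: one "documents" index holding both parents and files,
# with "doc_relation" as the join field.
MAPPING = {
    "mappings": {
        "properties": {
            "doc_relation": {"type": "join", "relations": {"document": "file"}},
            "title": {"type": "text"},    # parent-only field
            "content": {"type": "text"},  # extracted file text, child-only
        }
    }
}


def parent_doc(title):
    """Body for a parent (core) document; it only names its side of the relation."""
    return {"title": title, "doc_relation": "document"}


def child_doc(parent_id, content):
    """Return (body, routing) for a file document.

    The routing value must be the parent's _id so the child is stored on the
    same shard as its parent, which the join field requires.
    """
    body = {"content": content, "doc_relation": {"name": "file", "parent": parent_id}}
    return body, parent_id


def search_body(text):
    """Query matching parents by title or by the content of their files.

    Only parents carry "title", and has_child returns parents, so hits are
    parent documents. score_mode "max" lets the best-matching file's score
    count toward the parent's score; the default ("none") ignores it.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": text}},
                    {
                        "has_child": {
                            "type": "file",
                            "query": {"match": {"content": text}},
                            "score_mode": "max",
                        }
                    },
                ]
            }
        }
    }


# Usage: print the requests one might send with any HTTP client.
print("PUT /documents", json.dumps(MAPPING))
print("PUT /documents/_doc/d1", json.dumps(parent_doc("Q3 report")))
body, routing = child_doc("d1", "quarterly revenue figures")
print(f"PUT /documents/_doc/f1?routing={routing}", json.dumps(body))
print("GET /documents/_search", json.dumps(search_body("revenue")))
```

Updating the parent later touches only `d1`; `f1` is reindexed only when its own file changes.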

Would that work for you?

I agree that it is sometimes better to reindex everything, if that only happens occasionally, than to use parent/child, which comes with some complexity and hidden costs (such as memory usage, the requirement that a parent and its children live on the same shard, and so on).
