Can you specify a default source for a meta engine?

Rob6 · June 23, 2022, 4:04pm

Hey,

I’m looking at how I can ‘swap out’ a source engine (let’s call it ‘A’) used in a meta engine and replace it with another one (B). I don’t want both A and B to be queried at the same time, as B will essentially be A with a number of document updates, and I don’t want the search query to have to specify which source should be used (as it won’t know). I also don’t want to remove A until B is in place (otherwise searches will at some point have no source at all whilst A is removed and B is added to the meta engine).

So - the only way I can see this working is if I can specify a default filter on a meta engine, which would be used to specify the source when a search is made. I can then add source B to the meta-engine, change the default filter to use source B and then remove source A.

With this in mind, can you tell me if a source can be set as a default search filter for a meta engine? I can’t see this mentioned in the docs.

If not, can I do this another way? I’m basically looking for a way to atomically switch searches over to another engine once all of its updates have been applied. It feels like meta engines get me nearly there if I can set a default source?

Thanks in advance!

Sean_Story · June 24, 2022, 3:45pm

Hey @Rob6

This is a good question. Can you help me understand your use case a little more - are you attempting to do A/B testing, where you want some users to query only A engine, and other users to only query B engine, but at the same time? Or are you just trying to facilitate a 100% switch from A to B for all users, once B is ready?

If the latter, I think that you can use Meta Engines, where your Meta Engine starts with ONLY A as a source engine, and then, when you're ready, you could use the Meta Engine API to switch the backing source engine to B. This allows your users to treat the Meta Engine name as an alias for whatever backing engine you like.

If the former, however (A/B Testing), there's not an easy way to apply a default filter to your engine from a framework level. You'd need to do that on the client side. You could achieve this by either filtering on the _meta.engine (see Meta Engines Guide | Elastic App Search Documentation [8.2] | Elastic), or by indexing your own field into all your documents which identifies which engine/source they come from.

Does this help?

Rob6 · June 24, 2022, 6:54pm

Hi Sean,

Thanks for the quick response. It certainly helps knowing I'm on the right track. The use case I'm trying to solve is the 100% switch - going from A to B when B is ready.

From the docs, it appears there are APIs available so I can:

1. Create a meta-engine with the specified sources
2. Add a source to an existing meta engine
3. Remove a source from an existing meta engine

but it doesn't appear that I can update a meta engine to change the source from A to B in a single API call?

Looking at the API, I would have to add B in one POST call and remove A in a DELETE call. Is this correct? If so, this would leave a slight chance that both A and B would be present as sources before A is removed in the DELETE?

Likewise, if I do it the other way around (remove A in one DELETE call and then add B in another POST call), there's a chance that no results are returned whilst there is no source engine (or it may be that having no source on a meta engine is not allowed so this isn't viable anyhow).
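To make the two orderings concrete, here is a minimal in-memory sketch in Python. It does not call App Search at all; `MetaEngine`, `add_source`, `remove_source`, and `swap` are hypothetical stand-ins for the separate add (POST) and remove (DELETE) calls, and the only point is to show which source lists a search arriving between the two calls could see.

```python
from dataclasses import dataclass, field


@dataclass
class MetaEngine:
    """Hypothetical in-memory stand-in for a meta engine's source list."""
    sources: list = field(default_factory=list)

    def add_source(self, name):
        # Models the separate "add a source" call.
        if name not in self.sources:
            self.sources.append(name)

    def remove_source(self, name):
        # Models the separate "remove a source" call.
        if name in self.sources:
            self.sources.remove(name)


def swap(engine, old, new, add_first):
    """Swap `old` for `new` in two calls; return every state a search could observe."""
    observed = [list(engine.sources)]
    steps = [(engine.add_source, new), (engine.remove_source, old)]
    if not add_first:
        steps.reverse()
    for call, name in steps:
        call(name)
        observed.append(list(engine.sources))
    return observed


add_then_remove = swap(MetaEngine(["A"]), "A", "B", add_first=True)
remove_then_add = swap(MetaEngine(["A"]), "A", "B", add_first=False)
print(add_then_remove)  # [['A'], ['A', 'B'], ['B']]
print(remove_then_add)  # [['A'], [], ['B']]
```

Either way there is an intermediate state that a search could hit: both sources at once in the first ordering, or no source at all in the second.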

Thanks again!

Sean_Story · June 24, 2022, 8:03pm

Hey @Rob6,

You're right, the sequencing could lead to momentary undesired behavior. However, chaining the two requests back-to-back (one to remove A, the other to add B) should leave the meta engine empty for less than a second. Hopefully that's acceptable, as a user will likely just refresh the page or re-issue the query if they momentarily get 0 results from their search.

Rob6 · June 29, 2022, 6:49pm

Hi Sean,

Thanks again for the info. Sadly, I think incorrectly returning 0 results from a search because documents are being updated isn't any good for my use case. I'm sure this would be raised as an intermittent bug at some stage, and I don't think I'd get away with saying it's expected behaviour.

Just out of interest, is there a reason for the different API endpoints for adding/removing sources from a meta engine? I was kind of expecting a PUT/PATCH on the engine API itself to update the sources, and I'm just interested in why the current approach was taken.

Thanks again

Sean_Story · June 29, 2022, 7:33pm

Hi @Rob6

The API was designed to make it very simple to add OR remove engines, but I do not think that we anticipated a need to immediately swap engines. If we had implemented a PUT API, it would require you to know the exact state of your meta engine as you made your call, or risk overwriting a state change that another client had just made. For example, if you and I both have access to the same meta engine, with engines A, B, and C, and you want to remove C at the same time that I want to add D, what will the end state be? We probably want an end result of A, B, D. But if you issue a:

PUT /engines
{
  "type": "meta",
  "source_engines": [
    "A", "B"
  ]
}

and I issue a

PUT /engines
{
  "type": "meta",
  "source_engines": [
    "A", "B", "C", "D"
  ]
}

We'd end up with either ['A', 'B'] or ['A', 'B', 'C', 'D'] depending on whose request is served last.
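The lost update can be reproduced with a small in-memory sketch in Python. This is only an illustration under assumptions: `put_sources` and `final_state` are hypothetical names modelling a full-replace PUT (which is not part of the current API), and both clients are assumed to have read the same starting state before either one writes.

```python
def put_sources(engine_state, new_sources):
    """Hypothetical full-replace PUT: overwrite the whole source list."""
    engine_state["source_engines"] = list(new_sources)


def final_state(request_order):
    """Apply both clients' PUTs in the given order and return the result."""
    engine_state = {"type": "meta", "source_engines": ["A", "B", "C"]}
    # Both clients build their request body from the same snapshot.
    snapshot = list(engine_state["source_engines"])
    requests = {
        "remove_C": [s for s in snapshot if s != "C"],  # ['A', 'B']
        "add_D": snapshot + ["D"],                      # ['A', 'B', 'C', 'D']
    }
    for name in request_order:
        put_sources(engine_state, requests[name])
    return engine_state["source_engines"]


print(final_state(["remove_C", "add_D"]))  # ['A', 'B', 'C', 'D']
print(final_state(["add_D", "remove_C"]))  # ['A', 'B']
```

Neither ordering produces the intended ['A', 'B', 'D'], because the later PUT silently discards the earlier client's change.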

This doesn't fix your issue, but hopefully it explains the API design as we have it. If you have a support relationship with us, I'll recommend that you file an Enhancement Request to add a PUT/PATCH API for this endpoint, and we can work to prioritize that feature.

Rob6 · July 10, 2022, 10:42am

Hi Sean,

Thanks for the reply. I think I understand the reasoning, but I also think having separate add/remove calls suffers from a similar issue, depending on what each client expects the end result of its own invocation to be. For example, if the current engine has sources [A,B], and one client adds source [C] based on those sources (expecting [A,B,C]) while the other adds [D] based on those sources (expecting [A,B,D]), the sources will end up as [A,B,C,D], and neither client gets what it actually expects.

I think this is just a race condition that most APIs have to handle if they are to expect multiple users to call from the same customer account. Systems handle this in a number of different ways, for example, servers could:

1. return a hash token with the GET which represents the object state at the time of the request
2. clients then pass this token when doing a PUT
3. server returns an error if the hash does not correlate with the current state of the object being updated

In the above solution, the client then knows to re-GET to see what has changed, and can then update and retry their PUT request based on the result.
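As a rough sketch of that token scheme, here is an in-memory version in Python. Everything in it is hypothetical (`EngineStore`, `StaleTokenError`, `update_with_retry`); it is not an App Search API, just one way a server could hash the state on GET and reject a PUT whose token is stale.

```python
import hashlib
import json


class StaleTokenError(Exception):
    """Raised when the caller's token no longer matches the stored state."""


class EngineStore:
    """Hypothetical server-side store for one meta engine's source list."""

    def __init__(self, sources):
        self._sources = sorted(sources)

    def _token(self):
        # Hash of the current state; it changes whenever the sources change.
        payload = json.dumps(self._sources).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self):
        return list(self._sources), self._token()

    def put(self, sources, token):
        if token != self._token():
            raise StaleTokenError("state changed since it was read")
        self._sources = sorted(sources)
        return self.get()


def update_with_retry(store, change, max_attempts=3):
    """Re-GET and re-apply `change` until the PUT is accepted."""
    for _ in range(max_attempts):
        sources, token = store.get()
        try:
            return store.put(change(sources), token)[0]
        except StaleTokenError:
            continue
    raise StaleTokenError("gave up after repeated conflicts")


store = EngineStore(["A", "B", "C"])
sources_1, token_1 = store.get()  # client 1 reads
sources_2, token_2 = store.get()  # client 2 reads the same state
store.put([s for s in sources_1 if s != "C"], token_1)  # client 1 removes C
try:
    store.put(sources_2 + ["D"], token_2)  # client 2's token is now stale
    conflict = False
except StaleTokenError:
    conflict = True
final = update_with_retry(store, lambda s: s + ["D"])
print(conflict, final)  # True ['A', 'B', 'D']
```

The second client's stale PUT is rejected rather than overwriting the removal of C, and the retry against the fresh state gives the combined result.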

Also, sometimes the API just expects the last request to win. Even if the PUT request returned the updated state once changed, the client could check to ensure the resulting state is what they expected and act accordingly. And even if the PUT didn't return the state, the client could re-issue a GET to check.

I do have support with you, so I have raised a request. In the meantime I think I'll have to put a proxy in between the client and App Search which will add a filter specifying the source that should be used. I can't think of anything else that would work for now.
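For what it's worth, the filter-injecting part of such a proxy could look roughly like the Python sketch below. It assumes every document carries a custom field naming its source (`source_engine` is a made-up field name, as are `ACTIVE_SOURCE` and `with_source_filter`), and it assumes the search body uses the App Search `filters` object with an `all` array to combine conditions; both should be checked against the actual setup.

```python
import copy

ACTIVE_SOURCE = "B"  # hypothetical proxy setting, flipped once B is fully updated


def with_source_filter(search_body, source=ACTIVE_SOURCE, field="source_engine"):
    """Return a copy of the search body restricted to a single source.

    Assumes each document has a custom `source_engine` field (hypothetical name).
    """
    body = copy.deepcopy(search_body)
    source_filter = {field: source}
    existing = body.get("filters")
    if existing:
        # Keep the client's own filters and require the source as well.
        body["filters"] = {"all": [existing, source_filter]}
    else:
        body["filters"] = source_filter
    return body


plain = with_source_filter({"query": "parks"})
combined = with_source_filter(
    {"query": "parks", "filters": {"states": "California"}}
)
print(plain)
print(combined)
```

Because the active source is a single value held by the proxy, changing it switches every subsequent search from A to B in one step, without the client needing to know which source is live.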

Thanks again,
Rob
