2 way search by query and incoming docs

I am working on an app where the goal is to find like-minded people in your area, nothing revolutionary.

Given a user's age, minAge, and maxAge fields, we want to query for other docs that meet the criteria. However, the search criteria need to be met on both sides. For example, the user's age should also fall within the minAge and maxAge range of the "incoming" docs.

What is the most efficient way of doing this? I currently have a solution using function_score, but I'm not sure if it's the most efficient:

POST people/_search
{
  "size": 10,
  "min_score": 0.6,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "age": {
                  "gte": 18,
                  "lte": 30
                }
              }
            }
          ]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "if (params.age >= doc['minAge'].value && params.age <= doc['maxAge'].value) { return 1.0; } else { return 0.0; }",
              "params": {
                "age": 28
              }
            }
          }
        }
      ]
    }
  }
}

I am using age here as an example, but ideally we want to have the same logic on distance, etc.

Would appreciate any help!

@Kathleen_DeRusso @dadoonet sorry for pinging directly, but I was wondering if you'd be able to offer any guidance here? Thank you so much!

From Elastic Search to Elasticsearch

Please be patient in waiting for responses to your question and refrain from pinging multiple times asking for a response or opening multiple topics for the same question. This is a community forum, so it may take time for someone to reply to your question. For more information please refer to the Community Code of Conduct, specifically the section "Be patient". Also, please refrain from pinging folks directly. This is a forum, and anyone who participates might be able to assist you.

If you are in need of a service with an SLA that covers response times for questions then you may want to consider talking to us about a subscription.

It's fine to answer on your own thread after 2 or 3 days (not including weekends) if you don't have an answer.

Why not just a bool query with 2 must clauses?

Agreed, a Boolean query seems like it should be sufficient for your filtering purposes here?

Hi David and Kathleen, apologies again for breaking the guidelines. I will keep that in mind for the future.

Regarding the Boolean queries, are you suggesting something like this:

"must": {
    "script": {
      "script": {
          "source": """
            double distance = doc['location'].arcDistance(params.lat, params.lon) / 1000;
          
            boolean isWithinDistance = distance <= doc['searchRadius'].value;
            boolean isWithinAge = params.age >= doc['minAge'].value && params.age <= doc['maxAge'].value;
            
            return isWithinDistance && isWithinAge;
          """,
          "params": {
            "lat":51.545874,
            "lon": 0.048306,
            "age": 28
          }
      }
    }
}

The above works, but my concern is efficiency. Would you say this is the most efficient way of filtering based on the incoming parameters? Is there any other feature that I am not aware of or should look into? Thank you so much for your help!

Maybe I'm missing something, but I thought @dadoonet's original suggestion was to replace the script with a regular Boolean query with filters and range queries.

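Imagine something like this, using the minAge and maxAge field names from your script:

POST people/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "age": { "gte": 18, "lte": 30 } } },
        { "range": { "minAge": { "lte": 28 } } },
        { "range": { "maxAge": { "gte": 28 } } }
      ]
    }
  }
}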

Obviously the real query would have to be adjusted based on your data, but maybe you can structure it in a way so you can avoid that script score.

Ah, got it. I think that works for the age, but are we able to do the same for the distance and searchRadius without the script?

I'll confess that I don't have a lot of expert knowledge in our geospatial search. Perhaps you could index a geo circle with the allowed radius? You'd have to play with the search to see if it worked for your needs.

Have a look at Geo-distance query | Elasticsearch Guide [8.14] | Elastic

I did look at the geo-distance query, and it's what I used to find the docs that are within the search radius of the user that's running the query. But I couldn't find a way to do the inverse of it where we make sure that the distance from the doc is within the search radius of the other user...does that make sense? thank you!

I have no idea what "the other user" means in this context.

Anyway, if you want to move forward, please provide a full recreation script as described in About the Elasticsearch category. It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste into the Kibana Dev Console and run to reproduce your use case. It will help readers understand, reproduce, and if needed fix your problem. It will also most likely help you get a faster answer.

Have a look at the Elastic Stack and Solutions Help · Forums and Slack | Elastic page. It also contains a lot of useful information on how to ask for help.

I think the script I shared earlier (2 way search by query and incoming docs - #7 by elsaticnoob) demonstrates what I want to do in simple terms.

Basically, I couldn't find a way to do a geo distance query where the distance value comes from the incoming docs, so I can't do something like:

"geo_distance": {
          "distance": doc["searchRadius"], // is this possible?
          "pin.location": {
            "lat": 40,
            "lon": -70
          }
}

does that make sense?

I see. Indeed, this does not seem possible with the geo distance query.
The only way seems to be using a script for this query.

Yeah, but I don't see that you have any other choice...

But I'd try to make that script simpler and only compute the geo part.

I'd add this script as one of the bool -> filter clauses and I'd add the age filter as a range query.

That way, you won't have to evaluate the geo part when age is not matching and I think that could be much faster.
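The layout suggested above can be sketched as a small Python helper that assembles the request body. Treat it as a sketch under assumptions rather than a definitive query: the field names (age, minAge, maxAge, location, searchRadius) are taken from the script shared earlier in the thread, searchRadius is assumed to be stored in kilometres, and build_two_way_query with its arguments is a hypothetical name.

```python
import json


def build_two_way_query(age, min_age, max_age, lat, lon, radius_km):
    """Build a search body: index-backed filters plus a geo-only script filter."""
    # Painless expression; only the per-document radius check needs a script.
    geo_script = (
        "doc['location'].arcDistance(params.lat, params.lon) / 1000"
        " <= doc['searchRadius'].value"
    )
    return {
        "query": {
            "bool": {
                "filter": [
                    # Searcher's side: candidate's age inside the searcher's range.
                    {"range": {"age": {"gte": min_age, "lte": max_age}}},
                    # Candidate's side: searcher's age inside the candidate's range.
                    {"range": {"minAge": {"lte": age}}},
                    {"range": {"maxAge": {"gte": age}}},
                    # Searcher's side: candidate inside the searcher's radius.
                    {
                        "geo_distance": {
                            "distance": f"{radius_km}km",
                            "location": {"lat": lat, "lon": lon},
                        }
                    },
                    # Candidate's side: searcher inside the candidate's radius.
                    {
                        "script": {
                            "script": {
                                "source": geo_script,
                                "params": {"lat": lat, "lon": lon},
                            }
                        }
                    },
                ]
            }
        }
    }


body = build_two_way_query(
    age=28, min_age=18, max_age=30, lat=51.545874, lon=0.048306, radius_km=10
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted under POST people/_search in the Dev Console. Only the script clause still runs per document. The range and geo_distance clauses are answered from the index, which is the reason for keeping the script as small as possible.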

Agreed, I can do that. I also had an alternative idea like this:

{
            "terms_set": {
              "locationGrids.enum": {
                "terms": ["86154e68fffffff"],
                "minimum_should_match": 1
               }
            }
},

Here we have a pre-computed locationGrids field that holds the geo-hex values for a user's location plus search radius, that is, every hexagon that falls within the search radius for that user.

So when we search, we just do a terms_set match on that field, and this also seems to work pretty well.
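The pre-computation described above can be illustrated with a short Python sketch. To stay dependency-free it swaps the hexagonal cells for a plain square latitude/longitude grid, so it shows the idea rather than the actual indexing code. CELL_DEG, cell_id, covering_cells and searcher_is_covered are hypothetical names, the search term is assumed to be the searcher's own cell, and the sketch ignores the poles and the antimeridian.

```python
import math

CELL_DEG = 0.01  # hypothetical cell size in degrees (stand-in for a hex resolution)
EARTH_RADIUS_KM = 6371.0


def cell_id(lat, lon):
    """Id of the square grid cell that contains the point."""
    return f"{math.floor(lat / CELL_DEG)}:{math.floor(lon / CELL_DEG)}"


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def covering_cells(lat, lon, radius_km):
    """Home cell plus every cell whose centre lies within radius_km of the point."""
    lat_span = radius_km / 111.0  # roughly 111 km per degree of latitude
    lon_span = radius_km / (111.0 * max(math.cos(math.radians(lat)), 1e-6))
    cells = {cell_id(lat, lon)}
    row_lo = math.floor((lat - lat_span) / CELL_DEG)
    row_hi = math.floor((lat + lat_span) / CELL_DEG)
    col_lo = math.floor((lon - lon_span) / CELL_DEG)
    col_hi = math.floor((lon + lon_span) / CELL_DEG)
    for row in range(row_lo, row_hi + 1):
        for col in range(col_lo, col_hi + 1):
            centre_lat = (row + 0.5) * CELL_DEG
            centre_lon = (col + 0.5) * CELL_DEG
            if haversine_km(lat, lon, centre_lat, centre_lon) <= radius_km:
                cells.add(f"{row}:{col}")
    return sorted(cells)


def searcher_is_covered(doc, lat, lon):
    """In-memory equivalent of matching the searcher's cell against locationGrids."""
    return cell_id(lat, lon) in doc["locationGrids"]


# Index time: store the cells covered by the candidate's location and radius.
candidate = {"locationGrids": covering_cells(51.545874, 0.048306, 5.0)}

# Search time: one exact-term lookup replaces the per-document distance script.
print(searcher_is_covered(candidate, 51.5533, 0.0522))  # nearby searcher
print(searcher_is_covered(candidate, 52.0033, 1.0022))  # distant searcher
```

The trade-off is that the radius check becomes approximate at cell granularity, and locationGrids has to be recomputed whenever a user's location or search radius changes.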

thank you for your help!

Thank you for sharing that.
