Query results not understood

Hi everybody!

I have two indexes, both with the following mapping:

```json
"skills_metiers": {
    "type": "nested",
    "properties": {
        "metier": {
            "type": "keyword"
        },
        "experience": {
            "type": "keyword"
        }
    }
}
```
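For reference, a minimal sketch of a complete create-index body around this mapping. It assumes Elasticsearch 7+ typeless mappings and the one-shard, zero-replica settings mentioned later in the thread; `make_index_body` is a hypothetical helper name.

```python
def make_index_body(shards: int = 1, replicas: int = 0) -> dict:
    """Build a create-index body containing the nested skills_metiers mapping.

    Assumes Elasticsearch 7+ (typeless mappings); older versions expect a
    type name under "mappings".
    """
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "properties": {
                "skills_metiers": {
                    "type": "nested",
                    "properties": {
                        "metier": {"type": "keyword"},
                        "experience": {"type": "keyword"},
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    import json

    # The printed JSON can follow "PUT index2" in the Kibana console,
    # or the dict can be passed as body= to es.indices.create(index="index2", ...).
    print(json.dumps(make_index_body(), indent=2))
```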

For each document in index 1, I want to search index 2 for documents with matching skills_metiers:

```python
def build_query(skills_metiers=None):

    query = {
        "size": 30,
        "query": {
            "bool": {
                "should": [],
                "must": []
            }
        }
    }

    if skills_metiers:
        for metier in skills_metiers:
            query["query"]["bool"]["should"].append({
                "nested": {
                    "path": "skills_metiers",
                    "score_mode": "avg",
                    "query": {
                        "bool": {
                            "must": {
                                "query_string": {
                                    "query": "skills_metiers.metier:" + metier['metier'] + " AND skills_metiers.experience:" + metier['experience']
                                }
                            }
                        }
                    }
                }
            })

    return query
```
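`query_string` parses its input with Lucene query syntax, so building it by string concatenation can break if a metier or experience value ever contains spaces or reserved characters. A sketch of an equivalent builder using `term` queries; `build_query_terms` is a hypothetical name, and on `keyword` fields it should match the same nested objects as the `query_string` version for simple values like `CP` or `DEBUTANT`.

```python
from typing import Optional


def build_query_terms(skills_metiers: Optional[list] = None, size: int = 30) -> dict:
    """Same nested should-clauses as build_query, but built from term queries.

    term queries are not parsed as Lucene syntax, so values containing spaces
    or reserved characters (for example ':' or '/') are matched literally.
    """
    should = []
    for skill in skills_metiers or []:
        should.append({
            "nested": {
                "path": "skills_metiers",
                "score_mode": "avg",
                "query": {
                    "bool": {
                        # Both conditions must hold on the SAME nested object.
                        "must": [
                            {"term": {"skills_metiers.metier": skill["metier"]}},
                            {"term": {"skills_metiers.experience": skill["experience"]}},
                        ]
                    }
                },
            }
        })
    return {"size": size, "query": {"bool": {"should": should}}}
```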

I search with these skills_metiers:

```
[{'experience': 'DEBUTANT', 'metier': 'CP'},
 {'experience': 'DEBUTANT', 'metier': 'BI'},
 {'experience': 'DEBUTANT', 'metier': 'BA'}]
```
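For this input, `build_query` adds one nested `should` clause per skill. A small sketch that reproduces the same string concatenation and prints the `query_string` each clause receives:

```python
skills = [
    {'experience': 'DEBUTANT', 'metier': 'CP'},
    {'experience': 'DEBUTANT', 'metier': 'BI'},
    {'experience': 'DEBUTANT', 'metier': 'BA'},
]

# Same concatenation as in build_query: one query string per requested skill.
query_strings = [
    "skills_metiers.metier:" + s['metier'] + " AND skills_metiers.experience:" + s['experience']
    for s in skills
]

for qs in query_strings:
    print(qs)
# skills_metiers.metier:CP AND skills_metiers.experience:DEBUTANT
# skills_metiers.metier:BI AND skills_metiers.experience:DEBUTANT
# skills_metiers.metier:BA AND skills_metiers.experience:DEBUTANT
```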

The results are (score, then the skills_metiers of the matching document):

```
Result 1 (score 6.7111297):
[{'experience': 'DEBUTANT', 'metier': 'CP'},
 {'experience': 'DEBUTANT', 'metier': 'BA'},
 {'experience': 'DEBUTANT', 'metier': 'BI'}]

Result 2 (score 6.7111297):
[{'experience': 'DEBUTANT', 'metier': 'CP'},
 {'experience': 'DEBUTANT', 'metier': 'BA'},
 {'experience': 'DEBUTANT', 'metier': 'BI'}]

Result 3 (score 4.9342995):
[{'experience': 'DEBUTANT', 'metier': 'BA'},
 {'experience': 'DEBUTANT', 'metier': 'BI'}]

Result 4 (score 3.7447155):
[{'experience': 'DEBUTANT', 'metier': 'BA'},
 {'experience': 'DEBUTANT', 'metier': 'CP'}]

Result 5 (score 1.9678854):
[{'experience': 'DEBUTANT', 'metier': 'BA'}]

Result 6 (score 1.7768301):
[{'experience': 'DEBUTANT', 'metier': 'CP'}]

Result 7 (score 1.7768301):
[{'experience': 'DEBUTANT', 'metier': 'AMOA'},
 {'experience': 'DEBUTANT', 'metier': 'CP'}]
```

I don't understand why results 3 and 4, which both match two of my skills, don't have the same score, nor why results 5, 6 and 7, which each match one, don't all have the same score...
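A likely explanation (not confirmed in this thread): each nested clause is scored with BM25, whose IDF part depends on how common the term is in index 2, so a match on `BI` and a match on `CP` contribute different amounts even though each is one skill. The `bool` query then sums the matching `should` clauses. The numbers above fit this: 1.9678854 (result 5, BA only) + 1.7768301 (result 6, CP only) = 3.7447155, exactly the score of result 4 (BA + CP). If the goal is for every matched skill to count the same, one option is to wrap each nested condition in `constant_score`, which ignores term statistics. A minimal sketch; `build_query_count_matches` is a hypothetical name.

```python
from typing import Optional


def build_query_count_matches(skills_metiers: Optional[list] = None, size: int = 30) -> dict:
    """Score each hit by how many requested (metier, experience) pairs it contains.

    constant_score ignores term statistics, so every matching nested object
    scores exactly `boost` (1.0 here), whatever the metier is.
    """
    should = []
    for skill in skills_metiers or []:
        should.append({
            "nested": {
                "path": "skills_metiers",
                "score_mode": "avg",  # avg of matching children that all score 1.0 is 1.0
                "query": {
                    "constant_score": {
                        "filter": {
                            "bool": {
                                "filter": [
                                    {"term": {"skills_metiers.metier": skill["metier"]}},
                                    {"term": {"skills_metiers.experience": skill["experience"]}},
                                ]
                            }
                        },
                        "boost": 1.0,
                    }
                },
            }
        })
    return {"size": size, "query": {"bool": {"should": should}}}
```

With this query, results 3 and 4 should both score 2 and results 5, 6 and 7 should all score 1, assuming Elasticsearch 6.0 or later (where bool queries no longer apply a coordination factor). Ties are then no longer ordered by term rarity.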

Thanks :slight_smile:

I guess you are talking about the score here?

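Maybe you have 5 shards (the default before Elasticsearch 7.0)? That would explain it, because each shard scores documents with its own term statistics.
Either use a single shard or use the dfs_query_then_fetch search type.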
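With the Python client, the search type is a parameter of `search()`. A sketch assuming elasticsearch-py 7.x (which accepts `body=` and `search_type=`); `search_with_global_stats` is a hypothetical helper. `dfs_query_then_fetch` first collects term statistics from all shards, so every shard scores with the same IDF values, at the cost of an extra round trip.

```python
def search_with_global_stats(es, index: str, query: dict) -> list:
    """Run a search whose scores use index-wide term statistics.

    `es` is assumed to be an elasticsearch.Elasticsearch client (7.x-style
    keyword arguments).
    """
    response = es.search(
        index=index,
        body=query,
        search_type="dfs_query_then_fetch",
    )
    # Keep (source, score) pairs, like the script in this thread.
    return [(hit["_source"], hit["_score"]) for hit in response["hits"]["hits"]]
```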

Yes, I am talking about the score.

I have only 1 shard and 0 replicas (for each index).
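With a single shard and no replicas, differing per-shard statistics cannot be the cause. To see which clauses add up to each score, a search request can include `"explain": true`; every hit then carries an `_explanation` tree with `value`, `description` and `details`. A sketch assuming elasticsearch-py 7.x and a query dict from `build_query`; `explain_scores` and `format_explanation` are hypothetical helpers.

```python
def format_explanation(node: dict, depth: int = 0) -> list:
    """Flatten an Elasticsearch _explanation tree into indented text lines."""
    lines = ["  " * depth + f"{node['value']:.7g}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


def explain_scores(es, index: str, query: dict) -> None:
    """Run `query` with explain enabled and print each hit's score breakdown."""
    body = dict(query, explain=True)  # shallow copy; the caller's dict is unchanged
    response = es.search(index=index, body=body)
    for hit in response["hits"]["hits"]:
        print(hit["_id"], hit["_score"])
        print("\n".join(format_explanation(hit["_explanation"], depth=1)))
```

In the printed trees, comparing the entries that mention idf under each metier term across two hits should show where results with the same number of matched skills diverge.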

Could you provide a full recreation script as described in the "About the Elasticsearch category" post? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem.

Here is the Python script I use to run my queries:

```python
import pandas as pd
from elasticsearch import Elasticsearch

# Assumption: the original post does not show how `es` is created.
es = Elasticsearch()


def build_query(skills_metiers=None):

    query = {
        "size": 30,
        "query": {
            "bool": {
                "should": [],
                "must": []
            }
        }
    }

    if skills_metiers:
        for metier in skills_metiers:
            query["query"]["bool"]["should"].append({
                "nested": {
                    "path": "skills_metiers",
                    "score_mode": "avg",
                    "query": {
                        "bool": {
                            "must": {
                                "query_string": {
                                    "query": "(skills_metiers.metier:" + metier['metier'] + ") AND (skills_metiers.experience:" + metier['experience'] + ")"
                                }
                            }
                        }
                    }
                }
            })

    return query


def search(es, document):
    # Search index2 with the skills of one index1 document; return [source, score] pairs.
    # No bare except here: a failing request should raise instead of silently returning None.
    query = build_query(document['skills_metiers'])
    response = es.search(index="index2", body=query)
    return [[hit['_source'], hit['_score']] for hit in response['hits']['hits']]


# Fetch every document of index1
# (a plain search is capped by index.max_result_window, 10000 hits by default)
documents = es.search(index="index1",
                      body={
                          "size": es.count(index="index1")["count"],
                          "query": {"match_all": {}}
                      })

# Save results in a dataframe
rows = []
columns = ['id_index1', 'id_index2', 'score']
for document in documents['hits']['hits']:
    source = document['_source']
    if not source.get('skills_metiers'):
        continue  # nothing to match on for this document
    for result_source, score in search(es, source):
        rows.append([source['id'], result_source['id'], score])

df_query = pd.DataFrame(rows, columns=columns)
```

My dataframe looks like this:

id_index1 | id_index2 | score
---|---|---
91196 | 90082 | 6.7111297
91196 | 90083 | 6.7111297
... | ... | ...
91196 | 90074 | 1.7768301
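To check the score column against the expectation that equal numbers of matched skills should give equal scores, a small pure-Python helper (hypothetical, not part of the original script) that counts how many requested (metier, experience) pairs a document contains; it could be added as an extra dataframe column.

```python
def count_matching_skills(requested: list, candidate: list) -> int:
    """Number of requested (metier, experience) pairs present in candidate."""
    wanted = {(s["metier"], s["experience"]) for s in requested}
    present = {(s["metier"], s["experience"]) for s in candidate}
    return len(wanted & present)


query_skills = [
    {'experience': 'DEBUTANT', 'metier': 'CP'},
    {'experience': 'DEBUTANT', 'metier': 'BI'},
    {'experience': 'DEBUTANT', 'metier': 'BA'},
]
# Results 3 and 4 from the first post: both match two of the three skills.
result_3 = [{'experience': 'DEBUTANT', 'metier': 'BA'}, {'experience': 'DEBUTANT', 'metier': 'BI'}]
result_4 = [{'experience': 'DEBUTANT', 'metier': 'BA'}, {'experience': 'DEBUTANT', 'metier': 'CP'}]
print(count_matching_skills(query_skills, result_3), count_matching_skills(query_skills, result_4))  # 2 2
```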

Then I print the skills_metiers field for:

  1. The document with id_index1 = 91196:

[{'experience': 'DEBUTANT', 'metier': 'CP'},
{'experience': 'DEBUTANT', 'metier': 'BI'},
{'experience': 'DEBUTANT', 'metier': 'BA'}]

  2. The documents corresponding to id_index2

Yeah, but I can't reproduce anything with that script. Please provide a pure REST script that anyone can copy and paste into the Kibana console to reproduce your problem.
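Since the mapping, documents and query already exist as Python dicts in the script above, one way to produce such a console script is to print it from them. A sketch assuming Elasticsearch 7+ endpoints (`_doc`, typeless mappings); `to_console_script` and the sample document in the usage block are hypothetical.

```python
import json


def to_console_script(index: str, index_body: dict, docs: list, query: dict) -> str:
    """Render a Kibana console script that recreates an index and runs a query."""
    parts = [
        f"DELETE {index}",
        f"PUT {index}\n{json.dumps(index_body, indent=2)}",
    ]
    # One POST per document, then a refresh so the search can see them.
    parts += [f"POST {index}/_doc\n{json.dumps(doc)}" for doc in docs]
    parts.append(f"POST {index}/_refresh")
    parts.append(f"GET {index}/_search\n{json.dumps(query, indent=2)}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Hypothetical sample data; replace with the real mapping, documents and query.
    body = {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}
    docs = [{"id": 1, "skills_metiers": [{"metier": "CP", "experience": "DEBUTANT"}]}]
    query = {"query": {"match_all": {}}}
    print(to_console_script("index2", body, docs, query))
```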
