Query results not understood


#1

Hi everybody!

I have two indexes with the following mapping:

"skills_metiers": {
                "type": "nested",
                "properties": {
                    "metier": {
                        "type": "keyword"
                    },
                    "experience": {
                        "type": "keyword"
                    }
                }
            }

I want to take the skills_metiers of a document in index 1 and search for them in index 2:

def build_query(skills_metiers=None):
    # Base request: return up to 30 hits; clauses are appended below.
    query = {
        "size": 30,
        "query": {
            "bool": {
                "should": [],
                "must": []
            }
        }
    }

    if skills_metiers:
        # One nested "should" clause per (metier, experience) pair.
        for metier in skills_metiers:
            query["query"]["bool"]["should"].append({
                "nested": {
                    "path": "skills_metiers",
                    "score_mode": "avg",
                    "query": {
                        "bool": {
                            "must": {
                                "query_string": {
                                    "query": "skills_metiers.metier:" + metier['metier'] + " AND skills_metiers.experience:" + metier['experience']
                                }
                            }
                        }
                    }
                }
            })

    return query

I search for:
[{'experience': 'DEBUTANT', 'metier': 'CP'},
{'experience': 'DEBUTANT', 'metier': 'BI'},
{'experience': 'DEBUTANT', 'metier': 'BA'}]

and the results (score, followed by the skills_metiers of the hit) are:

6.7111297
[{'experience': 'DEBUTANT', 'metier': 'CP'},
{'experience': 'DEBUTANT', 'metier': 'BA'},
{'experience': 'DEBUTANT', 'metier': 'BI'}]

6.7111297
[{'experience': 'DEBUTANT', 'metier': 'CP'},
{'experience': 'DEBUTANT', 'metier': 'BA'},
{'experience': 'DEBUTANT', 'metier': 'BI'}]

4.9342995
[{'experience': 'DEBUTANT', 'metier': 'BA'},
{'experience': 'DEBUTANT', 'metier': 'BI'}]

3.7447155
[{'experience': 'DEBUTANT', 'metier': 'BA'},
{'experience': 'DEBUTANT', 'metier': 'CP'}]

1.9678854
[{'experience': 'DEBUTANT', 'metier': 'BA'}]

1.7768301
[{'experience': 'DEBUTANT', 'metier': 'CP'}]

1.7768301
[{'experience': 'DEBUTANT', 'metier': 'AMOA'},
{'experience': 'DEBUTANT', 'metier': 'CP'}]

I don't understand why results 3 and 4 do not have the same score, nor why results 5, 6 and 7 do not...

Thanks :)
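
A quick arithmetic check on the scores above, assuming each hit's score is the sum of the scores of the should clauses it matches (which is how a bool query combines should clauses) and that each metier contributes a fixed amount. The per-metier values are read off the single-match hits or inferred by subtraction; they are an interpretation of the listed numbers, not output from Elasticsearch.

```python
import math

# Per-metier contributions read from the results above.
BA = 1.9678854          # result 5 matches only BA
CP = 1.7768301          # results 6 and 7 match only CP (AMOA is not in the search)
BI = 4.9342995 - BA     # inferred: result 3 matches BA and BI

contribution = {"BA": BA, "CP": CP, "BI": BI}

# (matched metiers, observed score) for results 1, 3, 4, 5, 6 and 7.
observed = [
    (("CP", "BA", "BI"), 6.7111297),
    (("BA", "BI"), 4.9342995),
    (("BA", "CP"), 3.7447155),
    (("BA",), 1.9678854),
    (("CP",), 1.7768301),
    (("CP",), 1.7768301),
]


def predicted(metiers):
    """Sum the assumed per-metier contributions for one hit."""
    return sum(contribution[m] for m in metiers)


for metiers, score in observed:
    total = predicted(metiers)
    # Elasticsearch scores are 32-bit floats, so allow a small tolerance.
    print(metiers, score, round(total, 7), math.isclose(score, total, abs_tol=1e-6))
```

If the sums hold, results 3 and 4 differ only because BI contributes more than CP, and results 6 and 7 are equal because only the CP clause matches in both. A larger contribution for BI would be consistent with BI being a rarer value in index 2, but that part is a guess without the explain output.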


(David Pilato) #2

I guess you are talking about the score here?

Maybe you have 5 shards (the default value)? That would explain it.
Either use a single shard or use the dfs_query_then_fetch search type.
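
A minimal sketch of the second suggestion, assuming the official Python client and that the option meant is the dfs_query_then_fetch search type, which gathers term statistics from all shards before scoring. The function name is hypothetical, and the stand-in client below only records the call so the example runs without a cluster.

```python
def search_with_global_stats(es, index, query):
    """Run a search whose scores use term statistics from every shard.

    `es` is assumed to be an elasticsearch.Elasticsearch client (or anything
    with a compatible `search` method); `query` is the request body as a dict.
    """
    return es.search(index=index, body=query, search_type="dfs_query_then_fetch")


class RecordingClient:
    """Stand-in for a real client, used here only to show the call made."""

    def __init__(self):
        self.calls = []

    def search(self, **kwargs):
        self.calls.append(kwargs)
        return {"hits": {"hits": []}}


client = RecordingClient()
search_with_global_stats(client, "index2", {"query": {"match_all": {}}})
print(client.calls[0]["search_type"])
```

With a single shard this should make no difference to the scores, since there is only one set of term statistics to begin with.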


#3

Yes, I am talking about the score.

I have only 1 shard and 0 replicas (for each index).
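
With a single shard, per-shard statistics are ruled out, so a next step could be to ask Elasticsearch for its own score breakdown. A sketch, assuming the search body accepts the explain flag and each hit then carries an _explanation tree with value, description and details keys; the helper names and the sample hit are made up for illustration.

```python
import copy


def with_explain(query):
    """Return a copy of the search body with score explanations enabled."""
    body = copy.deepcopy(query)
    body["explain"] = True
    return body


def flatten_explanation(node, depth=0):
    """Turn an _explanation tree into indented 'value  description' lines."""
    lines = ["{}{}  {}".format("  " * depth, node["value"], node["description"])]
    for child in node.get("details", []):
        lines.extend(flatten_explanation(child, depth + 1))
    return lines


# Made-up hit, shaped like a search response entry when explain is on.
sample_hit = {
    "_score": 3.0,
    "_explanation": {
        "value": 3.0,
        "description": "sum of:",
        "details": [
            {"value": 1.0, "description": "clause A", "details": []},
            {"value": 2.0, "description": "clause B", "details": []},
        ],
    },
}

print(with_explain({"size": 30, "query": {"match_all": {}}}))
print("\n".join(flatten_explanation(sample_hit["_explanation"])))
```

Comparing the flattened explanations of results 3 and 4 side by side should show which clause contributes the difference.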


(David Pilato) #4

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem.


#5

Here is the Python script I use to run my queries.

    import json

    import pandas as pd

    # es is assumed to be an already connected elasticsearch.Elasticsearch client.


    def build_query(skills_metiers=None):
        query = {
            "size": 30,
            "query": {
                "bool": {
                    "should": [],
                    "must": []
                }
            }
        }

        if skills_metiers:
            # One nested "should" clause per (metier, experience) pair.
            for metier in skills_metiers:
                query["query"]["bool"]["should"].append({
                    "nested": {
                        "path": "skills_metiers",
                        "score_mode": "avg",
                        "query": {
                            "bool": {
                                "must": {
                                    "query_string": {
                                        "query": "(skills_metiers.metier:" + metier['metier'] + ") AND (skills_metiers.experience:" + metier['experience'] + ")"
                                    }
                                }
                            }
                        }
                    }
                })

        return query


    def search(es, document):
        """Return [source, score] pairs from index2 for one index1 document."""
        try:
            query = build_query(document['skills_metiers'])
            response = es.search(index="index2", body=json.dumps(query))
        except Exception as exc:
            # Report the failure instead of silently returning None.
            print("search failed:", exc)
            return []

        return [[hit['_source'], hit['_score']] for hit in response['hits']['hits']]


    # Save results in a dataframe
    documents = es.search(index="index1",
                          body={
                              "size": es.count(index='index1')["count"],
                              "query": {"match_all": {}}
                          })

    rows = []
    columns = ['id_index1', 'id_index2', 'score']
    for document in documents['hits']['hits']:
        for source, score in search(es, document['_source']):
            rows.append([document['_source']['id'], source['id'], score])

    df_query = pd.DataFrame(rows, columns=columns)

My dataframe looks like this:

id_index1 | id_index2 | score
91196 | 90082 | 6.7111297
91196 | 90083 | 6.7111297
... | ... | ...
91196 | 90074 | 1.7768301

Then I print the skills_metiers field for:

  1. The document corresponding to id_index1 = 91196:

[{'experience': 'DEBUTANT', 'metier': 'CP'},
{'experience': 'DEBUTANT', 'metier': 'BI'},
{'experience': 'DEBUTANT', 'metier': 'BA'}]

  2. The documents corresponding to id_index2

(David Pilato) #6

Yeah, but I can't reproduce anything with that script. Please provide a pure REST script that anyone can copy and paste into the Kibana console to reproduce your problem.
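
A sketch of what such a reproduction could look like, generated from Python so the documents stay in sync with the thread. It assumes Elasticsearch 7 or later (typeless mappings and bulk requests), reuses the index name index2 from the script, and invents sequential document ids; the document contents are the skills_metiers lists from the first post. Scores from such a small index will not match the ones reported, since they depend on how many documents contain each value.

```python
import json

INDEX = "index2"

mapping = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "skills_metiers": {
                "type": "nested",
                "properties": {
                    "metier": {"type": "keyword"},
                    "experience": {"type": "keyword"},
                },
            }
        }
    },
}

# Metiers of the seven hits listed in the first post, and the searched ones.
hits = [["CP", "BA", "BI"], ["CP", "BA", "BI"], ["BA", "BI"], ["BA", "CP"],
        ["BA"], ["CP"], ["AMOA", "CP"]]
searched = ["CP", "BI", "BA"]


def skill(metier):
    return {"metier": metier, "experience": "DEBUTANT"}


def nested_clause(metier):
    """One should clause with the same shape as in build_query."""
    text = "skills_metiers.metier:{} AND skills_metiers.experience:DEBUTANT".format(metier)
    return {"nested": {
        "path": "skills_metiers",
        "score_mode": "avg",
        "query": {"bool": {"must": {"query_string": {"query": text}}}},
    }}


def console_script():
    """Build the text to paste into the Kibana console."""
    lines = ["PUT " + INDEX, json.dumps(mapping, indent=2), ""]
    lines.append("POST {}/_bulk?refresh=true".format(INDEX))
    for doc_id, metiers in enumerate(hits, start=1):  # ids are invented
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        lines.append(json.dumps({"skills_metiers": [skill(m) for m in metiers]}))
    lines.append("")
    query = {"size": 30,
             "query": {"bool": {"should": [nested_clause(m) for m in searched]}}}
    lines += ["GET {}/_search".format(INDEX), json.dumps(query, indent=2)]
    return "\n".join(lines)


print(console_script())
```

The printed text has three requests: create the index with one shard, bulk-index the seven documents, then run the three-clause search.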


(system) closed #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.