Document similarity problems


(jiangshigen) #1

I do not understand how to use search templates in the Java API of Elasticsearch correctly. My template seems to work fine when I test it in Sense, but when I use the template in Java code, it gives different results.
Here is what I do:

DELETE /megacorp
PUT /megacorp/employee/1
{
    "first_name":"John",
    "last_name":"Smith",
    "sex":"male",
    "age":25,
    "about":"I love to go rock climbing"
}
GET /megacorp/_search  
{
   "query":{
         "bool":{
               "must":[
                    {"term":{"sex":"male"}}
                ]
            }
      }
 }

This returns:

   {
       "took": 5,
       "timed_out": false,
       "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
       },
     "hits": {
            "total": 1,
            "max_score": 0.30685282,
            "hits": [
                      {
                       "_index": "megacorp",
                       "_type": "employee",
                       "_id": "AWEnfZtfSGB1cPZqeBGA",
                       "_score": 0.30685282,
                       "_source": {
                       "first_name": "John",
                       "last_name": "Smith",
                       "sex": "male",
                       "age": 25,
                       "about": "I love to go rock climbing"
                      }
                  }
              ]
         }
    }

So that looks nice: a score of 0.30685282. But now comes the problem. When I use this search template in my Java code, the score is 1.0.
Here is my code:

public static void main(String[] args) {
		RetrievalPatients rp = new RetrievalPatients();
		Map<String, Object> template_params = new HashMap<>();
		template_params.put("sex", "male");
		Client client = rp.getEsClient();
		SearchResponse response = client.prepareSearch("megacorp").setTypes("employee")
				.setSearchType(SearchType.QUERY_THEN_FETCH)
				.setQuery(QueryBuilders.templateQuery("template_gender",ScriptService.ScriptType.FILE,template_params)).get();
		System.out.println(response.toString());
		client.close();
	}

The content of the template_gender template:

 {
       "query":{
             "bool":{
                   "must":[
                        {"term":{"sex":"{{sex}}"}}
                    ]
                }
          }
     }
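
To make the substitution step concrete, here is a minimal sketch of what rendering this template with the parameter sex=male produces. It uses naive string replacement as a stand-in for the real Mustache engine, and the class and method names are hypothetical, not part of the Elasticsearch API.

```java
import java.util.HashMap;
import java.util.Map;

public class TemplateRenderSketch {
    // Naive stand-in for Mustache: replaces each {{name}} placeholder
    // with the string form of the matching parameter value.
    static String render(String template, Map<String, Object> params) {
        String out = template;
        for (Map.Entry<String, Object> e : params.entrySet()) {
            out = out.replace("{{" + e.getKey() + "}}", String.valueOf(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Same structure as the template_gender file, written on one line.
        String template =
            "{\"query\":{\"bool\":{\"must\":[{\"term\":{\"sex\":\"{{sex}}\"}}]}}}";
        Map<String, Object> params = new HashMap<>();
        params.put("sex", "male");
        System.out.println(render(template, params));
    }
}
```

Note that the rendered text still carries the outer "query" wrapper, exactly as the request body typed into Sense does.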

It returns:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "AWEnfZtfSGB1cPZqeBGA",
      "_score" : 1.0,
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "sex" : "male",
        "age" : 25,
        "about" : "I love to go rock climbing"
      }
    } ]
  }
}

So why is my score 1.0 now instead of 0.30685282?


(Simon Willnauer) #2
  • Is it possible that you have more than one document in your index when you run the search with your Java code?
  • Did you try a plain boolean query instead of the template in the Java code, and does that return the expected score?
  • Can you run a match_all query in Java and paste the result?

(jiangshigen) #3

Thanks for your answer.
I tried a plain boolean query instead of the template in the Java code, and that returns the expected score.

    public static void main(String[] args) {
		RetrievalPatients rp = new RetrievalPatients();
//		Map<String, Object> template_params = new HashMap<>();
//		template_params.put("sex", "male");
		Client client = rp.getEsClient();
		SearchResponse response = client.prepareSearch("megacorp").setTypes("employee")
				.setSearchType(SearchType.QUERY_THEN_FETCH)
				.setQuery(QueryBuilders.boolQuery().must(QueryBuilders.termQuery("sex", "male"))).get();
//				.setQuery(QueryBuilders.templateQuery("template_gender",ScriptService.ScriptType.FILE,template_params)).get();
		System.out.println(response.toString());
		client.close();
	}

It returns:

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "AWExQtQGQ77PsUZeEPFP",
      "_score" : 0.30685282,
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "sex" : "male",
        "age" : 25,
        "about" : "I love to go rock climbing"
      }
    }, {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "AWExQtQGQ77PsUZeEPFQ",
      "_score" : 0.30685282,
      "_source" : {
        "first_name" : "Tom",
        "last_name" : "Jack",
        "sex" : "male",
        "age" : 40,
        "about" : "I rock love to go climbing"
      }
    } ]
  }
}

But I want to know why using this templateQuery in my Java code did not return the expected score. Is there another way to execute the JSON template in Java code?


(Simon Willnauer) #4

Wait, now you return 2 documents. Can you delete the index before you execute your Java queries and then add a single doc to see if it works? The score will change with the number of documents matching the query.
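
As a rough illustration of why the score depends on the index contents: assuming the index uses Lucene's classic TF-IDF similarity (the thread does not confirm the Elasticsearch version or similarity), the inverse document frequency of a term is 1 + ln(numDocs / (docFreq + 1)). For a single term query against a short field that contains the term once, the score should reduce to roughly that idf value. The sketch below only reproduces that arithmetic; the class name is hypothetical.

```java
public class ClassicIdfSketch {
    // Classic Lucene TF-IDF inverse document frequency:
    // idf = 1 + ln(numDocs / (docFreq + 1))
    // docFreq = documents containing the term, numDocs = documents in the shard.
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // One document in the shard, and it contains the term.
        System.out.println(idf(1, 1)); // about 0.3069
        // Two documents in the same shard, both containing the term.
        System.out.println(idf(2, 2)); // about 0.5945
    }
}
```

With one matching document in a shard this gives about 0.3069, in line with the 0.30685282 reported above. These statistics are computed per shard under QUERY_THEN_FETCH, so two matching documents that land on different shards of a 5-shard index would each still score about 0.3069.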


(jiangshigen) #5

Thank you very much.
I deleted the index before executing my Java queries and then added a single doc to the index. I executed the same Java code.

public static void main(String[] args) {
		RetrievalPatients rp = new RetrievalPatients();
//		Map<String, Object> template_params = new HashMap<>();
//		template_params.put("sex", "male");
		Client client = rp.getEsClient();
		SearchResponse response = client.prepareSearch("megacorp").setTypes("employee")
				.setSearchType(SearchType.QUERY_THEN_FETCH)
				.setQuery(QueryBuilders.boolQuery().must(QueryBuilders.termQuery("sex", "male"))).get();
//				.setQuery(QueryBuilders.templateQuery("template_gender",ScriptService.ScriptType.FILE,template_params)).get();
		System.out.println(response.toString());
		client.close();
	}

It returns:

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.30685282,
    "hits" : [ {
      "_index" : "megacorp",
      "_type" : "employee",
      "_id" : "AWFB4Cw2jcVnkvXr8cN6",
      "_score" : 0.30685282,
      "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "sex" : "male",
        "age" : 25,
        "about" : "I love to go rock climbing"
      }
    } ]
  }
}

(Simon Willnauer) #6

OK cool, can you post the response from _explain for both variants?

