Total hits in Elasticsearch prepareSearch() behave inconsistently

I'm new to Elasticsearch. My scenario: there is an index with about 100 documents, and I'm using the following snippet to retrieve them:

ListenableActionFuture f = esClient.prepareSearch(index)
    .setTypes(type)
    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
    .setQuery(multiMatchQuery)
    .execute();

I want to fetch all documents in one shot without using scroll; I know the index won't have that many records. If I specify

ListenableActionFuture f = esClient.prepareSearch(index)
    .setTypes(type)
    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
    .setQuery(multiMatchQuery)
    .setSize(100)
    .execute();

the behaviour is inconsistent. The number of records fetched varies every time: sometimes 34 documents, sometimes 21, and so on.

But if I don't specify a size, it always fetches 10. I'm using Elasticsearch 5.6.3.
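For reference, the fixed count of 10 is the default page size that Elasticsearch applies when a search request sets no size. hits.total counts every matching document, while the hits array carries at most `size` entries starting at offset `from`. A minimal sketch of that arithmetic; the class and method names are made up for illustration and are not part of any client API:

```java
final class HitWindow {
    // Page size Elasticsearch uses when the request does not set one.
    static final int DEFAULT_SIZE = 10;

    // Number of hits a single response can carry: the slice [from, from + size) of all matches.
    static long expectedReturned(long totalHits, int from, int size) {
        long remaining = Math.max(0L, totalHits - from);
        return Math.min(remaining, (long) size);
    }

    public static void main(String[] args) {
        System.out.println(expectedReturned(100, 0, DEFAULT_SIZE)); // 10
        System.out.println(expectedReturned(100, 0, 100));          // 100
        System.out.println(expectedReturned(73, 1, 10000));         // 72, because from=1 skips the first hit
    }
}
```

So a varying hits.total cannot be explained by the size parameter alone: size only limits how many of the matches are returned. Note also that from + size may not exceed index.max_result_window, which defaults to 10000.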

I have tried a few things, such as dropping the search type as below, but no luck:

ListenableActionFuture f = esClient.prepareSearch(index)
    .setTypes(type)
    .setQuery(multiMatchQuery)
    .setSize(hitcount)
    .execute();

I need a query that fetches all records in one shot. When I print the search response, the total hits value is incorrect, as below:

"hits": { "total": 25, "max_score": 0.18232156, "hits": [ { .... } }}

It should be 100. If I execute the same code again, I get a different total hits value.

My index has 5 shards.

I'm wondering why I keep getting inconsistent total hits.

Just set size to 10000.
BTW you probably need only one shard.

I tried setting it to 1000, but now the total hits hover around 35, sometimes 36, 34, etc.
The total number of matching hits should be 73.

My updated code

ListenableActionFuture f = esClient.prepareSearch(index)
    .setTypes(type)
    .setQuery(multiMatchQuery)
    .setSize(10000)
    .execute();

"BTW you probably need only one shard.",

So in the case of multiple shards, how do I read all matching records in one shot?

Could you share the exact REST call and result and the exact Java code (i.e. what the query is, ...)?

I just meant that if you don't have that much data, one shard should be enough.

Here is the complete code

String[] stringArray = new String[1];
stringArray[0] = "status";
MultiMatchQueryBuilder multiMatchQuery = QueryBuilders.multiMatchQuery("running", stringArray);
ListenableActionFuture<SearchResponse> f = esClient.getESClient().prepareSearch(index)
    .setTypes(type)
    .setQuery(multiMatchQuery)
    .setSize(10000)
    .execute();

I may have at most 5000 records in this index. In that case, will a single shard cause any performance or other issues? If not, how do I specify the number of shards in the Java API when creating the index?

If I prefer to use 5 shards, is there no way to get all records in one shot?

Could you share the exact REST call and result?

I need that as well.

I may have at most 5000 records in this index. In that case, will a single shard cause any performance or other issues?

That will definitely be better with a single shard. Faster.
You define the number of shards when creating the index by providing index settings.

Example here: Create Index API | Java REST Client [7.17] | Elastic
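To make the settings part concrete, here is a minimal sketch that builds (but does not send) a create-index request with one primary shard, using only the JDK's HTTP client (Java 11+) rather than any Elasticsearch client library. The class name, the node address and the replica count are assumptions for illustration; the index name is the one used later in this thread.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class CreateSingleShardIndex {
    // Settings body for index creation; the shard count is chosen here, at creation time.
    static String settingsBody(int shards, int replicas) {
        return "{\"settings\":{\"index\":{\"number_of_shards\":" + shards
                + ",\"number_of_replicas\":" + replicas + "}}}";
    }

    // Builds PUT /<index> against an assumed node address. Nothing is sent here.
    static HttpRequest createIndexRequest(String baseUrl, String index, int shards, int replicas) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(shards, replicas)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = createIndexRequest("http://localhost:9200", "reindexjob", 1, 1);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(settingsBody(1, 1));
    }
}
```

Sending it would be a matter of passing the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against a running node. The index must not exist yet, so an existing index has to be deleted and recreated (or reindexed) to change its shard count this way.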

If I prefer to use 5 shards, is there no way to get all records in one shot?

This is not related to the problem you exposed.

I'm invoking SearchRequestBuilder through the Java API to fetch results. I have written only the code below, where esClient.getESClient() returns an
org.elasticsearch.client.Client, and I append the required parameters to that.

ListenableActionFuture f = esClient.getESClient().prepareSearch(index)
    .setTypes(type)
    .setQuery(multiMatchQuery)
    .setSize(10000)
    .execute();

The ListenableActionFuture f holds the result below:

{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 7,
"max_score": 0.2876821,
"hits": [
{
"_index": "reindexjob",
"_type": "scrolljob",
"_id": "AWZCfKD-3sx8bE5KQlR",
"_score": 0.2876821,
"_source": {
"scrollid": null,
"status": "running",
"indexname": "realtestindex",
"typename": "health"
}
},
{
"_index": "reindexjob",
"_type": "scrolljob",
"_id": "AWZCfKED3sx8bE5KQlSA",
"_score": 0.2876821,
"_source": {
"scrollid": null,
"status": "running",
"indexname": "realtestindex",
"typename": "sessions"
}
},
{
"_index": "reindexjob",
"_type": "scrolljob",
"_id": "AWZCfKFE3sx8bE5KQlSC",
"_score": 0.2876821,
"_source": {
"scrollid": null,
"status": "running",
"indexname": "testindex",
"typename": "a071971a-8645-4abf-b367-e4eeadd777e6"
}
},
{
"_index": "reindexjob",
"_type": "scrolljob",
"_id": "AWZCfKFI3sx8bE5KQlSE",
"_score": 0.2876821,
"_source": {
"scrollid": null,
"status": "running",
"indexname": "testindex",
"typename": "19ae3b6b-cad2-40d4-84c3-c23b40edca87"
}
},
{
"_index": "reindexjob",
"_type": "scrolljob",
"_id": "AWZCfKDt3sx8bE5KQlR-",
"_score": 0.13353139,
"_source": {
"scrollid": null,
"status": "running",
"indexname": "realtestindex",
"typename": "events"
}
},
{
"_index": "reindexjob",
"_type": "scrolljob",
"_id": "AWZCfKFC3sx8bE5KQlSB",
"_score": 0.13353139,
"_source": {
"scrollid": null,
"status": "running",
"indexname": "testindex",
"typename": "46d75306-a891-43b2-b80a-760749f1fb3c"
}
},
{
"_index": "reindexjob",
"_type": "scrolljob",
"_id": "AWZCfKFF3sx8bE5KQlSD",
"_score": 0.13353139,
"_source": {
"scrollid": null,
"status": "running",
"indexname": "testindex",
"typename": "564815a7-56c1-41b7-a3a5-cfc07b453fdc"
}
}
]
}
}

Please let me know if you need more details

I changed my index to have a single shard:

{"reindexjob":{"settings":{"index":{"creation_date":"1538717770126","number_of_shards":"1","number_of_replicas":"1","uuid":"cNYWZtkCRCeNv_-8jM124A","version":{"created":"5060399"},"provided_name":"reindexjob"}}}}

But the total hits are still inconsistent. When I query the count via a REST call
curl -XGET 'localhost:9200/reindexjob/_count' -d '{"query" : {"match" : {"status":"running"}}}'

I get 73 as the result:
{"count":73,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

When I do the same via the Java call:

SearchRequest{searchType=QUERY_THEN_FETCH, indices=[reindexjob], indicesOptions=IndicesOptions[id=38, ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_alisases_to_multiple_indices=true, forbid_closed_indices=true], types=[scrolljob], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, source={
"size" : 10000,
"query" : {
"multi_match" : {
"query" : "running",
"fields" : [
"status^1.0"
],
"type" : "best_fields",
"operator" : "OR",
"slop" : 0,
"prefix_length" : 0,
"max_expansions" : 50,
"lenient" : false,
"zero_terms_query" : "NONE",
"boost" : 1.0
}
}
}}

I get inconsistent responses, as above.

Can you try the exact same query?
Either match in both tests or multi_match in both?
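One way to make sure both tests really use the same query is to build the body once and reuse it for the _count and _search calls. A minimal sketch using only the JDK's HTTP client (Java 11+); the class and method names and the node address are hypothetical, and the body builder assumes the text and field name need no JSON escaping:

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class SameQueryBothCalls {
    // One body reused by both requests, so a difference in totals cannot come from the query itself.
    static String multiMatchBody(String text, String field) {
        return "{\"query\":{\"multi_match\":{\"query\":\"" + text
                + "\",\"fields\":[\"" + field + "\"]}}}";
    }

    // Builds a JSON POST request. Nothing is sent here.
    private static HttpRequest post(String url, String body) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    static HttpRequest countRequest(String baseUrl, String index, String body) {
        return post(baseUrl + "/" + index + "/_count", body);
    }

    static HttpRequest searchRequest(String baseUrl, String index, int size, String body) {
        return post(baseUrl + "/" + index + "/_search?size=" + size, body);
    }

    public static void main(String[] args) {
        String body = multiMatchBody("running", "status");
        System.out.println(countRequest("http://localhost:9200", "reindexjob", body).uri());
        System.out.println(searchRequest("http://localhost:9200", "reindexjob", 10000, body).uri());
        System.out.println(body);
    }
}
```

With an identical body, the count from _count and hits.total from _search should agree when both are run against the same state of the index.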

I tried to make the curl call match the Java API request as closely as possible, as below:

curl -XGET 'localhost:9200/reindexjob/_search?search_type=query_then_fetch&size=10000' -d '{"query" : {"multi_match" : {"query":"running","fields":["status"],"type":"best_fields", "operator" : "OR", "slop" : 0, "prefix_length" : 0, "max_expansions" : 50, "lenient" : false, "zero_terms_query" : "NONE", "boost" : 1.0}}}'

The result is always 73, as below:

{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 73,
"max_score": 0.006779687,
"hits": [
{
"_index": "reindexjob",
"_type": "scrolljob",
"_id": "AWZC9ULQ3sx8bE5KQlue",
"_score": 0.006779687,
"_source": {
"scrollid": null,
"status": "running",
"indexname": "testindex",
"typename": "69625b89-5095-4964-94f8-ae9a60025653"
}
},
....... here 72 more records .....
}
]
}
}

An additional detail: if I remote-debug this Java call, the response always contains 73 records in the Java layer.

Can you please provide any update on this?

Hi @RAM_NATHAN,

If you need the exact count for a specific field value, can you use a match_phrase query on that field?

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match_phrase" : {
"message" : "this is a test"
}
}
}
'

Thanks for the update. But will using match_phrase solve my problem?
Should MultiMatchQueryBuilder not be used when getting a count?

Is this how match_phrase should be used?

QueryBuilder queryBuilder = QueryBuilders.matchPhraseQuery("_all", "Test Type Search");

SearchResponse response = client.prepareSearch("index_name")
    .setQuery(queryBuilder)
    .setFrom(1)
    .setSize(10)
    .execute()
    .actionGet();

Hi @RAM_NATHAN,
If you need the total number of records, please use matchAllQuery.
If you need the document count for one field value (like status = "running"), use matchPhraseQuery("fieldname", "value");

QueryBuilder queryBuilder = QueryBuilders.matchPhraseQuery("_all", "Test Type Search");

SearchResponse response = client.prepareSearch("index_name")
    .setQuery(queryBuilder)
    .setFrom(1)
    .setSize(10)
    .execute()
    .actionGet();

match_phrase doesn't help either. The results are again inconsistent, sometimes 28, sometimes 10.
Please find the response below.

{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 22,
"max_score": 0.021978907,
"hits": [..........]
}
}

I'm using reactive pattern. Does it make any difference?

Observable.create(new OnSubscribe<List<InnerSearchHits>>() {
@Override
public void call(Subscriber<? super List<InnerSearchHits>> t) {

			QueryBuilder queryBuilder = QueryBuilders.matchPhraseQuery(filterKey, filterValue);
			ListenableActionFuture<SearchResponse> f = esClient.getESClient().prepareSearch(index)
				.setQuery(queryBuilder)
				.setFrom(0)
				.setSize(10000)
				.execute();
			Observable.from(f).map(x -> {
				List<InnerSearchHits<E>> innerSearchHits = new ArrayList<InnerSearchHits<E>>();
    			SearchHits hits = x.getHits();
    		
    			for (SearchHit searchHit : hits) {
    		.............
				}
                return innerSearchHits;
			}).onErrorReturn(e -> {      			
				......
        		return null;
			}).subscribe(...
				}
			}, e -> { 
				if (!t.isUnsubscribed()) {
					t.onError(e);
				}
			}, () -> {
				if (!t.isUnsubscribed()) {
					t.onCompleted();
				}
			});
			
		}
	});

The same code works fine when I run it under remote debugging.
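One thing that could be ruled out here (this is an assumption, nothing in this thread confirms it as the cause): Elasticsearch search is near real time, so documents indexed shortly before a search only become visible after the next refresh, and pausing in a debugger gives that refresh time to happen. A minimal sketch that builds a refresh request for the index using only the JDK's HTTP client (Java 11+); the class name and node address are hypothetical:

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class RefreshBeforeSearch {
    // Builds (does not send) POST /<index>/_refresh; a refresh makes recently indexed documents searchable.
    static HttpRequest refreshRequest(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_refresh"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = refreshRequest("http://localhost:9200", "reindexjob");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

If the totals become stable when a refresh is issued between the writes and the search, the variation comes from search visibility timing and not from the query or the reactive wrapper.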
