Distinct search using Java API

Is there any way to configure a search that returns only distinct results using the Java API?

    SearchRequest searchRequest = new SearchRequest();
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
            .should(QueryBuilders.matchQuery("name", "Kim"))
            .should(QueryBuilders.matchQuery("name", "Lee"));
    searchSourceBuilder.query(queryBuilder);

    searchSourceBuilder.fetchSource(true);
    searchSourceBuilder.fetchSource(include, exclude); // include/exclude are String[] defined elsewhere
    searchSourceBuilder.size(10);
    searchRequest.indices("test_index");
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    String scrollId = searchResponse.getScrollId();
    SearchHit[] searchHits = searchResponse.getHits().getHits();

I also tried adding this line:

searchSourceBuilder.aggregation(AggregationBuilders.cardinality("distinct").field("name"));

which returns:

org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]

Welcome!

What is the exact and complete error message?

Please format your code, logs, or configuration files using the </> icon as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

[quote="dadoonet, post:4, topic:272978"]

CODE

16:36:48.983 [Test worker] ERROR org.springframework.batch.core.step.AbstractStep - Encountered an error executing step competitorsGapStep in job competitorsGapJob
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=search_phase_execution_exception, reason=all shards failed]
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177) ~[elasticsearch-7.4.0.jar:7.4.0]
	at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1727) ~[elasticsearch-rest-high-level-client-7.4.0.jar:7.4.0]
	at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1704) ~[elasticsearch-rest-high-level-client-7.4.0.jar:7.4.0]
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1467) ~[elasticsearch-rest-high-level-client-7.4.0.jar:7.4.0]
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1424) ~[elasticsearch-rest-high-level-client-7.4.0.jar:7.4.0]
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1394) ~[elasticsearch-rest-high-level-client-7.4.0.jar:7.4.0]
	at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:930) ~[elasticsearch-rest-high-level-client-7.4.0.jar:7.4.0]
	at econ.batch.jobs.reverseEnginePage.CompetitorsGapTasklet.searchESRepData(CompetitorsGapTasklet.java:208) ~[main/:?]
	at econ.batch.jobs.reverseEnginePage.CompetitorsGapTasklet.execute(CompetitorsGapTasklet.java:117) ~[main/:?]
	at econ.batch.jobs.reverseEnginePage.CompetitorsGapTasklet$$FastClassBySpringCGLIB$$3c542299.invoke(<generated>) ~[main/:?]
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:721) ~[spring-aop-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) ~[spring-aop-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:133) ~[spring-aop-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:121) ~[spring-aop-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) ~[spring-aop-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:656) ~[spring-aop-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at econ.batch.jobs.reverseEnginePage.CompetitorsGapTasklet$$EnhancerBySpringCGLIB$$8a05a335.execute(<generated>) ~[main/:?]
	at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:406) ~[spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:330) ~[spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133) ~[spring-tx-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:271) ~[spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:81) ~[spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:374) ~[spring-batch-infrastructure-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215) ~[spring-batch-infrastructure-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:144) ~[spring-batch-infrastructure-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:257) ~[spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:200) [spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:148) [spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.core.job.AbstractJob.handleStep(AbstractJob.java:392) [spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.core.job.SimpleJob.doExecute(SimpleJob.java:135) [spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:306) [spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:135) [spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50) [spring-core-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:128) [spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_265]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_265]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_265]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_265]
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333) [spring-aop-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:190) [spring-aop-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) [spring-aop-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.batch.core.configuration.annotation.SimpleBatchConfiguration$PassthruAdvice.invoke(SimpleBatchConfiguration.java:127) [spring-batch-core-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) [spring-aop-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213) [spring-aop-4.3.7.RELEASE.jar:4.3.7.RELEASE]
	at com.sun.proxy.$Proxy47.run(Unknown Source) [?:?]
	at org.springframework.batch.test.JobLauncherTestUtils.launchJob(JobLauncherTestUtils.java:152) [spring-batch-test-3.0.6.RELEASE.jar:3.0.6.RELEASE]
	at econ.batch.jobs.CompetitorsGapTest.test(CompetitorsGapTest.java:58) [test/:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_265]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_265]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_265]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_265]
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
	at 
	Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://10.118.242.191:10250], URI [/kmw_should_test/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&scroll=30m&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"kmw_should_test","node":"tUEhW5NBQs2AUT9rR6tXgQ","reason":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.","caused_by":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}},"status":400}
		at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:253) ~[elasticsearch-rest-client-7.4.0.jar:7.4.0]
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:231) ~[elasticsearch-rest-client-7.4.0.jar:7.4.0]
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.]
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:491) ~[elasticsearch-7.4.0.jar:7.4.0]
	at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:402) ~[elasticsearch-7.4.0.jar:7.4.0]
	at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:432) ~[elasticsearch-7.4.0.jar:7.4.0]

Sorry for the mess... this site allows only 13,000 characters per reply, so I kept only the essential messages from the second reply.

The important part is:

Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.

So you need to use a keyword field. If you are using the default mapping, you can try:

searchSourceBuilder.aggregation(AggregationBuilders.cardinality("distinct").field("name.keyword"));

Thanks, the error is resolved. However, the output of my search still returns duplicates for the field "name".

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script is something anyone can copy and paste into the Kibana Dev Console and run to reproduce your use case. It will help readers understand, reproduce, and if needed fix your problem. It will also most likely help you get a faster answer.

If you are looking at:

searchResponse.getHits().getHits();

That's expected: the hits are still the matching documents.
You need to get the "distinct" aggregation back from the response instead.
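
For example, a minimal sketch of reading it back (assuming the aggregation was registered under the name "distinct" on "name.keyword" as suggested above, and that searchResponse comes from your earlier snippet):

    // org.elasticsearch.search.aggregations.metrics.Cardinality
    Cardinality distinct = searchResponse.getAggregations().get("distinct");
    long uniqueNames = distinct.getValue(); // approximate count of distinct name.keyword values

Note that a cardinality aggregation only gives you the count of distinct values; it does not return one document per value.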

To simplify my example, this is my index:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "kmw_should_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "Kim",
          "grade" : "A",
          "age" : 1
        }
      },
      {
        "_index" : "kmw_should_test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "Kim",
          "grade" : "B",
          "age" : 2
        }
      },
      {
        "_index" : "kmw_should_test",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "Lee",
          "grade" : "B",
          "age" : 3
        }
      }
    ]
  }
}

I intend to get the hits without duplicates in name.keyword with my Java API: only one each of Kim and Lee. (I would also like to know whether I can choose which of the duplicates is returned.)

In case you missed it.

A full reproduction script is something anyone can copy and paste in Kibana dev console, click on the run button to reproduce your use case.

Thanks

I am terribly sorry, I do not know how to write a full reproduction script in Kibana for my request.

I am lost as to how to get a distinct search for my example.

It's easy, as documented in About the Elasticsearch category:

DELETE index
PUT index/_doc/1
{
  "foo": "bar"
}
GET index/_search
{
  "query": {
    "match": {
      "foo": "bar"
    }
  }
}

But do it with your mapping and data, and then with the Cardinality aggregation | Elasticsearch Guide [7.12] | Elastic. In that link you also have some Kibana examples.

PUT index/_doc/1
{
  "foo": "bar"
}
PUT index/_doc/2
{
  "foo": "bar"
}
GET index/_search
{
  "query": {
    "match": {
      "foo": "bar"
    }
  }
}

This gives me 2 responses, whereas I need only 1 distinct result...
I don't mean that I want a request with size 1 either.
I need code that will search with the duplicates excluded.

This

DELETE index 
PUT index/_doc/1
{
  "foo": "bar"
}
PUT index/_doc/2
{
  "foo": "bar"
}
GET index/_search
{
  "size": 0, 
  "aggs": {
    "top10foo": {
      "terms": {
        "field": "foo.keyword",
        "size": 10
      }
    }
  }
}

Gives:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "top10foo" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "bar",
          "doc_count" : 2
        }
      ]
    }
  }
}

Yes! Now back to my first question... how can I apply this aggregation with the Java API?

To be specific, I tried:

    final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(30L));
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.scroll(scroll);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery()
            .should(QueryBuilders.matchQuery("name", "Kim"))
            .should(QueryBuilders.matchQuery("name", "Lee"));
    searchSourceBuilder.query(queryBuilder);
    searchSourceBuilder.size(0);

    searchSourceBuilder.aggregation(AggregationBuilders.terms("term_name").field("name.keyword").size(100));
    searchRequest.indices("index");
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    String scrollId = searchResponse.getScrollId();
    SearchHit[] searchHits = searchResponse.getHits().getHits();
    while (searchHits != null && searchHits.length > 0) {
        for (int i = 0; i < searchHits.length; i++) {
            // get the source of each hit
        }
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
        scrollRequest.scroll(scroll);
        searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
        scrollId = searchResponse.getScrollId();
        searchHits = searchResponse.getHits().getHits();
    }

However, with size 0 in my basic search, the scroll returns an error...

So I converted the Kibana code to Java code.

It gives:

10:14:32,597 INFO  [f.p.t.e.h.EsClientTest] top10foo bucket = bar, count = 2
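
A sketch of roughly what that conversion looks like (assuming the foo/bar index from the reproduction script, and the client and RequestOptions setup used earlier in the thread; no scroll is needed since only the aggregation result is read):

    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
            .size(0) // we only need the aggregation, not the hits
            .aggregation(AggregationBuilders.terms("top10foo")
                    .field("foo.keyword")
                    .size(10));

    SearchRequest request = new SearchRequest("index").source(sourceBuilder);
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // org.elasticsearch.search.aggregations.bucket.terms.Terms
    Terms top10foo = response.getAggregations().get("top10foo");
    for (Terms.Bucket bucket : top10foo.getBuckets()) {
        // one bucket per distinct value of foo.keyword
        System.out.printf("top10foo bucket = %s, count = %d%n",
                bucket.getKeyAsString(), bucket.getDocCount());
    }

Each bucket key is one distinct value, so iterating the buckets gives you the deduplicated values along with their per-value document counts.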

Thank you very much! Is there any way for me to help your reputation?
