How do I create a “distinct” query in the Elasticsearch Java API, like we do in SQL? Here is my query, which Kibana generated. I need to do the same in Java, and I have tried many approaches, but I keep getting duplicates.
@Christian_Dahlqvist coming back to your question: this whole query retrieves rows where doc_id is not unique. I want it to be unique, like DISTINCT, so that I only get rows where the doc_id field does not repeat.
Actually, these two methods do what I need. The only shortcoming is that I get rows where doc_id values repeat, so the problem now is how to make this field's values distinct.
public BucketList getListOfBucketsTimeRestricted(BucketListInfo bucketListInfo) {
    final SearchRequest searchRequest = new SearchRequest(bucketListInfo.getIndexName());
    // Time-range query with an optional field filter; returns at most 100 hits.
    final SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
            .query(getRangeQueryBuilderWithOptionalFilter(bucketListInfo.getTimestampFieldName(), bucketListInfo.getFilterFieldName(), bucketListInfo.getFilterFieldValue(),
                    bucketListInfo.getFrom(), bucketListInfo.getTo())).size(100);
    // One independent terms aggregation per requested field, named after the field.
    for (String aggrField : bucketListInfo.getAggrFieldList()) {
        final TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms(aggrField).field(aggrField);
        searchSourceBuilder.aggregation(aggregationBuilder);
    }
    searchRequest.source(searchSourceBuilder);
    try {
        final SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        BucketList bucketList = new BucketList();
        // Collect the buckets of each aggregation in the same order as the field list.
        for (String aggrField : bucketListInfo.getAggrFieldList()) {
            final Terms terms = response.getAggregations().get(aggrField);
            bucketList.getBuckets().add(terms.getBuckets());
        }
        return bucketList;
    } catch (Exception e) {
        log.error(e.getMessage(), e);
        return null;
    }
}
private static final List<String> AGGR_FIELDS = List.of("policy.name.keyword", "policy.description.keyword");
private static final String IS_COMPLIANT_FIELD = "is_compliant";
private final SearchClient searchClient;

public List<PolicyViolation> getEvaluationTimeRestricted(DateTime from, DateTime to) {
    final BucketListInfo bucketListInfo = BucketListInfo.builder()
            .indexName(EVALUATION.getIndexName())
            .timestampFieldName(EVALUATION.getTimestampFieldName())
            .aggrFieldList(AGGR_FIELDS)
            .filterFieldName(IS_COMPLIANT_FIELD)
            .filterFieldValue(false)
            .from(from)
            .to(to)
            .build();
    BucketList buckets = searchClient.getListOfBucketsTimeRestricted(bucketListInfo);
    if (Objects.isNull(buckets)) {
        return Collections.emptyList();
    }
    List<? extends Terms.Bucket> firstBuckets = buckets.getBuckets().get(0);
    List<? extends Terms.Bucket> secondBuckets = buckets.getBuckets().get(1);
    final List<PolicyViolation> evaluationCounts = new ArrayList<>();
    // Pairs bucket i of the name aggregation with bucket i of the description aggregation;
    // this assumes both bucket lists have the same size and line up by position.
    for (int i = 0; i < firstBuckets.size(); i++) {
        evaluationCounts.add(
                new PolicyViolation(firstBuckets.get(i).getKeyAsString(), secondBuckets.get(i).getKeyAsString(), firstBuckets.get(i).getDocCount()));
    }
    return evaluationCounts;
}
@Christian_Dahlqvist sorry, but I think we don't understand each other.
I want to do the analog of a "select distinct" query in the Elasticsearch Java API, and then group by the results of that select distinct query. Just that.
@stephenb I started using the SQL search API you suggested, but the problem now is that my index name is "*-evaluation", and when I use a query like this:
I get a very unclear JSON object graph, which I think is unrelated to the data I am expecting. I even escaped the index name as a string, like '*-evaluation', but still did not get the expected results. Any idea how to resolve this?
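One thing that may be worth ruling out here: in Elasticsearch SQL, single quotes delimit string literals, while an index name or pattern containing characters such as * or - is written in double quotes. The sketch below only builds the JSON body for the SQL search endpoint using plain Java strings. The class and method names are hypothetical, the distinct_docs alias is made up, and doc_id and *-evaluation are taken from this thread; it is an illustration under those assumptions, not a confirmed fix.

```java
class SqlBodySketch {

    // Escape backslashes and double quotes so the SQL text can sit inside a JSON string.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Hypothetical helper: builds the request body for the SQL search endpoint.
    // The index pattern is wrapped in double quotes, not single quotes.
    static String sqlBody(String indexPattern, String distinctField) {
        String sql = "SELECT COUNT(DISTINCT " + distinctField + ") AS distinct_docs FROM \""
                + indexPattern + "\"";
        return "{\"query\":\"" + jsonEscape(sql) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(sqlBody("*-evaluation", "doc_id"));
    }
}
```

The double quotes around the pattern end up backslash-escaped inside the JSON string, which is easy to get wrong when the body is written by hand.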
I do not think there is any query construct that, based on a filter, returns all unique values (terms) from a field. The closest I think you can get is to perform a terms aggregation, but that will give you a count together with each term. There is also a limit to the size of the result set, so you may not get all of them if there are many values.
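To make the terms aggregation idea concrete, here is a minimal sketch that builds the search body as a plain string. The helper and class names are hypothetical, the aggregation name distinct_values is made up, and it assumes doc_id is mapped as a keyword (or otherwise aggregatable) field. Hits are suppressed with "size": 0, and the aggregation's own size is set explicitly because the default returns only the top 10 terms.

```java
class DistinctTermsSketch {

    // Hypothetical helper: a search body that lists the unique values of one field.
    // Each returned bucket key is one distinct value; doc_count comes along with it.
    static String distinctValuesBody(String field, int maxValues) {
        return String.format(
                "{\"size\":0,\"aggs\":{\"distinct_values\":{\"terms\":{\"field\":\"%s\",\"size\":%d}}}}",
                field, maxValues);
    }

    public static void main(String[] args) {
        System.out.println(distinctValuesBody("doc_id", 1000));
    }
}
```

With the builders already used in this thread, the equivalent would presumably be AggregationBuilders.terms(...).field(...).size(...). If the number of distinct values is large, a composite aggregation, which pages through buckets, may be a better fit than raising the size.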
You can do nested aggregation, but I'm not sure if that's what you are looking for.
You can group by unique doc_id and within each unique doc_id, you can further group by other fields.
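A rough sketch of that nested grouping, again as a plain request body rather than client code. The names NestedAggSketch, nestedBody, by_doc_id and by_inner are made up for illustration, and it assumes both fields are aggregatable (for example keyword fields). The outer terms aggregation yields one bucket per unique doc_id, and the inner one groups the documents of each bucket by a second field.

```java
class NestedAggSketch {

    // Hypothetical helper: outer terms aggregation on the "distinct" field,
    // with a sub-aggregation that groups each outer bucket by another field.
    static String nestedBody(String outerField, String innerField, int outerSize) {
        return String.format(
                "{\"size\":0,\"aggs\":{\"by_doc_id\":{\"terms\":{\"field\":\"%s\",\"size\":%d},"
                        + "\"aggs\":{\"by_inner\":{\"terms\":{\"field\":\"%s\"}}}}}}",
                outerField, outerSize, innerField);
    }

    public static void main(String[] args) {
        System.out.println(nestedBody("doc_id", "policy.name.keyword", 100));
    }
}
```

In the high-level REST client used earlier in the thread, the same shape can be expressed by calling subAggregation(...) on the outer TermsAggregationBuilder. If the goal is instead a count of distinct doc_id values per policy, swapping the nesting (terms on the policy field with a cardinality sub-aggregation on doc_id) may be closer, keeping in mind that cardinality counts are approximate.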