I have multiple indices with many documents, grouped in a data view.
For documents like {..., field1: value, field2: value, ...}, is there a way to get all documents for which at least one other document exists with the same value in field1 but a different value in field2?
Maybe as a REST call, as a Discover search, or with Query DSL in a Kibana dashboard?
I only need this information once, but I don't know how to approach it or which part of the stack is best suited for it.
If you have a data view (myDataView) with multiple indices connected to it (indexA, indexB, etc.), you can use the Kibana Query Language (KQL) in the Discover section to filter it, for example:
field1:value AND (field2:value1 OR field2:value2 OR ...)
Or you can click Add filter at the top left and enter the necessary filters.
Hello, thanks for the response, but that's not what I'm looking for.
This assumes that I know what "value" is, but I want to get all documents with some value in field1, which I don't want to specify, for which at least one other document exists with the same value in field1 but a different one in field2.
So I think a filter is not sufficient, because it always looks at a single document, whereas to get what I want you need to look at all documents together.
What you are describing sounds a lot like a join, which Elasticsearch does not support. I do not believe you can do this in a single request, so you may need to handle it in the application using multiple requests.
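As a rough illustration of the application-side approach, a minimal sketch could pull the documents out and do the grouping in Python. This assumes the official Python client, an index pattern of indexA,indexB, and that field1/field2 are the fields from your example; adjust host, credentials and names to your setup:

    from collections import defaultdict
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan

    es = Elasticsearch("http://localhost:9200")  # adjust host/credentials

    # Collect every distinct field2 value seen for each field1 value.
    groups = defaultdict(set)
    body = {"query": {"match_all": {}}, "_source": ["field1", "field2"]}
    for hit in scan(es, index="indexA,indexB", query=body):
        src = hit["_source"]
        groups[src["field1"]].add(src["field2"])

    # field1 values that occur with more than one distinct field2 value
    matches = {k: v for k, v in groups.items() if len(v) > 1}
    for field1_value, field2_values in matches.items():
        print(field1_value, field2_values)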
Maybe it helps if I add more context: as I said, I have a data view with multiple indices, and each index has data from a different database. One document holds information about a company; field1 is a global identifier for that company and field2 is a local, database-specific identifier. Each index can contain multiple documents for the same company.
Now I want to know how many companies, and which ones, exist in at least two databases/indices.
That is the case if at least two documents share the same global identifier but have different local identifiers.
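To make that concrete (the values here are made up), these two documents would count as a match:
{ ..., field1: "company-123", field2: "db_a_42", ... }   (from indexA)
{ ..., field1: "company-123", field2: "db_b_77", ... }   (from indexB)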
As it looks like Elasticsearch cannot answer this question, I exported the data to CSV and got my result with a short Python script. If Elasticsearch can do it and we just don't know how, please let me know.
If anyone has a similar problem:
import csv

# Map each field1 value (global identifier) to the list of distinct
# field2 values (local identifiers) seen for it.
data = {}

with open('csv.csv', mode='r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for row in csv_reader:
        value1 = row[0]  # field1: global identifier
        value2 = row[1]  # field2: local identifier
        if value1 not in data:
            data[value1] = [value2]
        elif value2 not in data[value1]:
            data[value1].append(value2)

# A company exists in at least two databases if its global identifier
# was seen with two or more different local identifiers.
with open('output.txt', mode='w') as output_file:
    for value1, values in data.items():
        if len(values) >= 2:
            output_file.write(f"ID: {value1}\n")
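The CSV here is just the exported documents reduced to two columns, with field1 (the global identifier) first and field2 (the local identifier) second; if your export has a header row or a different column order, adjust the row indices accordingly.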