I have multiple indices with many documents, grouped in a data view.
For documents like {..., field1: value, field2: value, ...}, is there a way to get all documents for which at least one other document exists with the same value in field1 but a different value in field2?
Maybe as a REST call, as a Discover search, or with Query DSL in a Kibana dashboard?
I only need this information once, but I don't know how to approach it or which part of the stack is best suited for it.
If you have a data view (myDataView) with multiple indices connected to it (indexA, indexB, etc.), you can use the Kibana Query Language (KQL) in the Discover section to filter it, for example:
field1:value AND (field2:value1 OR field2:value2 OR ...)
Or you can click Add filter at the top left and enter the necessary filters.
Hello, thanks for the response, but that's not what I'm looking for.
This assumes that I know what "value" is, but I want to get all documents with some value in field1, which I don't want to specify, for which at least one other document exists with the same value in field1 but a different one in field2.
So I think a filter is not sufficient, because it always looks at a single document, whereas to get what I want you need to look at all documents together.
What you are describing sounds a lot like a join, which Elasticsearch does not support. I do not believe you can do this in a single request, so you may need to handle it in the application using multiple requests.
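As a rough illustration of the application-side approach, a minimal sketch could pull the documents out and do the grouping in Python. This assumes the official Python client, an index pattern of indexA,indexB, and that field1/field2 are the fields from your example; adjust host, credentials and names to your setup:

    from collections import defaultdict
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import scan

    es = Elasticsearch("http://localhost:9200")  # adjust host/credentials

    # Collect every distinct field2 value seen for each field1 value.
    groups = defaultdict(set)
    body = {"query": {"match_all": {}}, "_source": ["field1", "field2"]}
    for hit in scan(es, index="indexA,indexB", query=body):
        src = hit["_source"]
        groups[src["field1"]].add(src["field2"])

    # field1 values that occur with more than one distinct field2 value
    matches = {k: v for k, v in groups.items() if len(v) > 1}
    for field1_value, field2_values in matches.items():
        print(field1_value, field2_values)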
Maybe it helps if I add more context: as I said, I have a data view with multiple indices, and each index has data from a different database. One document holds information about a company; field1 is a global identifier for that company and field2 is a local, database-specific identifier. Each index can contain multiple documents for the same company.
Now I want to know how many companies, and which ones, exist in at least two databases/indices.
That is the case if at least two documents share the same global identifier but have different local identifiers.
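To make that concrete (the values here are made up), these two documents would count as a match:
{ ..., field1: "company-123", field2: "db_a_42", ... }   (from indexA)
{ ..., field1: "company-123", field2: "db_b_77", ... }   (from indexB)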
As it looks like Elasticsearch cannot answer this question, I exported the data to CSV and got my result with a short Python script. If Elasticsearch can do it and we just don't know how, please let me know.
If anyone has a similar problem:
import csv

# Map each field1 value (global identifier) to the list of distinct
# field2 values (local identifiers) seen for it.
data = {}

with open('csv.csv', mode='r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for row in csv_reader:
        value1 = row[0]  # field1: global identifier
        value2 = row[1]  # field2: local identifier
        if value1 not in data:
            data[value1] = [value2]
        elif value2 not in data[value1]:
            data[value1].append(value2)

# A company exists in at least two databases if its global identifier
# was seen with two or more different local identifiers.
with open('output.txt', mode='w') as output_file:
    for value1, values in data.items():
        if len(values) >= 2:
            output_file.write(f"ID: {value1}\n")
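The CSV here is just the exported documents reduced to two columns, with field1 (the global identifier) first and field2 (the local identifier) second; if your export has a header row or a different column order, adjust the row indices accordingly.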