How to retrieve the content of Elasticsearch reverse indexes?

fmind · August 2, 2018, 12:04pm

Hello,

I want to run an analysis on Elasticsearch to retrieve the content of its reverse indexes.

I need this information to get the list of document IDs associated with each field/value combination stored in the reverse index.

For instance, given these 4 documents in my cluster:
{'_id': 1, 'os': 'linux', 'lang': 'python'}
{'_id': 2, 'os': 'linux', 'lang': 'perl'}
{'_id': 3, 'os': 'mac', 'lang': 'ruby'}
{'_id': 4, 'os': 'bsd', 'lang': 'python'}

I want to return the following results, where '_ids' contains the list of document IDs:
{'os': 'linux', '_ids': [1, 2]}
{'os': 'mac', '_ids': [3]}
{'os': 'bsd', '_ids': [4]}
{'lang': 'python', '_ids': [1, 4]}
{'lang': 'ruby', '_ids': [3]}
{'lang': 'perl', '_ids': [2]}
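
To make the target explicit, here is a minimal in-memory sketch of the transformation, using the 4 documents above. The `invert` helper is hypothetical and only illustrates the expected output; it would not scale to the real index.

```python
from collections import defaultdict


def invert(docs, fields):
    """Group document ids by (field, value) pair."""
    postings = defaultdict(list)
    for doc in docs:
        for field in fields:
            if field in doc:
                postings[(field, doc[field])].append(doc["_id"])
    # One row per field/value pair, with the ids in ascending order.
    return [
        {field: value, "_ids": sorted(ids)}
        for (field, value), ids in postings.items()
    ]


docs = [
    {"_id": 1, "os": "linux", "lang": "python"},
    {"_id": 2, "os": "linux", "lang": "perl"},
    {"_id": 3, "os": "mac", "lang": "ruby"},
    {"_id": 4, "os": "bsd", "lang": "python"},
]

result = invert(docs, ["os", "lang"])
for row in result:
    print(row)
```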

I tested the Composite Aggregation API, but I was only able to return the document count, not the full list of document IDs:
{'os': 'linux', 'docs': 2}
{'os': 'mac', 'docs': 1}
{'os': 'bsd', 'docs': 1}
{'lang': 'python', 'docs': 2}
{'lang': 'ruby', 'docs': 1}
{'lang': 'perl', 'docs': 1}

At this point, my options are either to use Elasticsearch Hadoop or to migrate my data to Hadoop directly (the index contains more than 6 million documents and takes up 1.7 TB). I could also run one query per field/value pair, but that would be really inefficient (in my previous example, that would be 6 queries for 4 documents).

Do you know of an Elasticsearch API I can use to extract this information?

Is there a more lightweight alternative to using Hadoop for this case?

Thank you for your help

fmind · August 6, 2018, 8:31am

bump

dadoonet · August 6, 2018, 9:05am

Maybe a terms aggregation with a top hits sub-aggregation would work for your case?
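
A rough sketch of what that could look like, written as plain Python dicts so it does not depend on any client library. The helper names (`build_request`, `parse_response`), the sub-aggregation name `ids`, and the size limits are assumptions for illustration, and `sample_response` is hand-written to mimic the response shape, not real cluster output. The body would be sent to the index's `_search` endpoint. Note that the fields need to be aggregatable (for example keyword fields), and that top hits is capped by the `index.max_inner_result_window` setting (100 by default), so this probably only returns complete ID lists for small buckets.

```python
def build_request(fields, max_values=1000, max_ids=100):
    """Build a search body with one terms aggregation per field.

    Each terms bucket gets a top_hits sub-aggregation (named "ids" here,
    an arbitrary choice) that returns the matching hits without _source.
    """
    aggs = {}
    for field in fields:
        aggs[field] = {
            "terms": {"field": field, "size": max_values},
            "aggs": {
                "ids": {"top_hits": {"size": max_ids, "_source": False}}
            },
        }
    # "size": 0 skips the regular hits; only aggregations are returned.
    return {"size": 0, "aggs": aggs}


def parse_response(response):
    """Flatten the aggregation response into one row per field/value."""
    rows = []
    for field, agg in response["aggregations"].items():
        for bucket in agg["buckets"]:
            ids = [hit["_id"] for hit in bucket["ids"]["hits"]["hits"]]
            rows.append(
                {field: bucket["key"], "_ids": ids, "docs": bucket["doc_count"]}
            )
    return rows


body = build_request(["os", "lang"])

# Hand-written response fragment for the "os" field only (assumed shape).
sample_response = {
    "aggregations": {
        "os": {
            "buckets": [
                {
                    "key": "linux",
                    "doc_count": 2,
                    "ids": {"hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}},
                },
                {
                    "key": "mac",
                    "doc_count": 1,
                    "ids": {"hits": {"hits": [{"_id": "3"}]}},
                },
            ]
        }
    }
}

rows = parse_response(sample_response)
```

Comparing `docs` with the length of `_ids` in each row is a simple way to spot buckets that were truncated by the top hits limit.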

system · September 3, 2018, 9:17am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
