I am working on an event-based dataset where, let's say, there is a central customer index and multiple indices for different event types, say event_1, event_2 and event_3.
I want to search across all the event_* indices and find the customers common to all of them.
I tried a query with a bool should clause and terms aggregations, but I feel that the aggregations would be the limiting factor here.
Is there any other way to query the event indices?
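To make the question concrete, here is a minimal sketch of the kind of terms-aggregation approach I mean (the customer_id field, the Python client usage and the direct event_* index pattern are placeholders, not my exact query):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Bucket by customer across all event indices; the number of buckets is
# what makes this expensive as the data grows.
body = {
    "size": 0,
    "aggs": {
        "customers": {
            "terms": {"field": "customer_id", "size": 10000},
            "aggs": {
                # How many distinct event_* indices each customer appears in.
                "indices": {"cardinality": {"field": "_index"}}
            },
        }
    },
}

resp = es.search(index="event_*", body=body)
# Customers whose events span all three indices are the common ones.
common = [
    b["key"]
    for b in resp["aggregations"]["customers"]["buckets"]
    if b["indices"]["value"] == 3
]
```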
Have you thought about modeling your data differently, so that you can answer such a question with a single query over a single index?
How about including all the customer data in each event, so that all of your data is available in every event? That could simplify your queries a lot!
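For example, each event document would carry an embedded copy of the customer fields (a hypothetical shape, indexed with the Python client):

```python
# Hypothetical denormalized event document: the customer fields travel with
# every event, so one query over a single index pattern can answer the question.
event_doc = {
    "event_type": "event_1",
    "timestamp": "2020-01-01T00:00:00Z",
    "customer": {
        "id": "c-123",
        "name": "Jane Doe",
        # ...rest of the customer JSON, duplicated into each event
    },
}
# es.index(index="event_1", body=event_doc)
```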
Thanks Alex for the suggestion.
The customer data is a huge JSON document, so it would be really expensive to duplicate it for each of the events. There are more than 2000 events per day; appending the customer data to each of them would create a huge dataset!
Any other modelling/querying technique would be really appreciated! Thanks!
I have a similar problem where I need to find ids common to two indices. Currently I solve it by querying ES for all ids in each index (using scrolls) and comparing them. For indices containing millions of documents, my current implementation takes less than a minute and is reasonable in terms of RAM.
Here's the pseudocode (using a map-reduce paradigm); a runnable sketch follows the list:
1. Get 10k elements from the first index, sorted by the field I'm interested in.
2. Get the first and last field value from those 10k.
3. Query the second index for all documents whose values are between the first and the last (taking care to use the same sort in that query as well).
4. Compare both batches. This is done in separate threads to take advantage of multiprocessing.
5. In the main thread, continue from step 1 by scrolling to the next batch of 10k.
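A minimal single-threaded sketch of the above in Python, assuming the elasticsearch-py client and a keyword field called customer_id (both placeholders); the comparison runs inline here rather than in worker threads:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def common_ids(index_a, index_b, field="customer_id", batch=10_000):
    """Return the ids present in both indices, compared batch by batch."""
    common = set()
    # Step 1: scroll through index_a in sorted batches of `batch` ids.
    resp = es.search(
        index=index_a,
        body={"sort": [{field: "asc"}], "_source": [field]},
        scroll="2m",
        size=batch,
    )
    while resp["hits"]["hits"]:
        ids_a = [h["_source"][field] for h in resp["hits"]["hits"]]
        # Step 2: the first and last value bound this batch's range.
        lo, hi = ids_a[0], ids_a[-1]
        # Step 3: fetch every id in index_b inside that range, using the
        # same sort; this side can return far more than `batch` documents,
        # so it is scrolled too.
        ids_b = set()
        sub = es.search(
            index=index_b,
            body={
                "query": {"range": {field: {"gte": lo, "lte": hi}}},
                "sort": [{field: "asc"}],
                "_source": [field],
            },
            scroll="2m",
            size=batch,
        )
        while sub["hits"]["hits"]:
            ids_b.update(h["_source"][field] for h in sub["hits"]["hits"])
            sub = es.scroll(scroll_id=sub["_scroll_id"], scroll="2m")
        # Step 4: compare both batches (inline here, not in a thread).
        common.update(set(ids_a) & ids_b)
        # Step 5: continue with the next batch from index_a.
        resp = es.scroll(scroll_id=resp["_scroll_id"], scroll="2m")
    return common
```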
Thanks DmitriV for sharing your experience with this problem.
Interesting approach.
Are you implementing this approach in Apache Spark or other such big-data processing engines?
If yes, how are you partitioning the data? Are you keeping a fixed size of 10k documents per partition?
Currently it's just raw Python using scroll queries. The first index always returns 10k documents due to the size limit of the scroll request. The other index returns a variable number of documents (currently anywhere between 0 and 300k) depending on the beginning and end positions of the batch.