Elasticsearch cross-index query with aggregations

I use: Elasticsearch 7.7 , Kibana 7.7

For example, lets take two indexes:

User index with simple mapping:

PUT /user_index
{
  "mappings": {
    "properties": {
      "user_id":    { "type": "text" },
      "user_phone":    { "type": "text" },
      "name":   { "type": "text"  }     
    }
  }
}

Check with simple mapping:

PUT /check_index
{
  "mappings": {
    "properties": {
      "user_id":    { "type": "text" },  
      "price":   { "type": "integer"  },
      "goods_count":  {"type": "integer"}
    }
  }
}

I want to build table visualization like that:

________________________________________________________________________
  user_id  |   user_phone  | average_price       |    sum_goods_count  |
___________|_______________|_____________________|______________________
     1     |       123     |       512           |         64          |
___________|_______________|_____________________|______________________
     2     |       456     |       256           |         16          | 
___________|_______________|_____________________|______________________

So my questions are:

  1. Is it real?

  2. Do I understand correctly that I need to query these two indexes, get a list of users, and then in a loop create shopping carts with checks?

Ah, a textbook relational model case.

If you really must shape your indices like that, have a look at Joining Queries (in particular Terms Lookup can help you).

More generally, Elasticsearch is an indexing engine. An index trades space (e.g. by repeating data) and RAM+CPU for far greater speed in many querying situations. But you have to perform the tradeoff by repeating the information necessary to produce the visualisation.

So:

PUT /check_index
{
  "mappings": {
    "properties": {
      "user_id":    { "type": "keyword" }, 
      "user_phone":    { "type": "text" },
      "name":   { "type": "text"  }  
      "price":   { "type": "integer"  },
      "goods_count":  {"type": "integer"}
    }
  }
}

This makes it harder to update: if a user's phone or name changes, you have to fetch all check_index docs that relate to that user and update them. On the other hand this is the simplest, fastest data structure to use if you want to visualise that data. If your data is also kept in a relational database, there's even less reason to continue keeping it in a relational structure in Elasticsearch. Instead, build multiple indices with it suited to different querying, aggregation and visualisation tasks by putting all necessary data in the simplest form in each index. Whether to make the trade-off here is up to you: a phone number or name probably don't change very often. If you regularly destroy and recreate the index from the source data then that's even better since you may not need to propagate updates with a separate process, you just wait for your data refresh.

Also note I've changed user_id to keyword type. For IDs, I'd recommend keyword over text, and using a Term query to filter for docs related to that user_id.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.