Merging two indexes by a common field


(marwa) #1

Hi,

I have an index named 'transactions' and another one called 'costs'.

The transactions index contains documents like:
{ product_id: "1111", price_unit: "23.56", customer_name:"Marda Elbin" }

The costs index contains the id of each product along with its costs.

I want to merge both indexes based on product_id, i.e. to make a join between them on that common field.

My goal is to use the scripting language to calculate revenues - costs for each product, so I used aliases :slight_smile:

POST _aliases
{
  "actions" : [
    { "add" : { "index" : "costs", "alias" : "alias1" } },
    { "add" : { "index" : "transactions", "alias" : "alias1" } }
  ]
}
Then I tried to calculate revenues - costs with this query:

GET alias1/_search
{
  "query" : {
    "match_all": {}
  },
  "script_fields" : {
    "test1" : {
      "script" : {
        "lang": "painless",
        "inline": "int gross_profit_margin = 0; if (doc[product_identifier].value == doc['product_id'].value){ gross_profit_margin + = (doc['product_price'].value * doc['quantity'].value * (1 - doc['discount'].value /100)) - (doc['average_marketing_cost'].value + doc['average_promotional_cost'].value + doc['direct_and_indirect_cost'].value ) return gross_profit_margin }"
      }
    }
  }
}

This error appears:

{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "compile error",
        "script_stack": [
          "... ){ gross_profit_margin + = (doc['product_price'].v ...",
          "                              ^---- HERE"
        ],
        "script": "int gross_profit_margin = 0; if (doc[product_identifier].value == doc['product_id'].value){ gross_profit_margin + = (doc['product_price'].value * doc['quantity'].value * (1 - doc['discount'].value /100)) - (doc['average_marketing_cost'].value + doc['average_promotional_cost'].value + doc['direct_and_indirect_cost'].value ) return gross_profit_margin }",
        "lang": "painless"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,

So my questions are:

  1. How can I merge indexes by a common field?
  2. Do aliases allow me to do so?

Thank you!


(Mark Harwood) #2

The context of your script's execution is accessing the properties of a single doc from a single index. Your script would not be able to simultaneously see doc 1 from the transactions index and doc 2 from the costs index.

You can use the scroll API to read across the two indices (no alias required), sorting on a common key; your application code would then have to process the stream of interleaved results sorted by product ID. It would buffer a product's costs, applying them to the next transactions that share the common key. The newly enriched transactions could then be inserted into a new index using the bulk API.
Of course this only works if a transaction is for exactly one product. If your "transaction" docs are more like orders with >1 product type, then this approach clearly won't work, as individual orders can't be sorted sensibly by a single key. In that scenario you'd have to do point queries on your product store to retrieve costs (a cache would obviously help here).
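A minimal sketch of that merge step in plain Python, assuming both streams are already sorted by product_id. In practice each stream would come from the scroll API rather than an in-memory list, and the field names (product_id, cost, price_unit) are illustrative, not taken from a real mapping:

```python
# Merge-join two streams that are both sorted by product_id.
# In a real setup each stream would be fed by the Elasticsearch
# scroll API; here they are plain lists for illustration.

def merge_by_product_id(costs, transactions):
    """Attach each product's cost to the transactions sharing its id."""
    costs_iter = iter(costs)
    current_cost = next(costs_iter, None)
    enriched = []
    for tx in transactions:  # transactions sorted by product_id
        # Advance the costs stream until it catches up with this transaction.
        while current_cost is not None and current_cost["product_id"] < tx["product_id"]:
            current_cost = next(costs_iter, None)
        doc = dict(tx)
        if current_cost is not None and current_cost["product_id"] == tx["product_id"]:
            doc["cost"] = current_cost["cost"]
        enriched.append(doc)
    return enriched

costs = [{"product_id": "1111", "cost": 10.0},
         {"product_id": "2222", "cost": 4.5}]
transactions = [{"product_id": "1111", "price_unit": 23.56},
                {"product_id": "2222", "price_unit": 8.0}]

merged = merge_by_product_id(costs, transactions)
```

Each enriched document could then be written to a new index with the bulk API.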


(marwa) #3

I didn't get what you really mean. Can you explain it to me in more detail, please?

thank you


(Mark Harwood) #4

Perhaps it makes sense to clarify the requirement before we clarify the solution:

  • Do any of your "transaction" docs contain >1 product?
  • Are you after a single number for margin across all products or per-product margins?
  • Are you doing this calculation regularly?
  • Should profit margins be measured as product-cost-at-time-of-sale or based on currently recorded product cost?
  • How many products and transactions do you have?

(marwa) #5
  • For each transaction I have only one product.
  • It would be better to calculate the margin per transaction rather than per product, because each transaction has one product sold in a specific quantity.
  • Yes, I will do it regularly.
  • The profit margin will be calculated with the currently recorded product cost, which is stored in the costs index (I didn't record the time of the cost of a product; in the costs index I have only the product id and its cost, nothing more).
  • I have 1055 products and more than 30000 transactions.

thank you


(Mark Harwood) #6

Assuming you want to store costs/profit, the best suggestion would be to fix your data "on the way in" if possible. 1,000 products is not a lot of data to keep in a RAM cache and look up as you insert transaction data.
Logstash, I believe, has some "lookup" type features that could help with this (best to ask in that forum).

The advantages of doing it this way rather than a batch fix-it-later scheme are:

  1. costs are recorded with the values current at point-of-sale
  2. there is no lag between the logging of transactions and the costing of transactions.

(marwa) #7

So you are suggesting that I remodel my data and store both indexes in one index?


(Mark Harwood) #8

Denormalization for the win. https://www.elastic.co/guide/en/elasticsearch/guide/current/denormalization.html


(marwa) #9

So I have to denormalize all the data I have manually? I am sorry, but I couldn't figure out what I am supposed to do in my case, because denormalizing the data means that for each transaction I would have to look up the product id with a query?


(Mark Harwood) #10

This is a pretty common sort of enrichment task that any number of ETL tools support. Personally, for a problem this small, I'd opt for some custom Python and use a dict as a cache for product info. Like I said, Logstash has some of this data-enrichment logic, but it looks like a smarter solution with caching is still some way off.
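A minimal sketch of that dict-as-cache enrichment, under assumed field names (product_id, cost, price_unit, quantity) that stand in for the real mapping. In practice the cache would be loaded once from the costs index and the enriched docs written back via the bulk API:

```python
# Enrich transactions "on the way in": keep all product costs in a
# dict (1,055 products easily fit in RAM) and attach the cost plus a
# computed gross profit to each incoming transaction.

def build_cost_cache(cost_docs):
    """Map product_id -> cost for O(1) lookups during ingestion."""
    return {doc["product_id"]: doc["cost"] for doc in cost_docs}

def enrich(transaction, cost_cache):
    """Return a copy of the transaction with cost and gross profit added."""
    doc = dict(transaction)
    cost = cost_cache.get(doc["product_id"])
    if cost is not None:
        doc["cost"] = cost
        # Simplified margin: revenue minus cost, both scaled by quantity.
        doc["gross_profit"] = (doc["price_unit"] - cost) * doc["quantity"]
    return doc

cache = build_cost_cache([{"product_id": "1111", "cost": 10.0}])
tx = {"product_id": "1111", "price_unit": 23.56, "quantity": 2}
enriched = enrich(tx, cache)
```

Because every document then carries its own cost and profit, no cross-index scripting is needed at query time.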

Either way, here is probably not the forum to discuss approaches further as this is somewhat "upstream" of core elasticsearch.


(marwa) #11

Just a final question: can I store both data sets in one index but as different types (is that possible when the two data sets have different mappings)? (Note that I would have to rename product_id in one of the data sets.) Then use Painless to test whether documents have the same product identifier, and if the condition holds, calculate the profit margin?


(Mark Harwood) #12

No, because of my first comment: a script only ever sees the properties of a single doc from a single index.


(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.