My query is slow


#1

Hello,

I have a list of products, each product has list of offers (nested list). The product has a price (reference price) and each offer has a price, "offers.price" which is a price from reseller ("offers.shop").

I need a summary of how many equal, lower and higher offers is there for each shop, and how many total offers from each shop. (And few other stats, that are in the query). The product has other properties like brand, category, etc.

I have this mappings:

{
  "mappings": {
    "product": {
      "properties": {
        "user_id": {
          "type": "integer"
        },
        "fid": {
          "type": "keyword"
        },
        "name": {
          "type": "keyword",
          "fields": {
            "text": {
              "type": "text"
            }
          }
        },
        "brand": {
          "type": "keyword"
        },
        "ean": {
          "type": "keyword"
        },
        "price": {
          "type": "scaled_float",
          "scaling_factor": 100
        },
        "offers": {
          "type": "nested",
          "properties": {
            "product_fid": {
              "type": "keyword"
            },
            "shop": {
              "type": "keyword"
            },
            "product_price": {
              "type": "scaled_float",
              "scaling_factor": 100
            },
            "price": {
              "type": "scaled_float",
              "scaling_factor": 100
            },
            "source": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

and this is my query:
Basically I create aggr offers (nested), then by_shop (terms) to get buckets with 'offers.shop', and then another terms aggregation by_product_fid because I must take only one offer for each 'product_fid' (the one with the lowest price).

{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "term": {
            "user_id": 1
          }
        }
      ]
    }
  },
  "aggregations": {
    "offers": {
      "nested": {
        "path": "offers"
      },
      "aggregations": {
        "filtered": {
          "filter": {
            "bool": {
              "filter": []
            }
          },
          "aggregations": {
            "by_shop": {
              "terms": {
                "field": "offers.shop",
                "order": {
                  "_term": "asc"
                },
                "size": 10000
              },
              "aggregations": {
                "by_product_fid": {
                  "terms": {
                    "field": "offers.product_fid",
                    "size": 10000
                  },
                  "aggregations": {
                    "lowest_price": {
                      "min": {
                        "field": "offers.price"
                      }
                    },
                    "product_price": {
                      "min": {
                        "field": "offers.product_price"
                      }
                    },
                    "an_offer": {
                      "bucket_script": {
                        "buckets_path": {},
                        "script": "1"
                      }
                    },
                    "lower_offer": {
                      "bucket_script": {
                        "buckets_path": {
                          "lowest_price": "lowest_price",
                          "product_price": "product_price"
                        },
                        "script": "params.product_price > params.lowest_price ? 1 : 0"
                      }
                    },
                    "equal_offer": {
                      "bucket_script": {
                        "buckets_path": {
                          "lowest_price": "lowest_price",
                          "product_price": "product_price"
                        },
                        "script": "params.product_price == params.lowest_price ? 1 : 0"
                      }
                    },
                    "higher_offer": {
                      "bucket_script": {
                        "buckets_path": {
                          "lowest_price": "lowest_price",
                          "product_price": "product_price"
                        },
                        "script": "params.product_price < params.lowest_price ? 1 : 0"
                      }
                    }
                  }
                },
                "common_offers_count": {
                  "sum_bucket": {
                    "buckets_path": "by_product_fid>an_offer"
                  }
                },
                "lower_offers_count": {
                  "sum_bucket": {
                    "buckets_path": "by_product_fid>lower_offer"
                  }
                },
                "equal_offers_count": {
                  "sum_bucket": {
                    "buckets_path": "by_product_fid>equal_offer"
                  }
                },
                "higher_offers_count": {
                  "sum_bucket": {
                    "buckets_path": "by_product_fid>higher_offer"
                  }
                },
                "sorted": {
                  "bucket_sort": {
                    "gap_policy": "insert_zeros",
                    "from": 0,
                    "size": 50
                  }
                }
              }
            },
            "shop_count": {
              "cardinality": {
                "field": "offers.shop"
              }
            }
          }
        }
      }
    }
  }
}

This query is slow - about 4 sec. if user has 1000 products each with about 30 offers. 6000 products of all users. And there will be like 100k products per user on prod. I think that this second terms aggregation by_product_fid slows it down significantly, but is there any other way of getting distinct records, for same offers.product_fid and get the one with lowest offers.price?

Field offers.product_fid is a copy of fid from the product.

Is there any way of investigating the internals of elasticsearch, to know what takes most of the time of processing the query?

Or maybe I should design my mappings differently? I will be grateful for any suggestions...


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.