_id extraction performance in plugin

Hello all!

We use machine learning to score products from an index in our custom Elasticsearch search plugin. To do this scoring, I need to fetch additional data from an external DB using the product ids found by text matching in the plugin.

We use the product id as the Elasticsearch _id of each document, so a product can be fetched with a GET request.

Performance question: what is the most efficient way to fetch the product id inside the Weight?
Assume we have 20000 matched products.

  1. _id is a stored field, so I can use an org.elasticsearch.index.fieldvisitor.FieldsVisitor object to extract the product id like this (as is done in org.elasticsearch.search.fetch.FetchPhase):
    // false: do not load _source, only metadata fields such as the uid
    FieldsVisitor fv = new FieldsVisitor(false);
    fv.reset();
    reader.document(doc, fv);
    // post-process after the document is loaded, as FetchPhase does
    fv.postProcess(queryContext.getQueryShardContext().getMapperService());
    productId = fv.uid().id();
    This takes more than 1 second for 20000 products.
  2. Store the product id in a separate doc values field and read it in the usual way.
    This takes about 400 ms for 20000 products.
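
For scale, here is a small sketch that converts the two timings above into an average cost per matched document. The class and method names (PerDocCost, microsPerDoc) are made up for illustration, and the inputs are just the rough figures quoted in this post (1 second is a lower bound for case 1), not new measurements.

```java
public class PerDocCost {
    // Average cost per document in microseconds for a batch that took totalMillis.
    static double microsPerDoc(double totalMillis, int docCount) {
        return totalMillis * 1000.0 / docCount;
    }

    public static void main(String[] args) {
        int matched = 20000; // number of matched products assumed in the question

        // Case 1: stored _id via FieldsVisitor, more than 1000 ms in total.
        System.out.println("stored _id: at least " + microsPerDoc(1000.0, matched) + " us/doc");

        // Case 2: separate doc values field, about 400 ms in total.
        System.out.println("doc values: about " + microsPerDoc(400.0, matched) + " us/doc");
    }
}
```

So the stored-field route costs at least 50 microseconds per document and the doc values route about 20 microseconds per document.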

Case 2 is faster, but it is still too slow. Could you advise me on the fastest way?

Regards,
Vadim Gindin
