_id extraction performance in plugin


(Vadim Gindin) #1

Hello all!

We use machine learning to score products from an index in our custom Elasticsearch search plugin. For this scoring, I need to fetch additional data from an external database using the product ids that the plugin finds by text matching.

We use the product id as the Elasticsearch _id of each document, so a product can be fetched with a GET request.

Performance question: what is the most efficient way to fetch the product id inside the Weight?
Assume we have 20000 matched products.

  1. _id is a stored field, so I can use an org.elasticsearch.index.fieldvisitor.FieldsVisitor object to extract the product id, the same way org.elasticsearch.search.fetch.FetchPhase does:
    FieldsVisitor fv = new FieldsVisitor(false);
    fv.reset();
    fv.postProcess(queryContext.getQueryShardContext().getMapperService());
    reader.document(doc, fv);
    productId = fv.uid().id();
    This takes more than 1 second for 20000 products.
  2. Store the product id in a separate doc values field and read it in the usual way.
    This takes about 400 ms for 20000 products.

Case 2 is faster, but it is still too slow. Could you advise me on the fastest way?

Regards,
Vadim Gindin