How to compute token counts efficiently?

lucaw · August 3, 2015, 4:57pm

To compute a custom script score function I'm working on, I need to retrieve the token counts for a collection of fields in my documents. I know that it's difficult to retrieve the field length norm value (or at least I couldn't find a way to do so), and that the recommended alternative is the token_count type, as in the following example mapping:

"doc": {
  "properties": {
    "title": {
      "type": "string",
      "analyzer": "my_analyzer",
      "fields": {
        "token_count": {
          "type": "token_count",
          "store": "yes",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

This solution works fairly well for me, with the downside that the token_count field has to be re-analyzed, which seems to slow indexing dramatically: I'm doing this for almost every field, and there are many fields, so indexing appears to take about twice as long. Is there any way to make this process more efficient? In particular, I always use the same analyzer for the token_count sub-field as for its parent field. Can the two not share the analysis result in some way rather than doing the same work twice?
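
For context on the setup being described, here is a minimal sketch of how the mapping above could be generated for many fields at once. It only builds the mapping as a Python dict and does not talk to Elasticsearch. The helper name `with_token_count` and the field name "body" are hypothetical and used purely for illustration; "title", "my_analyzer", and the token_count sub-field come from the mapping above.

```python
import json


def with_token_count(field_names, analyzer="my_analyzer"):
    """Build a "doc" mapping in which every listed string field gets a
    stored token_count sub-field that uses the same analyzer.

    Hypothetical helper for illustration; it mirrors the example mapping
    and assumes the same analyzer is wanted for every field.
    """
    properties = {}
    for name in field_names:
        properties[name] = {
            "type": "string",
            "analyzer": analyzer,
            "fields": {
                # Sub-field that is analyzed a second time at index time.
                "token_count": {
                    "type": "token_count",
                    "store": "yes",
                    "analyzer": analyzer,
                }
            },
        }
    return {"doc": {"properties": properties}}


# "body" is a made-up second field to show the repetition across fields.
mapping = with_token_count(["title", "body"])
print(json.dumps(mapping, indent=2))
```

Each generated sub-field is addressed as `<field>.token_count` (for example `title.token_count`), so every field listed here is analyzed twice at index time, which is the overhead described above.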
