Dynamically add/update document field based on output from field tokenization


(Jconwell) #1

I've got a custom TokenFilter that analyzes each token in a text field and calculates a bit of metadata about it. At the end of tokenization, is there a way to write that metadata to another field on the same document? I could add another custom TokenFilter at the end of the Analyzer that collects the metadata for every token and, on the very last token, writes it somewhere. But where, and how?

As a simple example of what I'm talking about, say I add a document that has a field "body" with the text "the quick brown fox does stuff and things". And let's say I want to calculate some simple metrics on the body, like word count, char count (including spaces), and average word length, and then create new document fields that store these metrics.

So the final stored document would have the additional fields:

{
    original_fields: stuff,
    .
    .
    .
    body_word_count: 8, 
    body_char_count: 41, 
    body_avg_word_len: 4.25
}
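
For concreteness, the three numbers above can be reproduced with a minimal sketch like the one below. It is plain Java that runs outside of any Analyzer or TokenFilter, so it only illustrates the arithmetic, not how to write the values back to the document. The class and method names (BodyMetrics, wordCount, charCount, avgWordLen) are hypothetical, and it assumes words are split on whitespace and that the char count includes spaces, which is what gives 41 for the example text.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene or Elasticsearch.
class BodyMetrics {

    // Number of whitespace-separated words; 0 for blank input.
    static int wordCount(String body) {
        String trimmed = body.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Total characters in the raw text, spaces included (assumption).
    static int charCount(String body) {
        return body.length();
    }

    // Mean word length, counting only the characters inside words.
    static double avgWordLen(String body) {
        String trimmed = body.trim();
        if (trimmed.isEmpty()) {
            return 0.0;
        }
        String[] words = trimmed.split("\\s+");
        int letters = 0;
        for (String word : words) {
            letters += word.length();
        }
        return (double) letters / words.length;
    }

    public static void main(String[] args) {
        String body = "the quick brown fox does stuff and things";
        System.out.println(String.format(Locale.ROOT,
                "body_word_count: %d, body_char_count: %d, body_avg_word_len: %.2f",
                wordCount(body), charCount(body), avgWordLen(body)));
    }
}
```

In the real setup these values would come from the tokens the Analyzer emits rather than from a whitespace split, so the counts could differ if the analysis chain drops or rewrites tokens (stop words, stemming, and so on).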

Is there a way to do what I'm talking about?

