I've been searching around trying to find the best way to do this, but
haven't really found anything so far. Any help would be appreciated.
Let's say I have simple documents that come in through a custom River:
doc:
title: string
content: string
And I want to add the following fields (numberOfNames, numberOfWords,
numberOfPlaces) and populate them based on a custom parsing of the content
field. Essentially I'm analyzing one field of the original document and
using it to populate additional fields (not part of the original document) so
that I can provide faceted search on these additional fields. Are there
some existing approaches to do this? I would imagine this is pretty common
but haven't been able to find much out there.
I was thinking of making a new Analyzer, something like MetadataAnalyzer
where you could configure a sequence of Tokenizer and Filter objects that
lead to the tokens to be indexed for a given field. For example you could
do something like this:
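(Sketching the idea here, since the concrete example got lost in posting; the analyzer name and the custom filter below are hypothetical, not existing Elasticsearch components:)

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "metadata_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "name_extractor"]
        }
      }
    }
  }
}
```

where "name_extractor" would be a custom token filter (shipped as a plugin) that keeps only the tokens I care about, so the surviving tokens for that field become the facet values.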
If the only difference between each field is the analysis, you could
use multi-fields on the original source field.
Each field can have its own analyzer (custom or not). The primary use
of multi-field is for when you want to define different analyzers on
the same source field.
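For example, a mapping sketch using the multi_field syntax of that era; the "name_analyzer" and "place_analyzer" names are assumed to be custom analyzers already defined in the index settings:

```json
{
  "doc": {
    "properties": {
      "content": {
        "type": "multi_field",
        "fields": {
          "content": { "type": "string", "analyzer": "standard" },
          "names":   { "type": "string", "analyzer": "name_analyzer" },
          "places":  { "type": "string", "analyzer": "place_analyzer" }
        }
      }
    }
  }
}
```

Each sub-field is analyzed independently from the same source value, so facets can target, say, content.names without changing the documents you index.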
Wouldn't using a multi-field for this scenario cause the same field to be tokenized four different times? Once for the normal text-field tokenization, and once more for each of the three metrics you are calculating.
Is there a way to perform all three analyses in one analyzer pipeline and then store the 3 resulting metrics to new fields?
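One alternative that sidesteps repeated analysis entirely (not something from this thread, just a sketch) is to compute the metrics once, client-side, before indexing, and store them as ordinary numeric fields. The name/place detection below is a naive placeholder, not a real implementation:

```python
import re

def enrich(doc):
    """Add derived metric fields to a document before indexing.

    The name/place heuristics here are stand-ins; a real version
    would use an NER library or a gazetteer lookup.
    """
    words = re.findall(r"\w+", doc["content"])
    doc["numberOfWords"] = len(words)
    # Naive heuristic: count capitalized tokens as "names" (placeholder only).
    doc["numberOfNames"] = sum(1 for w in words if w[0].isupper())
    doc["numberOfPlaces"] = 0  # would come from a gazetteer/NER pass
    return doc

doc = enrich({"title": "t", "content": "Alice visited Paris twice"})
```

The document then carries numberOfWords, numberOfNames, and numberOfPlaces as plain fields, so Elasticsearch never has to re-analyze the content field to facet on them.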