Mappings for stemming


(Richard Tier) #1

I would like to do some stemming so that, for example, "flash cards" and
"flash card" are stemmed to the same roots and therefore return the same results (I
appreciate it's not an exact science with stemming, and my mileage may vary).

On the desired field ('description') I set a custom analyzer
('my_analyser') which uses custom filters.

When I run the following query I get hits (great), but "flash cards" and
"flash card" each return different results:

{"query_string": {
    "fields": ["description"],
    "query": query
}}

But when I run the following query, I get no results:

{"query_string": {
    "fields": ["description.analyzed"],
    "query": query
}}
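The two snippets above are only the inner `query_string` clause, and `query` is a Python variable holding the search text. Here is a minimal sketch of how each clause would sit under the top-level `"query"` key of a `_search` request body. The helper name `build_search_body` is hypothetical.

```python
import json


def build_search_body(field, query):
    """Wrap a query_string clause in the top-level "query" key of a search request body."""
    return {
        "query": {
            "query_string": {
                "fields": [field],  # e.g. "description" or a multi_field sub-field path
                "query": query,     # the user's search text
            }
        }
    }


# The two bodies from the question, built for the same search text.
for field in ("description", "description.analyzed"):
    print(json.dumps(build_search_body(field, "flash cards")))
```

One thing worth checking against the mapping below: the second sub-field is declared as `analysed`, so in a multi_field it would be addressed as `description.analysed`. A query against a field name that does not exist in the mapping would typically just return no hits rather than an error.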

Looking at my mapping below, we see that description.analyzed and
description have the same config (I did this for testing) - so each field
should behave the same, and stemming should happen.

How can I be sure that the analyzer is being used?

How can I make description.analyzed query work?

How can I be sure stemming is working?
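One way to check whether the analyzer is actually in effect is to fetch the index's live mapping (`GET /<index>/_mapping`) and look for the `analyzer` setting on the field. This is a minimal sketch that assumes a node at `http://localhost:9200` and a hypothetical index name. The nesting of the mapping response has changed between Elasticsearch versions, so the helper searches the response recursively instead of hard-coding a path.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local Elasticsearch node


def fetch_mapping(index):
    """GET the mapping Elasticsearch actually holds for `index` (needs a running node)."""
    with urllib.request.urlopen(f"{ES_URL}/{index}/_mapping") as resp:
        return json.load(resp)


def find_field_analyzers(node, field_name):
    """Collect the 'analyzer' value of every mapping entry named `field_name`, at any depth."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == field_name and isinstance(value, dict) and "analyzer" in value:
                found.append(value["analyzer"])
            found.extend(find_field_analyzers(value, field_name))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_field_analyzers(item, field_name))
    return found


# Hypothetical response for an index whose custom mapping was never applied:
# the field was mapped dynamically, so no analyzer is listed.
dynamic = {"my_index": {"file": {"properties": {"description": {"type": "string"}}}}}
print(find_field_analyzers(dynamic, "description"))  # []
```

An empty list suggests the field fell back to the default analyzer instead of `my_analyser`.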

My mappings for the index:

{
    'mappings': {
        'file': {  # doc_type
            'properties': {  # properties for doc_type
                'description': {  # field called description
                    'type': 'multi_field',  # to allow "sub fields" with different analysers
                    'fields': {
                        'description': {'type': 'string', 'analyzer': 'my_analyser'},
                        'analysed': {'type': 'string', 'analyzer': 'my_analyser'}
                    }
                },
            }
        }
    },
    'settings': {
        'analysis': {
            'filter': {  # declare my custom filters
                'filter_ngrams': {'max_gram': 5, 'min_gram': 1, 'type': 'edgeNGram'},
                'filter_stop': {'type': 'stop', 'enable_position_increments': 'false'},
                'filter_shingle': {'type': 'shingle', 'max_shingle_size': 5, 'min_shingle_size': 2, 'output_unigrams': 'true'},
                'filter_stemmer': {'type': 'stemmer', 'name': 'english'}
            },
            'analyzer': {  # declare custom analyzers
                'my_analyser': {
                    'filter': ['standard', 'lowercase', 'asciifolding', 'filter_stop', 'filter_shingle', 'filter_stemmer'],
                    'type': 'custom',
                    'tokenizer': 'standard'
                },
            }
        }
    }
}
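To reason about what `my_analyser` should produce, it can help to trace the filter chain by hand. The following is a pure-Python toy, not Lucene. The tokenizer regex, the small stop list, the accent folding, and especially `toy_stem` (which only strips a plural "s") are simplified assumptions standing in for the `standard` tokenizer, `asciifolding`, `filter_stop`, and the English `filter_stemmer`. The `standard` token filter is omitted. The sketch follows the order of the `filter` list, which makes two things visible: the declared `filter_ngrams` is never applied, and the stemmer runs after the shingle filter, so it sees each multi-word shingle as a single token.

```python
import re
import unicodedata

# Hypothetical, much smaller stop list than Lucene's English default.
STOP_WORDS = {"a", "an", "and", "for", "of", "the"}


def ascii_fold(token):
    """Rough stand-in for asciifolding: drop accents via Unicode decomposition."""
    return unicodedata.normalize("NFKD", token).encode("ascii", "ignore").decode("ascii")


def shingles(tokens, min_size=2, max_size=5, output_unigrams=True):
    """For each start position, emit the single word, then word n-grams of size min..max."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out


def toy_stem(token):
    """Toy rule, NOT the English stemmer: strip a trailing plural 's' (but not 'ss')."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def toy_analyze(text):
    tokens = re.findall(r"\w+", text)                     # crude stand-in for the standard tokenizer
    tokens = [ascii_fold(t.lower()) for t in tokens]      # lowercase + asciifolding
    tokens = [t for t in tokens if t not in STOP_WORDS]   # filter_stop
    tokens = shingles(tokens)                             # filter_shingle (2..5, with unigrams)
    return [toy_stem(t) for t in tokens]                  # filter_stemmer sees each shingle as one token


print(toy_analyze("Flash Cards"))  # ['flash', 'flash card', 'card']
print(toy_analyze("flash card"))   # ['flash', 'flash card', 'card']
```

Under these toy rules both phrases reduce to the same tokens. The real English stemmer may treat some words differently, so the Analyze API is the reliable way to confirm what the index actually does.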

Thanks :)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Richard Tier) #2

I now see that the mappings shown above had never actually been applied to the
index, so the solution was to use the put mapping API.
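Custom analyzers live in the index settings, which generally cannot be changed on an open index. A common pattern is therefore to send the settings and the mappings together when the index is created. This is a minimal sketch that assumes a local node and a hypothetical index name `my_index`, and it only builds the request. The body format has changed across Elasticsearch versions (in the era of this thread, mappings were keyed by document type, such as `file`), so check the create-index and put-mapping documentation for the version you run.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local Elasticsearch node


def json_request(method, path, body):
    """Build (but do not send) an HTTP request carrying a JSON body."""
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def create_index_request(index, settings, mappings):
    """Create-index call: custom analysis settings and mappings go in one body."""
    return json_request("PUT", "/" + index, {"settings": settings, "mappings": mappings})


# Abbreviated versions of the settings and mappings from the question (filter list shortened).
settings = {"analysis": {"analyzer": {"my_analyser": {
    "type": "custom", "tokenizer": "standard", "filter": ["lowercase"]}}}}
mappings = {"file": {"properties": {"description": {
    "type": "string", "analyzer": "my_analyser"}}}}

req = create_index_request("my_index", settings, mappings)
# urllib.request.urlopen(req) would send it; re-reading the mapping afterwards confirms it was applied.
```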

(In reply to the original message above, sent Tuesday, October 1, 2013 1:16:25 AM UTC+1.)


(sina.tamanna) #3

Great that you found the solution. As a hint, to debug problems like this
it may help to look at the Analyze API (http://www.elasticsearch.org/guide/reference/api/admin-indices-analyze/).
You can choose which field's analyzer to use and verify its output through
this API.
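Here is a minimal sketch of using the Analyze API for this. It assumes a local node, a hypothetical index `my_index`, and a recent Elasticsearch version that accepts `field` and `text` in a JSON body. Versions from the era of this thread took them as URL parameters instead. The response lists the produced tokens under `"tokens"`, each with a `"token"` string. If both phrases yield the same token list, the analyzer and stemmer are doing what the question wanted.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local Elasticsearch node


def analyze_request(index, field, text):
    """Build an Analyze API request that runs `text` through `field`'s analyzer."""
    body = {"field": field, "text": text}
    return urllib.request.Request(
        f"{ES_URL}/{index}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_list(response):
    """Pull just the token strings out of an Analyze API response."""
    return [t["token"] for t in response.get("tokens", [])]


def analyze(index, field, text):
    """Send the request (needs a running node) and return the tokens."""
    with urllib.request.urlopen(analyze_request(index, field, text)) as resp:
        return token_list(json.load(resp))


# With a node running, stemming "works" for this pair if both lists match:
# analyze("my_index", "description", "flash cards") == analyze("my_index", "description", "flash card")
```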

