Hi,
I have a document that contains a subdocument, so I have made the subdocument a nested property of the document. Now I need to find term vectors for the subdocument. My terms can be unigrams or bigrams, so I created an analyzer with a shingle filter. The settings for the index are as follows:
{
  "settings": {
    "analysis": {
      "filter": {
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        },
        "filter_shingle": {
          "type": "shingle",
          "max_shingle_size": 3,
          "min_shingle_size": 2,
          "output_unigrams": "true",
          "filler_token": ""
        }
      },
      "analyzer": {
        "keyword_discovery_analyzer": {
          "tokenizer": "standard",
          "char_filter": [ "html_strip" ],
          "filter": [
            "lowercase",
            "filter_shingle",
            "light_english_stemmer"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "text"
        },
        "description": {
          "type": "text",
          "analyzer": "indexing_analyzer",
          "search_analyzer": "search_analyzer",
          "fields": {
            "termVec": {
              "type": "text",
              "term_vector": "yes",
              "store": true,
              "analyzer": "keyword_discovery_analyzer"
            }
          }
        },
        "subDoc": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "termVec": {
                  "type": "text",
                  "term_vector": "yes",
                  "store": true,
                  "analyzer": "keyword_discovery_analyzer"
                }
              }
            },
            "description": {
              "type": "text",
              "fields": {
                "termVec": {
                  "type": "text",
                  "term_vector": "yes",
                  "store": true,
                  "analyzer": "keyword_discovery_analyzer"
                }
              }
            }
          }
        }
      }
    }
  }
}
When I execute the following request:
GET /_termvectors
{
  "fields": ["subDoc.name.termVec"],
  "offsets": false,
  "payloads": false,
  "positions": false,
  "term_statistics": true,
  "field_statistics": true,
  "filter": {
    "max_num_terms": 4
  }
}
I get an empty result. However, if I run the following request instead:
GET /12631946/_termvectors
{
  "fields": ["subDoc.name"],
  "offsets": false,
  "payloads": false,
  "positions": false,
  "term_statistics": true,
  "field_statistics": true,
  "per_field_analyzer": {
    "name": "keyword_discovery_analyzer"
  },
  "filter": {
    "max_num_terms": 15
  }
}
ES evaluates the term vectors on the fly and I do get results, but none of them contain bigram terms; all are unigrams.
My analyzer itself is working correctly: when I put the same analyzer on my doc.name field, it gives me bigrams when the term vectors are computed and stored. However, for doc.name as well, if the term vectors are computed at runtime, only unigrams are returned.
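To be concrete about the terms I expect, here is a minimal Python sketch of shingling as I understand it, using the same sizes as filter_shingle above (min 2, max 3, unigrams included). This is not Elasticsearch code: the function name shingles is hypothetical, and the sketch ignores html_strip, the stemmer and filler tokens, so it only illustrates the shape of the output.

```python
from typing import List


def shingles(tokens: List[str], min_size: int = 2, max_size: int = 3,
             output_unigrams: bool = True) -> List[str]:
    """Build word n-grams (shingles) from a list of tokens.

    Illustrative only: mirrors the min/max sizes of the filter_shingle
    settings, but does no stemming and no filler-token handling.
    """
    out: List[str] = []
    for start in range(len(tokens)):
        # Optionally emit the single token first.
        if output_unigrams:
            out.append(tokens[start])
        # Then emit every shingle of min_size..max_size starting here.
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(tokens):
                break  # not enough tokens left for this size
            out.append(" ".join(tokens[start:end]))
    return out


# Lowercase and split on whitespace as a stand-in for the real tokenizer.
terms = shingles("Quick brown fox".lower().split())
print(terms)
```

For a name such as "Quick brown fox" I would therefore expect terms like "quick brown" and "brown fox" in the term vectors, not only "quick", "brown" and "fox".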
Please let me know what I am doing wrong.
Thanks
Vishvadeepak Tewari