Populating TermVector when having tokenizer outside ES


(Neeraj Makam) #1

Hi,

I have a mapping that contains a nested list of words (generated by a
tokenizer residing outside ES). Each word has the fields 'tokenOffset'
and 'characterOffset', which are populated by my tokenizer.
This is the mapping I am using (say):

{
  "contract": {
    "_id": {
      "path": "objectId"
    },
    "properties": {
      "filepath": {
        "type": "string",
        "index": "not_analyzed"
      },
      "objectId": {
        "type": "string",
        "index": "no"
      },
      "words": {
        "type": "nested",
        "properties": {
          "characterOffset": {
            "type": "long",
            "index": "no"
          },
          "wordType": {
            "type": "long"
          },
          "tokenOffset": {
            "type": "long",
            "index": "no"
          },
          "value": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}

I want to be able to run a query that says:
"value" == "foo" AND "wordType" == 5,
with both conditions matching the same word.
This is why I mapped the list "words" as nested. [1]
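
For illustration, the query described above could be expressed against the nested "words" field roughly as follows. This is only a sketch: the helper name build_nested_word_query is hypothetical, and it assumes the standard nested/bool/term query DSL with the field names from the mapping above.

```python
import json


def build_nested_word_query(value, word_type, path="words"):
    """Build a query body that requires both conditions on the SAME word.

    The function name and the 'path' default are illustrative assumptions;
    the field names come from the mapping in this post.
    """
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            # Exact match on the not_analyzed word value.
                            {"term": {path + ".value": value}},
                            # Exact match on the numeric word type.
                            {"term": {path + ".wordType": word_type}},
                        ]
                    }
                },
            }
        }
    }


query = build_nested_word_query("foo", 5)
print(json.dumps(query, indent=2))
```

The resulting dictionary would be sent as the body of a search request. Without "nested", the two term clauses could match on different words of the same contract.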

For example:
if the text is "this is foo and bar", my tokenizer separates out each
word, associates a wordType with each word, and also generates
characterOffset and tokenOffset, i.e.
words[0].value = "this"
words[0].wordType = 5
words[0].characterOffset = 0
words[0].tokenOffset = 0
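
To make the shape of the tokenizer output concrete, here is a minimal stand-in written in Python. It is not my actual tokenizer: the whitespace splitting and the word_type_of callback are assumptions made purely for illustration, and only the field names match the mapping.

```python
import re


def tokenize(text, word_type_of=lambda token: 0):
    """Split text on whitespace and record offsets for each word.

    'word_type_of' is a hypothetical callback standing in for whatever
    logic the real external tokenizer uses to assign a wordType.
    """
    words = []
    for token_offset, match in enumerate(re.finditer(r"\S+", text)):
        words.append({
            "value": match.group(),
            "wordType": word_type_of(match.group()),
            # Index of the word's first character in the original text.
            "characterOffset": match.start(),
            # Zero-based position of the word in the token stream.
            "tokenOffset": token_offset,
        })
    return words


# Every word gets wordType 5 here only to mirror the example above.
words = tokenize("this is foo and bar", word_type_of=lambda token: 5)
document = {"objectId": "example-id", "filepath": "/example/path", "words": words}
```

The "objectId" and "filepath" values are placeholders; the "words" list is what ends up in the nested field.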

Now, how do I populate the term vector in ES so that I can leverage its
phrase search and other features such as AND/OR/NEAR? [2]

[1] - Is there a way I can implement this without using nested
(because nested indexes each word as a separate document)?

[2] - Can a custom analyzer be used to populate the term vector in ES
while keeping the tokenizer outside ES? (Assume that, due to business
necessity, moving the tokenizer inside ES is not feasible.)

The feature I need to implement is phrase search with support for
AND/OR/NEAR, plus highlighting.



(system) #2