I installed Elasticsearch 5.0.1 and the ingest attachment plugin.
I have indexed a PDF document using the ingest attachment processor.
Now I want to apply stemming to the content of the attachment. I tried the following:
- Created the index and configured a custom analyzer on it. The resulting index definition:
curl -XGET 'http://localhost:9200/idx_analyser?pretty'
{
  "idx_analyser" : {
    "aliases" : { },
    "mappings" : {
      "test" : {
        "properties" : {
          "attachment" : {
            "properties" : {
              "content" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "content_length" : {
                "type" : "long"
              },
              "content_type" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "language" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          },
          "data" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "text" : {
            "type" : "text",
            "analyzer" : "custom_lowercase_stemmed"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "provided_name" : "idx_analyser",
        "creation_date" : "1479885039440",
        "analysis" : {
          "filter" : {
            "custom_english_stemmer" : {
              "name" : "english",
              "type" : "stemmer"
            }
          },
          "analyzer" : {
            "custom_lowercase_stemmed" : {
              "filter" : [
                "lowercase",
                "custom_english_stemmer"
              ],
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "FrJEtt-BSgq2ROka2PZ4CA",
        "version" : {
          "created" : "5000199"
        }
      }
    }
  }
}
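For reference, a create-index request that would produce the analysis settings above might look roughly like the sketch below. It is reconstructed from the GET output, not the exact command that was run. The `ES_URL` variable and the `create_index` helper are hypothetical names, and only the `text` field is mapped explicitly on the assumption that the remaining fields came from dynamic mapping.

```shell
# Assumption: Elasticsearch 5.x is reachable at this URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Index body reconstructed from the settings and mappings shown above.
# Only the analysis section and the analyzed "text" field are included;
# the other fields are assumed to have been created by dynamic mapping.
CREATE_INDEX_BODY='{
  "settings": {
    "analysis": {
      "filter": {
        "custom_english_stemmer": { "type": "stemmer", "name": "english" }
      },
      "analyzer": {
        "custom_lowercase_stemmed": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "custom_english_stemmer" ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text": { "type": "text", "analyzer": "custom_lowercase_stemmed" }
      }
    }
  }
}'

# Hypothetical helper: sends the create-index request when called.
create_index() {
  curl -XPUT "$ES_URL/idx_analyser?pretty" -d "$CREATE_INDEX_BODY"
}
```

Calling `create_index` against a node where `idx_analyser` does not yet exist would create it with the custom analyzer attached to the `text` field only.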
- Indexed base64 content through the "attachment" ingest pipeline (pipeline=attachment):
curl -XPUT 'http://localhost:9200/idx_analyser/test/1?pipeline=attachment&pretty' -d' { "text": "VGhpcyBpbmRleCBoYXZpbmcgaW5mb3JtYXRpb24=" }'
{
  "_index" : "idx_analyser",
  "_type" : "test",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "_source" : {
    "data" : "VGhpcyBpbmRleCBoYXZpbmcgaW5mb3JtYXRpb24=",
    "attachment" : {
      "content_type" : "text/plain; charset=ISO-8859-1",
      "language" : "en",
      "content" : "This index having information",
      "content_length" : 30
    }
  }
}
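The definition of the "attachment" pipeline is not shown in this post. A minimal sketch of what it could look like follows, assuming the processor reads the base64 payload from the `data` field (the field name in the stored document above). The `ES_URL` variable, the `put_attachment_pipeline` helper, and the description string are hypothetical.

```shell
# Assumption: Elasticsearch 5.x with the ingest-attachment plugin at this URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Assumption: the processor reads the base64 payload from "data",
# which is the field name in the stored document above.
PIPELINE_BODY='{
  "description": "Extract attachment information",
  "processors": [
    { "attachment": { "field": "data" } }
  ]
}'

# Hypothetical helper: registers the pipeline under the id "attachment".
put_attachment_pipeline() {
  curl -XPUT "$ES_URL/_ingest/pipeline/attachment?pretty" -d "$PIPELINE_BODY"
}
```

With a pipeline like this, the extracted text is written to `attachment.content`, which is a different field from the source field holding the base64 data.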
Searching the content for 'having' returns the expected result:
curl -XGET 'http://localhost:9200/idx_analyser/_search?q=attachment.content=having'
However, I want to get the same result when I search for 'have' (shown below), and this returns nothing:
curl -XGET 'http://localhost:9200/idx_analyser/_search?q=attachment.content=have'
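One way to see which tokens a given field actually produces is the `_analyze` API with a `field` parameter. The sketch below assumes the index and field names from this post; `ES_URL`, `analyze_body`, and `analyze_field` are hypothetical helper names, and the body builder does no JSON escaping, so it only suits simple words.

```shell
# Assumption: Elasticsearch 5.x is reachable at this URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: prints the JSON body of an _analyze request that runs
# the text ($2) through the analyzer configured for the field ($1).
# No JSON escaping is done, so arguments must not contain quotes.
analyze_body() {
  printf '{ "field": "%s", "text": "%s" }' "$1" "$2"
}

# Hypothetical helper: sends the request to the idx_analyser index.
analyze_field() {
  curl -XGET "$ES_URL/idx_analyser/_analyze?pretty" -d "$(analyze_body "$1" "$2")"
}
```

Running `analyze_field attachment.content having` and `analyze_field text having` side by side would show whether both fields emit the same token for the same word. If they differ, the two fields are not using the same analyzer.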
Am I doing anything wrong here? Please help me resolve this.