Reading further through the Elasticsearch 5.0 percolator documentation, I realize that in 5.0 the document is not stored the way I set it up in my index.
My question is: how can I percolate the text of a PDF? Is it possible in Elasticsearch? If not, what are the alternatives?
I set up an ingest pipeline and ingested a PDF into an index using the attachment processor, so I have attachment.content available.
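For reference, a minimal sketch of what such a pipeline and an ingest request can look like, written as Python dicts printed in Console form. It assumes the ingest-attachment plugin is installed; the pipeline id "attachment", the source field name "data" and the target pdf-index/pdf/1 are illustrative choices, not taken from the original setup. The attachment processor reads base64 data and, by default, writes the extracted text to attachment.content.

```python
import base64
import json

# Illustrative pipeline id; any name works.
PIPELINE_ID = "attachment"


def attachment_pipeline_body(source_field="data"):
    """Body for PUT _ingest/pipeline/<id>: a single attachment processor that
    reads base64 data from source_field; its default target field is
    "attachment", so the extracted text ends up in attachment.content."""
    return {
        "description": "Extract text from base64-encoded files",
        "processors": [{"attachment": {"field": source_field}}],
    }


def attachment_doc_body(file_bytes, source_field="data"):
    """Body for PUT <index>/<type>/<id>?pipeline=<id>: the raw file bytes,
    base64-encoded as the attachment processor expects."""
    return {source_field: base64.b64encode(file_bytes).decode("ascii")}


print("PUT _ingest/pipeline/" + PIPELINE_ID)
print(json.dumps(attachment_pipeline_body(), indent=2))

# Placeholder bytes standing in for a real PDF read from disk.
print("PUT /pdf-index/pdf/1?pipeline=" + PIPELINE_ID)
print(json.dumps(attachment_doc_body(b"%PDF-1.4 placeholder"), indent=2))
```

In practice the bytes would come from something like open("file.pdf", "rb").read().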
I have the percolator working with a document in another index. My test:
PUT /action-index
{
  "mappings": {
    "doctype": {
      "properties": {
        "message": {
          "type": "text"
        }
      }
    },
    "queries": {
      "properties": {
        "query": {
          "type": "percolator"
        }
      }
    }
  }
}
PUT /action-index/queries/A2?refresh
{
  "query": {
    "match": {
      "message": "Its here in Dallas, TX"
    }
  }
}
Then I created a document:
PUT /action-index/message/2
{
  "message": "Does it have Dallas in the text"
}
And now I percolate, reading the document from this index:
GET /action-index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document_type": "doctype",
      "index": "action-index",
      "type": "message",
      "id": "2",
      "version": 1
    }
  }
}
And it works like a charm, thanks to the clear documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-percolate-query.html
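Why this test matches: a match query analyzes its text into terms and, by default, combines them with OR, so a single shared term is enough. The rough model below uses a toy tokenizer (lowercase, split on non-alphanumeric characters) as an assumed stand-in for the standard analyzer; it only illustrates the idea and is not how Elasticsearch evaluates percolator queries internally.

```python
import re


def toy_analyze(text):
    """Very rough stand-in for the standard analyzer: lowercase the text
    and split it on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def shared_terms(query_text, doc_text):
    """Terms present in both texts; with the default OR operator, any
    shared term is enough for a match query to hit the document."""
    return set(toy_analyze(query_text)) & set(toy_analyze(doc_text))


shared = shared_terms("Its here in Dallas, TX", "Does it have Dallas in the text")
print(sorted(shared))  # ['dallas', 'in']
```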
Now I want to extend this to pick up the PDF content that I indexed in Elasticsearch.
It doesn't give an error, but it never finds a document (hits: 0).
The text I want to percolate is in the field attachment.content. I wonder whether that field needs to be mentioned anywhere in the search above.
Any thoughts?
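One way to reason about the zero hits: a stored match query can only match if the percolated document actually contains the field the query names. The hypothetical helper below flattens a document into dotted field paths (the way Elasticsearch addresses object fields such as attachment.content) and reports which fields a simple match query targets that the document lacks. The sample PDF document is made up for illustration, and this is an offline check, not an Elasticsearch feature.

```python
def field_paths(doc, prefix=""):
    """Flatten nested JSON objects into dotted field paths,
    e.g. {"attachment": {"content": ...}} -> {"attachment.content"}."""
    paths = set()
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            paths |= field_paths(value, path + ".")
        else:
            paths.add(path)
    return paths


def missing_match_fields(stored_query, doc):
    """Fields named by a simple {"match": {field: text}} stored query
    that do not appear anywhere in the percolated document."""
    wanted = set(stored_query.get("match", {}))
    return wanted - field_paths(doc)


stored_query = {"match": {"message": "Its here in Dallas, TX"}}

# Made-up example of a document after the attachment processor has run.
pdf_doc = {
    "attachment": {
        "content": "Our office is in Dallas, TX.",
        "content_type": "application/pdf",
    }
}

print(missing_match_fields(stored_query, pdf_doc))  # {'message'}
```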
Thanks, David, for your response.
Actually, I solved it and will post the clean solution shortly.
In a nutshell, I was using "fields": "attachment.content" in
GET /action-index/_search, whereas in the percolator query I was matching on "message".
By replacing "message" with "attachment.content" in the stored query, I got it working.
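A sketch of the working setup as request bodies built in Python and printed in Console form. Assumptions not stated in the post: the PDF was ingested as the hypothetical pdf-index/pdf/1; attachment.content is added to the doctype mapping because, as far as I understand the 5.x percolator, fields used in stored queries must be mapped in the index that holds the queries; and the optional version parameter is left out. The mapping shown is for a fresh index; on an existing index the new field would be added with a put-mapping request instead.

```python
import json

# Hypothetical location of the ingested PDF; substitute your own.
PDF_INDEX, PDF_TYPE, PDF_ID = "pdf-index", "pdf", "1"

# action-index mapping: doctype now also maps attachment.content, so stored
# queries on that field can be indexed and evaluated.
mapping_body = {
    "mappings": {
        "doctype": {
            "properties": {
                "message": {"type": "text"},
                "attachment": {"properties": {"content": {"type": "text"}}},
            }
        },
        "queries": {"properties": {"query": {"type": "percolator"}}},
    }
}

# The stored query targets attachment.content instead of message.
stored_query_body = {
    "query": {"match": {"attachment.content": "Its here in Dallas, TX"}}
}

# Percolate the already-indexed PDF document by reference.
percolate_body = {
    "query": {
        "percolate": {
            "field": "query",
            "document_type": "doctype",
            "index": PDF_INDEX,
            "type": PDF_TYPE,
            "id": PDF_ID,
        }
    }
}

for request_line, body in [
    ("PUT /action-index", mapping_body),
    ("PUT /action-index/queries/A2?refresh", stored_query_body),
    ("GET /action-index/_search", percolate_body),
]:
    print(request_line)
    print(json.dumps(body, indent=2))
```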