Hi Team,
We are facing an issue when searching for non-English text in a document indexed as a PDF attachment. The complete details are below.
- I have a PDF document, New_Pdf_issue.pdf, which is attached to this mail.
- I created an indexing request along with its mapping; the script is attached as pdf_index_issue.sh (http://elasticsearch-users.115913.n3.nabble.com/file/n4074717/pdf_index_issue.sh).
- The PDF contains keywords such as "अधिकार", but when I search for "अधिकार" I do not get any matching documents.
Note: This is what we observed when we ran the following search query:
{
  "fields": [
    "SessionAtt.content_type",
    "SessionAtt"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": [
              "Content",
              "SessionAtt"
            ],
            "query": "*"
          }
        }
      ]
    }
  }
}
In the results, the word "अधिकार" appears to have been indexed as "अधधकार".
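To make the difference between the two strings easier to see, here is a small Python sketch that compares them code point by code point. It is not part of our indexing setup, and the function names (describe, diff_positions) are only illustrative. It suggests that the vowel sign ि (U+093F) in the third position is replaced by a second ध (U+0927) in the indexed text.

```python
import unicodedata


def describe(text):
    """Return a (code point, Unicode name) pair for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def diff_positions(a, b):
    """Return the zero-based indices at which two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


expected = "\u0905\u0927\u093F\u0915\u093E\u0930"  # अधिकार, as it appears in the PDF
indexed = "\u0905\u0927\u0927\u0915\u093E\u0930"   # अधधकार, as returned by the search

# Print both strings side by side and flag the positions that differ.
for i, (want, got) in enumerate(zip(describe(expected), describe(indexed))):
    marker = "  <-- differs" if want != got else ""
    print(i, want, got, marker)

print("differing positions:", diff_positions(expected, indexed))
```

Only one position differs, so the rest of the word seems to be extracted correctly.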
Can anyone tell me what might be causing this?
Note: I was not able to upload the PDF document and the script file here, so please refer to them in the post linked below.
http://elasticsearch-users.115913.n3.nabble.com/Elasticsearch-is-not-able-to-search-for-Nonnglish-text-present-in-PDF-type-of-attachment-td4074717.html
~Prashant