I have a PDF file that contains the text "XXX Contract ID - 170458".
However, the file is not found when I search for "Contract ID - 170458".
(Searching for "170458" alone does find it.)
Can anyone help?
Thank you very much
My query:

```
GET /contract_dms/_search
{
  "_source": ["file.*", "highlight.*"],
  "query": {
    "query_string": {
      "query": "Contract ID - 170458",
      "default_operator": "AND",
      "fields": ["content"]
    }
  },
  "highlight": {
    "number_of_fragments": 4,
    "fragment_size": 100,
    "require_field_match": "true",
    "fields": {
      "content": {},
      "file.filename": {}
    }
  },
  "sort": [
    {"file.last_modified": {"order": "asc", "unmapped_type": "keyword"}}
  ],
  "size": 50,
  "from": 0
}
```
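The most likely cause is how `query_string` parses your input. In `query_string` syntax, `-` is a reserved character that means "must not" (NOT), so `Contract ID - 170458` is read roughly as `+contract +id -170458`. With `"default_operator": "AND"`, Elasticsearch therefore looks for documents that contain "contract" and "id" but do *not* contain "170458", which excludes exactly the document you want. Searching for "170458" alone works because no operator is involved. (Note that `AND` on its own does not require the words to appear as a phrase; it only requires every term to be present.)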
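To confirm how Elasticsearch interprets the query string, you can ask the validate API to explain the rewritten query. This sketch reuses your index and field names; the exact wording of the explanation varies between versions, but if the number appears with a leading `-` (something like `-content:170458`), it is being excluded rather than required.

```
GET /contract_dms/_validate/query?explain=true
{
  "query": {
    "query_string": {
      "query": "Contract ID - 170458",
      "default_operator": "AND",
      "fields": ["content"]
    }
  }
}
```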
Here are a few suggestions you can try:
Use a Phrase Query:
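Turn the search into a phrase query by wrapping the text in double quotes, escaped inside the JSON string, i.e. set `"query": "\"Contract ID - 170458\""`.
The double quotes tell `query_string` to search for the words as a phrase (together and in that order), and inside the quotes the `-` is treated as ordinary text rather than as the NOT operator.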
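Applied to your original request, the phrase version could look like the sketch below. Only the `query` block changes, so your highlighting, sorting and paging settings can stay as they are (they are left out here for brevity). The inner quotes are escaped with a backslash because they sit inside a JSON string.

```
GET /contract_dms/_search
{
  "query": {
    "query_string": {
      "query": "\"Contract ID - 170458\"",
      "default_operator": "AND",
      "fields": ["content"]
    }
  }
}
```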
Use a Match Query:
Try a `match` query instead, which is usually better suited to full-text search. For example:

```
"match": {
  "content": "Contract ID - 170458"
}
```
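A `match` query does not interpret query syntax: the text is simply run through the field's analyzer, and with the default `standard` analyzer the standalone `-` produces no token at all. Note, however, that `match` does not search for the exact phrase: by default it returns documents containing *any* of the terms (OR). Add `"operator": "and"` to require all of them, or use `match_phrase` to require them to appear together and in order.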
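A sketch of two stricter variants, reusing your index and field names: the first requires every term to be present (in any order), the second requires the terms to appear together and in the given order. You can paste both into Kibana Dev Tools and run them one at a time.

```
GET /contract_dms/_search
{
  "query": {
    "match": {
      "content": {
        "query": "Contract ID - 170458",
        "operator": "and"
      }
    }
  }
}

GET /contract_dms/_search
{
  "query": {
    "match_phrase": {
      "content": "Contract ID - 170458"
    }
  }
}
```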
Check Analyzer Settings:
Make sure the analyzers used at index time and at search time tokenize the text the way you expect (for example, that "170458" is kept as a single token). You can inspect the tokens with the `_analyze` API and customize the analyzer settings if needed.
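One way to see how the `content` field tokenizes the text from your PDF is the analyze API with the `field` parameter, which applies that field's configured analyzer. If the field uses the default `standard` analyzer, the tokens should be `xxx`, `contract`, `id` and `170458`; anything else points to a custom analyzer or mapping on your side.

```
GET /contract_dms/_analyze
{
  "field": "content",
  "text": "XXX Contract ID - 170458"
}
```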
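The query-side changes above take effect immediately. If you change the index-time analyzer of an existing field, however, you will need to reindex your content into an index created with the updated mapping, because the analyzer of an existing field cannot be changed in place. Keep in mind that how well these suggestions work depends on your specific mapping and analyzer configuration.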
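If you do change the index-time analyzer, one common approach is to create a new index with the updated mapping and copy the documents across with the reindex API. The destination name `contract_dms_v2` below is hypothetical, and that index should be created with its new mapping *before* running the request; otherwise it will be created with dynamic mappings.

```
POST _reindex
{
  "source": { "index": "contract_dms" },
  "dest": { "index": "contract_dms_v2" }
}
```

Afterwards, point your searches (or an index alias) at the new index.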