Cannot search my pdf files

Mandy_Poon · November 30, 2023, 7:10am

I have a PDF file that contains the text "XXX Contract ID - 170458".
However, the file is not found when I search for "Contract ID - 170458".
(The file is found when I search for "170458".)

Can anyone help?

Thank you very much

// My code here

get /contract_dms/_search
{
  "_source": ["file.*","highlight.*"],
  "query": {
    "query_string": {
      "query": "Contract ID - 170458",
      "default_operator": "AND",
      "fields": ["content"]
    }
  },
  "highlight": {
    "number_of_fragments": 4,
    "fragment_size": 100,
    "require_field_match": "true",
    "fields": {
      "content": {},
      "file.filename":{}
    }
  },
  "sort":[
    {"file.last_modified":{"order":"asc", "unmapped_type":"keyword"}}
  ],
  "size":50,
  "from":0
}

ashishtiwari1993 · December 1, 2023, 5:23am

Hello @Mandy_Poon, welcome to the Elastic community.

Could you please share the index mapping?

GET contract_dms/_mapping

I suspect the whole string you are searching for is not indexed as a single term.

nextgen · December 1, 2023, 7:44am

One potential issue is how the content is indexed and how the query string is parsed. Your query sets the default operator to "AND", so Elasticsearch only returns documents that satisfy every clause in the query. Note also that query_string has its own syntax in which "-" is a reserved character (it normally marks a term to exclude), so the unquoted " - 170458" may not be treated as plain text.
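If the hyphen is the culprit, escaping the reserved characters before sending the text to query_string may be enough. The helper below is a minimal Python sketch of that idea; the function name `escape_query_string` is hypothetical, and the character list follows the reserved characters documented for query_string ("<" and ">" cannot be escaped, so they are dropped).

```python
import json
import re

# Characters that query_string treats as syntax and that can be
# backslash-escaped. "<" and ">" cannot be escaped, so they are removed.
_RESERVED = re.compile(r'([+\-=!(){}\[\]^"~*?:\\/&|])')


def escape_query_string(text: str) -> str:
    """Return text with query_string reserved characters escaped.

    Hypothetical helper for illustration, not part of any Elasticsearch client.
    """
    text = text.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", text)


escaped = escape_query_string("Contract ID - 170458")

# Request body using the escaped text; json.dumps doubles the backslash
# as JSON requires.
body = {
    "query": {
        "query_string": {
            "query": escaped,
            "default_operator": "AND",
            "fields": ["content"],
        }
    }
}
print(json.dumps(body, indent=2))
```

With the hyphen escaped, the remaining words are still analyzed as separate terms and all of them are required because of the "AND" default operator.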

Here are a few suggestions you can try:

Use a Phrase Query:
Change your query to a phrase query to search for the entire string as a phrase. Update your query to:


"query_string": {
  "query": "\"Contract ID - 170458\"",
  "default_operator": "AND",
  "fields": ["content"]
}

The escaped double quotes inside the query make query_string treat the text as a single phrase, so the hyphen is no longer read as an operator.
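If you build the request in code rather than by hand, the quoting can be done programmatically. This is a minimal Python sketch; `phrase_query_string` is a hypothetical helper name, and it assumes that only backslashes and double quotes need escaping inside a quoted phrase.

```python
import json


def phrase_query_string(phrase: str, field: str) -> dict:
    """Build a query_string clause that searches for phrase as one quoted phrase.

    Hypothetical helper for illustration only.
    """
    # Escape backslashes first, then double quotes, so the phrase cannot
    # terminate the surrounding quotes early.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "query_string": {
            "query": f'"{escaped}"',
            "default_operator": "AND",
            "fields": [field],
        }
    }


body = {"query": phrase_query_string("Contract ID - 170458", "content")}
print(json.dumps(body, indent=2))
```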

Use a Match Query:
Try using a match query instead, which is often more suitable for full-text search. Here's an example:

"match": {
  "content": "Contract ID - 170458"
}

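A match query does not use the query_string syntax, so the hyphen is not treated as an operator. Be aware that by default it matches documents containing any of the analyzed terms (operator "or"). To require every term, set "operator": "and"; to require the terms to appear next to each other in order, use match_phrase instead.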
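To make the difference between the match variants concrete, here is a minimal Python sketch that builds the stricter request bodies. The function names are hypothetical; the query types and the "operator" parameter are standard query DSL.

```python
import json


def match_all_terms(field: str, text: str) -> dict:
    # Every analyzed term must be present, in any order and position.
    return {"match": {field: {"query": text, "operator": "and"}}}


def match_phrase(field: str, text: str) -> dict:
    # The analyzed terms must appear adjacent to each other and in order.
    return {"match_phrase": {field: text}}


for clause in (
    match_all_terms("content", "Contract ID - 170458"),
    match_phrase("content", "Contract ID - 170458"),
):
    print(json.dumps({"query": clause}, indent=2))
```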

Check Analyzer Settings:
Ensure that the analyzer used during indexing and searching is not splitting the string in a way that makes it difficult to find. You may need to customize the analyzer settings to better suit your needs.
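The authoritative way to see the tokens is the analyze API (for example GET contract_dms/_analyze with "field": "content" and the sample text). As a rough illustration of what to expect, the Python sketch below approximates the standard analyzer by lowercasing and splitting on non-word characters. It is only an approximation, and it assumes the content field uses the standard analyzer, which the thread does not confirm.

```python
import re


def rough_standard_tokens(text: str) -> list:
    """Very rough stand-in for the standard analyzer (assumption, not the
    real implementation): lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())


indexed = rough_standard_tokens("XXX Contract ID - 170458")
searched = rough_standard_tokens("Contract ID - 170458")

print(indexed)
print(searched)
```

Under that assumption the hyphen never reaches the index, and the searched terms are a contiguous run of the indexed ones, which is why a phrase search would be expected to match.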

Changes to the query take effect immediately. If you change the mapping or the analyzer, however, you will need to reindex your content for the new analysis to apply to existing documents. Keep in mind that the effectiveness of these suggestions depends on your specific Elasticsearch mapping and analyzer configuration.

system · December 29, 2023, 7:45am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
