Encountering an Issue with a Newly Created Index File in Elasticsearch

After reading the text from a PDF document and saving it into the content field of an Elasticsearch document, I encountered an issue: the first saved index file did not return results when searched. However, creating a second index for the same file resulted in successful searches.

I am using C# to search the content field with the MatchPhrase query for exact matches. I would appreciate help understanding the root cause of this issue and how to resolve it.


Hi @Dinesh_Raj ,

To clarify, you used two different indices for the same document, and one index was searchable, but the other was not?

Can you share:

  • the mappings of each index
  • the JSON source document you ingested into both indices
  • the query you tried to use (let's eliminate the language client for now, please reproduce the query in curl or Kibana Dev Tools)

Hi @Sean_Story,

Thank you for the quick response. Please find the details you requested below.

you used two different indices for the same document, and one index was searchable, but the other was not? – No, I’m using single indices for same the documents. The second set of documents is searchable, whereas the first set was not

Note: I'm using the "content" field to search the data.

the mappings of each index -

* [{"mappings":{"properties":{"DocID":{"type":"integer"},"Number":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"Name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"content":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"contentList":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"documentID":{"type":"integer"},"documentName":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"documentSource":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"documentURL":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"iD":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}}]

the JSON source document you ingested into both indices – I'm using a single index; here is the data for one document:

* {"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":1,"hits":[{"_index":"esdocument_dev","_id":"qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq28","_score":1,"_source":{"iD":"qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq28","DocID":1111,"documentName":"Upload Document","content":"Page :1\r\nDineshraj elastic document 18-03-2024\n\r\n","documentURL":"Upload Document.pdf","Number":"1111-0000","Name":"TEST ES","documentSource":"OTHER_DOCUMENT","documentID":7}}]}}

the query you tried to use (let's eliminate the language client for now, please reproduce the query in curl or Kibana Dev Tools) - I tried the search query with curl, and it retrieved both documents.

C# - I'm using the NEST library, version 7.17.5. When I test with the first saved document, I get no results; please find the screenshot for your reference.

Hi @Dinesh_Raj ,

I think I'm having trouble understanding what your issue is, because you may be misusing some special terms.

"Index" in Elasticsearch is like a SQL table. "Document" is what we call a record in an Index.

the first saved index file did not return results when searched. However, creating a second index for the same file resulted in successful searches.
...
No, I’m using single indices for same the documents. The second set of documents is searchable, whereas the first set was not

So the quoted statements above do not make a lot of sense.

Based on your more recent response, it sounds like you have a single index, with two documents. One of those documents shows up in your search results, the other does not. Is this correct? I'm still confused because you call them "the first set of documents" and "the second set of documents" which makes it sound like more than one document in each "set".

In your paste, you only shared one document. I've reformatted for readability:

      {
        "_index": "esdocument_dev",
        "_id": "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq28",
        "_score": 1,
        "_source": {
          "iD": "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq28",
          "DocID": 1111,
          "documentName": "Upload Document",
          "content": "Page :1\r\nDineshraj elastic document 18-03-2024\n\r\n",
          "documentURL": "Upload Document.pdf",
          "Number": "1111-0000",
          "Name": "TEST ES",
          "documentSource": "OTHER_DOCUMENT",
          "documentID": 7
        }
      }

You also shared the mappings (also reformatted here for readability):

{
    "mappings": {
      "properties": {
        "DocID": {
          "type": "integer"
        },
        "Number": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "Name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "content": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "contentList": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "documentID": {
          "type": "integer"
        },
        "documentName": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "documentSource": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "documentURL": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "iD": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }

Please share both documents (or sets of documents) that you are trying to search, AND the Elasticsearch query you are executing. Please do NOT send a screenshot/paste of C# code. It will be more helpful to troubleshoot with the pure JSON REST API first (curl or Kibana Dev Tools), and then we can make sure to translate it to C#.

Hi @Sean_Story

Based on your more recent response, it sounds like you have a single index, with two documents. One of those documents shows up in your search results, the other does not. Is this correct? - Yes, correct.

I have used only one index, which has the mapping shown below.

{
    "mappings": {
      "properties": {
        "DocID": {
          "type": "integer"
        },
        "Number": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "Name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "content": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "contentList": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "documentID": {
          "type": "integer"
        },
        "documentName": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "documentSource": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "documentURL": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "iD": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }

Documents I have inserted:

[
  {
    "_index": "esdocument_dev",
    "_id": "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq28",
    "_score": 1,
    "_source": {
      "iD": "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq28",
      "DocID": 1111,
      "documentName": "Upload Document",
      "content": "Page :1\r\nDineshraj elastic document 18-03-2024\n\r\n",
      "documentURL": "Upload Document.pdf",
      "Number": "1111-0000",
      "Name": "TEST ES",
      "documentSource": "OTHER_DOCUMENT",
      "documentID": 7
    }
  },
  {
    "_index": "esdocument_dev",
    "_id": "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq29",
    "_score": 1,
    "_source": {
      "iD": "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq29",
      "DocID": 1111,
      "documentName": "Upload Document",
      "content": "Page :1\r\nDineshraj elastic document 18-03-2024\n\r\n",
      "documentURL": "Upload Document.pdf",
      "Number": "1111-0000",
      "Name": "TEST ES",
      "documentSource": "OTHER_DOCUMENT",
      "documentID": 7
    }
  }
]

curl command:

curl -X POST "https://development.com/elastic/esdocument_dev/_search" -u elastic -H 'Content-Type: application/json' -d '{
  "query": {
    "match_phrase": {
      "content": "Dineshraj"
    }
  }
}'

For the above query, I get both documents as results:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.6533776,
        "hits": [
            {
                "_index": "esdocument_dev",
                "_id": "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq28",
                "_score": 1,
                "_source": {
                    "iD": "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq28",
                    "DocID": 1111,
                    "documentName": "Upload Document",
                    "content": "Page :1\r\nDineshraj elastic document 18-03-2024\n\r\n",
                    "documentURL": "Upload Document.pdf",
                    "Number": "1111-0000",
                    "Name": "TEST ES",
                    "documentSource": "OTHER_DOCUMENT",
                    "documentID": 7
                }
            },
            {
                "_index": "esdocument_dev",
                "_id": "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq29",
                "_score": 1,
                "_source": {
                    "iD": "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq29",
                    "DocID": 1111,
                    "documentName": "Upload Document",
                    "content": "Page :1\r\nDineshraj elastic document 18-03-2024\n\r\n",
                    "documentURL": "Upload Document.pdf",
                    "Number": "1111-0000",
                    "Name": "TEST ES",
                    "documentSource": "OTHER_DOCUMENT",
                    "documentID": 7
                }
            }
        ]
    }
}

I have a specific scenario where I'm querying Elasticsearch for documents based on the content field, using a match_phrase query for the text "Dineshraj". When I run this query with curl, I get both expected documents as results. However, when I execute the same query with the C# NEST library, I receive only one document ("_id": "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq29"):

string searchKey = "Dineshraj";
var searchResponse = ElasticConnectionString.EsClient().Search<DocumentMetadata>(s => s
    .From(from)
    .Size(size)
    .Index("esdocument_dev")
    .Query(q => q
        .MatchPhrase(m => m
            .Field(f => f.Content)
            .Query(searchKey)
        )
    )
);

I would appreciate your assistance in resolving this discrepancy. Could you please investigate why I'm not getting both documents in the C# NEST library search response? If there are any known issues or solutions related to the C# library, please point me to the relevant team or resources.

What are the from and size parameters you are specifying set to? They are not part of the curl query you ran in the dev console, which worked.

Hi @Christian_Dahlqvist,

I'm using the parameter values and logic below:

int pageNo = 1;
int pageSize = 100;
int from = (pageNo - 1) * pageSize + 1;
string searchKey = "Dineshraj";
int size = pageNo * pageSize;
var searchResponse = ElasticConnectionString.EsClient().Search<DocumentMetadata>(s => s
    .From(from)
    .Size(size)
    .Index("esdocument_dev")
    .Query(q => q
        .MatchPhrase(m => m
            .Field(f => f.Content)
            .Query(searchKey)
        )
    )
);

It looks like from will be set to 1, which I believe will exclude the first match. Try using the default value of 0 and see if that fixes your issue.
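One way to see why from = 1 drops a hit is to treat from as a zero-based offset and size as a page length over the ranked hit list. The C# sketch below is an in-memory analogy only: it uses LINQ Skip/Take instead of calling Elasticsearch, Page is a hypothetical helper, and it assumes the hits come back in the order shown in the curl response above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the ranked hits of the match_phrase query,
// in the order the curl response returned them.
var rankedHits = new List<string>
{
    "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq28",
    "qqqqqqqqq85-159a-qqqq-a1cc-qqqqqq29",
};

// Models from/size: skip `from` hits (a zero-based offset), then return at most `size` hits.
static List<string> Page(IReadOnlyList<string> hits, int from, int size) =>
    hits.Skip(from).Take(size).ToList();

var fromZero = Page(rankedHits, 0, 100);
var fromOne = Page(rankedHits, 1, 100);

Console.WriteLine($"from=0 -> {fromZero.Count} hit(s)"); // from=0 -> 2 hit(s)
Console.WriteLine($"from=1 -> {fromOne.Count} hit(s)");  // from=1 -> 1 hit(s); the ...qqqqqq28 hit is skipped
```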

Hi @Christian_Dahlqvist,

Thank you for the quick response.

When I set the from value to 0, I get no documents. When I set it to 1, I get only one document.

What if you try removing these parameters and relying on the defaults? You could also add from and size settings to the query you run in the dev console and see what difference they make there.

Hi @Christian_Dahlqvist,

Sorry for the delayed response. If I remove those parameters, I get both results. Thank you.

In my business case, the frontend server-side table binding displays records 1 to 100 on the first page, and the next page shows records 101 to 200. How can I achieve this if I remove those parameters?

Note: The real (production) data contains a large number of records.

Hi there!

I think your from/size calculation is just a little bit off. As mentioned, the first page should be 0. But in that case, your code sets from to (0 - 1) * 100 + 1 = -99, which is obviously not correct.

You also set size = pageNo * pageSize, which means page 0 returns 0 results, page 1 returns up to 100 results, page 2 returns up to 200 results, and so on.
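The arithmetic can be checked with a short C# sketch that evaluates the original formulas from the snippet above for pages 0, 1 and 2. OriginalWindow is a hypothetical helper name, and nothing here calls Elasticsearch; it only computes the from/size values the request would carry.

```csharp
using System;

// Hypothetical helper reproducing the original formulas from the snippet above.
static (int From, int Size) OriginalWindow(int pageNo, int pageSize) =>
    ((pageNo - 1) * pageSize + 1, // original from: off by one, and negative for page 0
     pageNo * pageSize);          // original size: grows with the page number

foreach (var pageNo in new[] { 0, 1, 2 })
{
    var w = OriginalWindow(pageNo, 100);
    Console.WriteLine($"pageNo={pageNo}: from={w.From}, size={w.Size}");
}
// pageNo=0: from=-99, size=0
// pageNo=1: from=1, size=100
// pageNo=2: from=101, size=200
```

Page 1 is the case from this thread: from = 1 skips the first ranked hit, so only one of the two documents comes back.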

Just try:

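var from = pageNo * pageSize; // pageNo is 0-based: page 0 -> from 0, page 1 -> from 100
var size = pageSize;          // size is the page length, not an end offset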
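The suggestion above uses a 0-based pageNo, while the table binding described earlier numbers pages from 1 (records 1 to 100, then 101 to 200). Below is a minimal sketch of a hypothetical PageWindow helper for 1-based page numbers. It assumes the index keeps Elasticsearch's default index.max_result_window of 10,000, which caps from + size for this style of paging; the guard and its limit are assumptions, not part of the original code.

```csharp
using System;

// Hypothetical helper: converts a 1-based UI page number into Elasticsearch from/size.
// Assumes the index uses the default index.max_result_window of 10,000.
static (int From, int Size) PageWindow(int pageNo, int pageSize, int maxResultWindow = 10_000)
{
    if (pageNo < 1) throw new ArgumentOutOfRangeException(nameof(pageNo), "pageNo is 1-based.");
    if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize), "pageSize must be positive.");

    // Zero-based offset of the first hit on this page; long avoids overflow for huge page numbers.
    long offset = (long)(pageNo - 1) * pageSize;
    if (offset + pageSize > maxResultWindow)
        throw new InvalidOperationException(
            $"from + size = {offset + pageSize} exceeds max_result_window ({maxResultWindow}).");

    return ((int)offset, pageSize); // size is the page length, not an end offset
}

foreach (var page in new[] { 1, 2, 3 })
{
    var w = PageWindow(page, 100);
    // With NEST, these values would feed .From(w.From).Size(w.Size) in the Search call above.
    Console.WriteLine($"page {page}: from={w.From}, size={w.Size}, records {w.From + 1}-{w.From + w.Size}");
}
// page 1: from=0, size=100, records 1-100
// page 2: from=100, size=100, records 101-200
// page 3: from=200, size=100, records 201-300
```

For result sets deeper than that window, the Elasticsearch documentation recommends search_after (optionally with a point in time) rather than ever-larger from values.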

Hi @flobernd,

Thanks for the response. I will check and confirm ASAP.

Hi @Christian_Dahlqvist & @flobernd ,

Your support has been invaluable. I've revised the logic for the 'from' value, and now everything is functioning flawlessly. I'm pleased to report that I'm receiving both documents as expected.

