Problem mapping response to Model class [Elasticsearch.NET 8.13]

Hi, I am unable to map the ElasticsearchClient SearchAsync result into my data model. I get a conversion error:

The server encountered an error processing the request. Please see the [service help page] for constructing valid requests to the service. The exception message is 'The JSON value could not be converted to System.String. Path: $.body | LineNumber: 0 | BytePositionInLine: 224.

Below are my code snippets:

namespace Model
{
    public class ElasticsearchDocs
    {
        public string Id { get; set; }
        public string Title { get; set; }
        public string Url { get; set; }
        public string Body { get; set; }
    }
}
public async Task<ElasticsearchDocs> SearchDocumentsByContent()
{
	ServicePointManager.Expect100Continue = true;
	ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

	
	var settings = new ElasticsearchClientSettings(new Uri("https://localhost:9200"))
		.CertificateFingerprint("795b709a3a0f70ee69f12320a5d73c49c2465a5dfcf4e1c13a928cef4c6e6c2e")
		.Authentication(new BasicAuthentication("<USERNAME>", "<PASSWORD>"));


	var client = new ElasticsearchClient(settings);

	ElasticsearchDocs esDocs = new ElasticsearchDocs(); 

	var request = new SearchRequest("search-ocr-poc")
	{
		Query = new MatchAllQuery()
	};

	var response = await client.SearchAsync<ElasticsearchDocs>(request);

	if (response.IsValidResponse)
	{
		esDocs = response.Documents.FirstOrDefault();
	}
	return esDocs;
}

I'm not sure whether I need to manually map the fields to my data model.

Do I need to reindex my index?

My index was already created when the Elasticsearch connector ran.

This is the output I get when I run the following command:

GET /search-ocr-poc/_search
{
  "query": {
    "match_all": {}
  }
}
{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 69,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "search-ocr-poc",
        "_id": "ae7be893-6ced-4b0c-9230-b6bc168dcddd",
        "_score": 1,
        "_source": {
          "creation_time": "2023-12-26T13:55:14Z",
          "file_name": "",
          "editor_id": 1073741823,
          "type": "list_item",
          "title": "All Draft Documents",
          "url": "/sites/COPS/Reports%20List/DispForm.aspx?ID=5&Source=/sites/COPS/Reports%20List/AllItems.aspx&ContentTypeId=0x01009EC04F25E7BD464D9978DF8AB3448A9D",
          "size": 0,
          "id": "ae7be893-6ced-4b0c-9230-b6bc168dcddd",
          "author_id": 1073741823,
          "_timestamp": "2023-12-26T13:55:14Z"
        }
      },
      {
        "_index": "search-ocr-poc",
        "_id": "d4e860fd-8688-442b-a4b1-756dedb29397",
        "_score": 1,
        "_source": {
          "creation_time": "2023-12-26T13:56:16Z",
          "server_relative_url": "/sites/COPS/Documents",
          "parent_web_url": "/sites/COPS",
          "id": "d4e860fd-8688-442b-a4b1-756dedb29397",
          "type": "document_library",
          "title": "Documents",
          "_timestamp": "2024-04-15T08:10:57Z",
          "url": "/sites/COPS/Documents"
        }
      }
    ]
  }
}

Hi @hiilmiee,

this is definitely the correct way:

Could you please share the complete callstack and exception message?

Are you sure it's not working with the ElasticsearchDocs class you posted? I don't see any int property there and I'm a little bit surprised about the exception for that matter.

Hi @flobernd,

I use
var response = await client.SearchAsync<ElasticsearchDocs>(request);

The exception I receive is about a conversion to String rather than Int32. Here is the exception message:

The server encountered an error processing the request. Please see the [service help page] for constructing valid requests to the service. The exception message is 'The JSON value could not be converted to System.String. Path: $.body | LineNumber: 0 | BytePositionInLine: 224.

Do I somehow need to manually map the response fields to my model?

Hi @hiilmiee,

In your ElasticsearchDocs class you declare Body as string, but it seems that on the Elasticsearch side it's stored as something different. In the response you showed earlier, body is missing from the document payload. Is that always the case?

Hi @flobernd,

Some document payloads have a body and some do not, depending on the data ingested through the connector. The body is extracted with tesseract-ocr from image and PDF files.

Hi @hiilmiee, do you have an example JSON payload that contains a body? I suspect that the type is not compatible with string.

Hi @flobernd,

There are two kinds of body content:

#1

{
  "_index": "search-ocr-poc",
  "_id": "5c87196f-eb64-4b2e-8570-5b41ef1f7759",
  "_score": 2.591784,
  "_source": {
    "creation_time": "2024-03-28T07:39:05Z",
    "size": 1847715,
    "server_relative_url": "/sites/COPS/Documents/1.png",
    "id": "5c87196f-eb64-4b2e-8570-5b41ef1f7759",
    "type": "File",
    "title": "1.png",
    "body": "Oops Access denied er Please check with your System Administrator on System Access,, Thank you",
    "_timestamp": "2024-04-01T04:11:07Z",
    "url": "/sites/COPS/Documents/1.png"
  }
}

#2

{
  "_index": "search-ocr-poc",
  "_id": "b9d2861f-83c1-4f99-9f20-63dd18c56875",
  "_score": 1.0311186,
  "_source": {
    "creation_time": "2024-04-05T04:24:21Z",
    "size": 181624,
    "server_relative_url": "/sites/COPS/Documents/Image_to_PDF_20240404_20.05.03.pdf",
    "id": "b9d2861f-83c1-4f99-9f20-63dd18c56875",
    "type": "File",
    "title": "Image_to_PDF_20240404_20.05.03.pdf",
    "body": [
      """Page 1 of 2 * URGEN |! D@LL lechnoiogies Delivery Note uO |[NA ASSY.CROPLNBLDEMFCOAOMUK2 ‘| 1 |
| >] _cwam" | NA ASSY.CRD.SCTY.TRPM12,14G Sid St dT id ee ee a fs] Some of the original defective parts from your system must be returned to Dell within 10 business days from the delivery date of the replacement part. Those defective parts that are marked with a * do not need to be returned. All other parts, without this Insignia, not returned to Dell within 10 business days from the delivery date will have an invoice generated and issued accordingly for the values of the part at Dell current price. Payment of the invoice is required within 30days from the invoice date. To schedule the part collection please follow the return instructions packed with this delivery note. Company Stamp : Issuers Name : Date and Time"""
    ],
    "_timestamp": "2024-04-05T04:25:38Z",
    "url": "/sites/COPS/Documents/Image_to_PDF_20240404_20.05.03.pdf"
  }
}

Hi @flobernd, also, how do I map fields like "_timestamp" and "editor_id"? Does my model have to follow the same naming convention?

e.g.

public Timestamp _timestamp;
public int editor_id;

Hi @hiilmiee,

I can now see the problem. In your CLR class you declare Body as string. In your data, body is sometimes a string, but sometimes a string[] (array of strings).

To support this kind of complex data type in a clean way, you have to implement a custom converter.
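A minimal sketch of such a converter, using only System.Text.Json. The name StringOrArrayConverter is hypothetical, and the model is trimmed to the one property that matters here. It assumes the client's source serializer honors the System.Text.Json attributes on the model, which should be the case for the default serializer in the 8.x client, but is worth verifying for your version.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical converter: accepts either "text" or ["text", ...] and always yields string[].
public sealed class StringOrArrayConverter : JsonConverter<string[]>
{
    public override string[] Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        switch (reader.TokenType)
        {
            case JsonTokenType.String:
                // A single string is wrapped into a one-element array.
                return new[] { reader.GetString() ?? string.Empty };

            case JsonTokenType.StartArray:
                var items = new List<string>();
                while (reader.Read() && reader.TokenType != JsonTokenType.EndArray)
                {
                    if (reader.TokenType != JsonTokenType.String)
                        throw new JsonException("Expected only strings inside the body array.");
                    items.Add(reader.GetString() ?? string.Empty);
                }
                return items.ToArray();

            default:
                throw new JsonException("Expected a string or an array of strings for body.");
        }
    }

    public override void Write(Utf8JsonWriter writer, string[] value, JsonSerializerOptions options)
    {
        // Always write the array form, so round-tripped documents are uniform.
        writer.WriteStartArray();
        foreach (var item in value)
            writer.WriteStringValue(item);
        writer.WriteEndArray();
    }
}

// Trimmed version of the model from the question.
public class ElasticsearchDocs
{
    [JsonPropertyName("body")]
    [JsonConverter(typeof(StringOrArrayConverter))]
    public string[] Body { get; set; }
}
```

With this in place, both payload shapes shown earlier in the thread should deserialize into Body, and a document without a body field leaves the property null.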

An alternative would be to declare Body as JsonElement and inspect its properties to parse the contained value as string or string[], depending on the token type.
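The JsonElement alternative could look roughly like this. GetBodyParts is a hypothetical helper name, and the model is again trimmed to the body property. A missing body leaves the JsonElement at its default value, whose ValueKind is Undefined, so that case falls through to the empty result.

```csharp
using System;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

// Trimmed version of the model from the question.
public class ElasticsearchDocs
{
    // Keep the raw JSON value and decide later how to interpret it.
    [JsonPropertyName("body")]
    public JsonElement Body { get; set; }

    // Hypothetical helper: normalizes body to string[] whatever shape was indexed.
    public string[] GetBodyParts()
    {
        switch (Body.ValueKind)
        {
            case JsonValueKind.String:
                return new[] { Body.GetString() ?? string.Empty };

            case JsonValueKind.Array:
                // Non-string array items are skipped rather than treated as errors.
                return Body.EnumerateArray()
                    .Where(e => e.ValueKind == JsonValueKind.String)
                    .Select(e => e.GetString() ?? string.Empty)
                    .ToArray();

            default:
                // Undefined (field missing), null, or any other kind.
                return Array.Empty<string>();
        }
    }
}
```

Compared with a custom converter this keeps the parsing logic in the model itself, at the cost of exposing a JsonElement on the public surface.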

You can use the JsonPropertyName attribute on your properties to make sure they match the field names in your data, so the model does not have to follow the same naming convention.
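For the "_timestamp" and "editor_id" fields asked about above, a sketch might look like this. The property names Timestamp and EditorId and the choice of DateTimeOffset are assumptions; the nullable types reflect that not every document in the thread carries these fields.

```csharp
using System;
using System.Text.Json.Serialization;

// Trimmed version of the model from the question.
public class ElasticsearchDocs
{
    [JsonPropertyName("id")]
    public string Id { get; set; }

    // "_timestamp" is an ISO 8601 string in the sample documents.
    [JsonPropertyName("_timestamp")]
    public DateTimeOffset? Timestamp { get; set; }

    // "editor_id" is absent from some documents, hence nullable.
    [JsonPropertyName("editor_id")]
    public int? EditorId { get; set; }
}
```

The C# property names can then follow normal .NET conventions while the attribute carries the exact field name from the index.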

Hi @flobernd, thanks for the tips. I just changed the body in my data to always be string[], so that is no longer an issue. Also, is there any way for me to manually map data to my model? For example, my document contains this:

{
  "_index": "search-search-ntlm",
  "_id": "ae7be893-6ced-4b0c-9230-b6bc168dcddd",
  "_score": 1,
  "_source": {
    "creation_time": "2023-12-26T13:55:14Z",
    "file_name": "",
    "editor_id": 1073741823,
    "type": "list_item",
    "title": "All Draft Documents",
    "url": "https://hostname/sites/test/Reports%20List/DispForm.aspx?ID=5&Source=https://hostname/sites/COPS/Reports%20List/AllItems.aspx&ContentTypeId=0x01009EC04F25E7BD464D9978DF8AB3448A9D",
    "size": 0,
    "id": "ae7be893-6ced-4b0c-9230-b6bc168dcddd",
    "author_id": 1073741823,
    "_timestamp": "2023-12-26T13:55:14Z",
    "_allow_access_control": [
      "login_name:SHAREPOINT\\system",
      "email:test02@test.com",
      "login_name:domain\\testuser01",
      "login_name:domain\\testuser02",
      "user_id:1",
      "login_name:domain\\test01",
      "email:sample1@test.com",
      "user_id:22",
      "user_id:1073741823",
      "email:officer01@test.com",
      "email:sample2@test.com",
      "login_name:domain\\spsaccount",
      "login_name:domain\\test02",
      "user_id:20",
      "user_id:21",
      "user_id:17"
    ]
  }
}

I would like to split _allow_access_control into separate model properties: a string email, a string login_name, and an int user_id.

example model class:

namespace Model
{
    public class ElasticsearchDocs
    {
        public string Id { get; set; }
        public string Title { get; set; }
        public string Url { get; set; }
        public string Body { get; set; }
        public string Email { get; set; }
        public string LoginName { get; set; }
        public int UserID { get; set; }
    }
}
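
One possible way to do this split, as a sketch only: map the raw array with JsonPropertyName and expose computed properties that filter by prefix. Because the sample document holds several emails, login names, and user IDs, this sketch returns collections rather than the single values in the model above. That choice, along with the names Emails, LoginNames, UserIds, and ValuesFor, is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Serialization;

// Trimmed version of the model from the question.
public class ElasticsearchDocs
{
    // Raw "prefix:value" entries exactly as stored in the index.
    [JsonPropertyName("_allow_access_control")]
    public string[] AllowAccessControl { get; set; } = Array.Empty<string>();

    // Computed views; JsonIgnore keeps them out of serialized documents.
    [JsonIgnore]
    public IEnumerable<string> Emails => ValuesFor("email");

    [JsonIgnore]
    public IEnumerable<string> LoginNames => ValuesFor("login_name");

    [JsonIgnore]
    public IEnumerable<int> UserIds => ValuesFor("user_id").Select(s => int.Parse(s));

    // Hypothetical helper: returns the part after "prefix:" for matching entries.
    private IEnumerable<string> ValuesFor(string prefix) =>
        (AllowAccessControl ?? Array.Empty<string>())
            .Where(e => e.StartsWith(prefix + ":", StringComparison.Ordinal))
            .Select(e => e.Substring(prefix.Length + 1));
}
```

If exactly one value per kind is really wanted, calling FirstOrDefault() on each collection would be one way to get it.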