ElasticSearch NEST: Bulk-indexing operation does not make use of specified document IDs

I am currently using the ElasticSearch NEST 7.x library.

On the VM that hosts my ElasticSearch master node, I am running a web server that receives JSON data via REST. The data are then stored in ElasticSearch.

First, the received JSON data are passed into this method for parsing:

private static (bool Success, string ErrorMessage) TryReadRawJsonData(
	string rawJsonData, out IEnumerable<(string Index, ExpandoObject JsonContent)> jsonLines)
{
	var results = new List<(string Index, ExpandoObject JsonContent)>();

	foreach (string rawDataLine in HttpContext.Current.Server.UrlDecode(rawJsonData).Split('\n').Where(line => !string.IsNullOrWhiteSpace(line)))
	{
		dynamic expandoObject = JsonConvert.DeserializeObject<ExpandoObject>(rawDataLine);

		if (!Dynamic.HasProperty(expandoObject, "IndexId"))
		{
			jsonLines = Enumerable.Empty<(string, ExpandoObject)>();
			return (Success: false, ErrorMessage: $"No field named 'IndexId' found in {rawDataLine}.");
		}

		string indexId = (string)expandoObject.IndexId.ToLower();
		results.Add((indexId, JsonContent: expandoObject));
	}

	jsonLines = results;
	return (Success: true, ErrorMessage: null);
}

If successfully parsed, the return value is subsequently passed into this method for bulk indexing:

private static async Task<HttpResponseMessage> BulkIndexAsync(IEnumerable<(string Index, ExpandoObject JsonContent)> contents)
{
	foreach (var group in contents.GroupBy(line => line.Index))
	{
		BulkResponse bulkIndexResponse = 
			await ElasticClient.BulkAsync(bulk => bulk.Index(group.Key).IndexMany(group.Select(member => member.JsonContent)));

		if (bulkIndexResponse.Errors)
		{
			return new HttpResponseMessage(HttpStatusCode.BadRequest)
			{
				Content = new StringContent(bulkIndexResponse.ItemsWithErrors
															 .Select(itemWithError =>
																 $"Index: {itemWithError.Index}; " +
																 $"Document Id: {itemWithError.Id}; " +
																 $"Error: {itemWithError.Error.Reason}.")
															 .ConcatenateIntoString(separator: "\n"))
			};
		}
	}
	return new HttpResponseMessage(HttpStatusCode.OK);
}

The bulk index operation succeeded, but the document IDs are unfortunately not as I expected. Here is an example:

{
	"_index": "dummyindex",
	"_type": "_doc",
	"_id": "U1W4Z20BcmiMRnw-blTi",
	"_score": 1.0,
	"_source": {
		"IndexId": "dummyindex",
		"Id": "0c2d48bd-6842-4f15-b7f2-57fa259b0642",
		"UserId": "dummy_user_1",
		"Country": "dummy_stan"
	}
}

As you can see, the Id field is 0c2d48bd-6842-4f15-b7f2-57fa259b0642, which, according to the documentation, should automatically be inferred as the document ID. However, the _id field is set to the auto-generated value U1W4Z20BcmiMRnw-blTi instead.

What am I doing wrong?

The "Id" on an ExpandoObject is not a property of the type, but a key in the underlying IDictionary<string,object> that ExpandoObject is backed by.

You can see this by reflecting over the properties of ExpandoObject with

dynamic expandoObject = JsonConvert.DeserializeObject<ExpandoObject>(@"{
		""IndexId"": ""dummyindex"",
		""Id"": ""0c2d48bd-6842-4f15-b7f2-57fa259b0642"",
		""UserId"": ""dummy_user_1"",
		""Country"": ""dummy_stan""
	}
");

Type t = expandoObject.GetType();
PropertyInfo[] properties = t.GetProperties(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance);
foreach (PropertyInfo property in properties)
{
	Console.WriteLine(property.ToString());
}

which prints

System.Dynamic.ExpandoClass Class
System.Collections.Generic.ICollection`1[System.String] System.Collections.Generic.IDictionary<System.String,System.Object>.Keys
System.Collections.Generic.ICollection`1[System.Object] System.Collections.Generic.IDictionary<System.String,System.Object>.Values
System.Object System.Collections.Generic.IDictionary<System.String,System.Object>.Item [System.String]
Int32 System.Collections.Generic.ICollection<System.Collections.Generic.KeyValuePair<System.String,System.Object>>.Count
Boolean System.Collections.Generic.ICollection<System.Collections.Generic.KeyValuePair<System.String,System.Object>>.IsReadOnly
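
As a complementary check, the same point can be seen without dumping every property. The following is a minimal sketch using only the base class library; the class and helper names (ExpandoIdDemo, HasKey, HasClrProperty) are made up for illustration and are not part of NEST or of the code in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

public static class ExpandoIdDemo
{
    // True when the ExpandoObject holds the given key in its backing dictionary.
    public static bool HasKey(ExpandoObject document, string key)
    {
        var members = (IDictionary<string, object>)document;
        return members.ContainsKey(key);
    }

    // True when the runtime type declares a public instance property with that name.
    public static bool HasClrProperty(ExpandoObject document, string name)
    {
        return document.GetType().GetProperty(name) != null;
    }

    public static void Main()
    {
        // Assigning members through dynamic stores them as dictionary entries.
        dynamic dyn = new ExpandoObject();
        dyn.Id = "0c2d48bd-6842-4f15-b7f2-57fa259b0642";
        dyn.UserId = "dummy_user_1";

        ExpandoObject document = dyn;

        Console.WriteLine(HasKey(document, "Id"));         // True
        Console.WriteLine(HasClrProperty(document, "Id")); // False
    }
}
```

Anything that looks for an Id property on the type through reflection therefore finds nothing, even though the value is reachable through the dictionary or through dynamic member access.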

To solve your issue, you can specify the Id for each document by passing a second delegate argument to .IndexMany():

dynamic expandoObject = JsonConvert.DeserializeObject<ExpandoObject>(@"{
		""IndexId"": ""dummyindex"",
		""Id"": ""0c2d48bd-6842-4f15-b7f2-57fa259b0642"",
		""UserId"": ""dummy_user_1"",
		""Country"": ""dummy_stan""
	}
");

var bulkResponse = client.Bulk(bu => bu
    .IndexMany(new[] { expandoObject }, (b, d) => b.Id((Id)d.Id))
);

The cast of d.Id to Id is required because d is a dynamic type and the runtime is unable to dispatch the call without it. The cast could also have been to string, since that is the actual type of the value; casting to Id uses the implicit conversion from string to Id.
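
For the grouped bulk call in the question, where the documents are typed as ExpandoObject rather than dynamic, the same idea might look roughly like the sketch below. It assumes NEST 7.x; the class and helper names (BulkIdExample, GetDocumentId, BulkIndexWithIdsAsync) are hypothetical, and the ID is read through the dictionary interface instead of through a dynamic cast.

```csharp
using System.Collections.Generic;
using System.Dynamic;
using System.Threading.Tasks;
using Nest;

public static class BulkIdExample
{
    // Hypothetical helper: reads the "Id" key from the ExpandoObject's backing dictionary.
    // Returns null when the key is missing or its value is null.
    public static string GetDocumentId(ExpandoObject document)
    {
        var members = (IDictionary<string, object>)document;
        return members.TryGetValue("Id", out object value) ? value?.ToString() : null;
    }

    // Hypothetical helper: bulk-indexes the documents into one index,
    // using each document's "Id" value as the document ID where present.
    public static Task<BulkResponse> BulkIndexWithIdsAsync(
        IElasticClient client, string index, IEnumerable<ExpandoObject> documents)
    {
        return client.BulkAsync(bulk => bulk
            .Index(index)
            .IndexMany(documents, (descriptor, document) =>
            {
                string id = GetDocumentId(document);
                // Only set an explicit ID when one was found in the document.
                return id == null ? descriptor : descriptor.Id(id);
            }));
    }
}
```

Inside the foreach loop of BulkIndexAsync, this would be called with group.Key as the index and group.Select(member => member.JsonContent) as the documents, and the existing error handling on the returned BulkResponse can stay as it is.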

Thank you for your reply! I eventually figured this out yesterday, but didn't update my post to reflect this.

I did not know this. I will try this out. Thank you so much!

