ElasticSearch NEST: Bulk-indexing operation does not make use of specified document IDs

Miao · September 25, 2019, 9:47am

I currently use the ElasticSearch NEST 7.x library.

On the VM that hosts my ElasticSearch master node, I am running a web server that receives JSON data via REST. This JSON data should then be stored in ElasticSearch.

First, the received JSON data are passed into this method for parsing:

private static (bool Success, string ErrorMessage) TryReadRawJsonData(
	string rawJsonData, out IEnumerable<(string Index, ExpandoObject JsonContent)> jsonLines)
{
	var results = new List<(string Index, ExpandoObject JsonContent)>();

	foreach (string rawDataLine in HttpContext.Current.Server.UrlDecode(rawJsonData).Split('\n').Where(line => !string.IsNullOrWhiteSpace(line)))
	{
		dynamic expandoObject = JsonConvert.DeserializeObject<ExpandoObject>(rawDataLine);

		if (!Dynamic.HasProperty(expandoObject, "IndexId"))
		{
			jsonLines = Enumerable.Empty<(string, ExpandoObject)>();
			return (Success: false, ErrorMessage: $"No field named 'IndexId' found in {rawDataLine}.");
		}

		string indexId = (string)expandoObject.IndexId.ToLower();
		results.Add((indexId, JsonContent: expandoObject));
	}

	jsonLines = results;
	return (Success: true, ErrorMessage: null);
}

If successfully parsed, the return value is subsequently passed into this method for bulk indexing:

private static async Task<HttpResponseMessage> BulkIndexAsync(IEnumerable<(string Index, ExpandoObject JsonContent)> contents)
{
	foreach (var group in contents.GroupBy(line => line.Index))
	{
		BulkResponse bulkIndexResponse = 
			await ElasticClient.BulkAsync(bulk => bulk.Index(group.Key).IndexMany(group.Select(member => member.JsonContent)));

		if (bulkIndexResponse.Errors)
		{
			return new HttpResponseMessage(HttpStatusCode.BadRequest)
			{
				Content = new StringContent(bulkIndexResponse.ItemsWithErrors
															 .Select(itemWithError =>
																 $"Index: {itemWithError.Index}; " +
																 $"Document Id: {itemWithError.Id}; " +
																 $"Error: {itemWithError.Error.Reason}.")
															 .ConcatenateIntoString(separator: "\n"))
			};
		}
	}
	return new HttpResponseMessage(HttpStatusCode.OK);
}

The bulk index operation succeeded, but the document IDs are unfortunately not as I expected. Here is an example:

{
	"_index": "dummyindex",
	"_type": "_doc",
	"_id": "U1W4Z20BcmiMRnw-blTi",
	"_score": 1.0,
	"_source": {
		"IndexId": "dummyindex",
		"Id": "0c2d48bd-6842-4f15-b7f2-57fa259b0642",
		"UserId": "dummy_user_1",
		"Country": "dummy_stan"
	}
}

As you can see, the Id field is 0c2d48bd-6842-4f15-b7f2-57fa259b0642, which, according to the documentation, should automatically be inferred as the document ID. However, the _id field is set to the auto-generated value U1W4Z20BcmiMRnw-blTi instead of 0c2d48bd-6842-4f15-b7f2-57fa259b0642.

What am I doing wrong?

forloop · September 26, 2019, 7:06am

The "Id" on an ExpandoObject is not a property of the type, but a key in the underlying IDictionary<string,object> that backs the ExpandoObject.

You can see this by reflecting over the properties of ExpandoObject:

dynamic expandoObject = JsonConvert.DeserializeObject<ExpandoObject>(@"{
		""IndexId"": ""dummyindex"",
		""Id"": ""0c2d48bd-6842-4f15-b7f2-57fa259b0642"",
		""UserId"": ""dummy_user_1"",
		""Country"": ""dummy_stan""
	}
");

Type t = expandoObject.GetType();
PropertyInfo[] properties = t.GetProperties(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance);
foreach (PropertyInfo property in properties)
{
	Console.WriteLine(property.ToString());
}

which prints:

System.Dynamic.ExpandoClass Class
System.Collections.Generic.ICollection`1[System.String] System.Collections.Generic.IDictionary<System.String,System.Object>.Keys
System.Collections.Generic.ICollection`1[System.Object] System.Collections.Generic.IDictionary<System.String,System.Object>.Values
System.Object System.Collections.Generic.IDictionary<System.String,System.Object>.Item [System.String]
Int32 System.Collections.Generic.ICollection<System.Collections.Generic.KeyValuePair<System.String,System.Object>>.Count
Boolean System.Collections.Generic.ICollection<System.Collections.Generic.KeyValuePair<System.String,System.Object>>.IsReadOnly

To solve your issue, you can specify the Id for each document by passing a delegate as the second argument to .IndexMany():

dynamic expandoObject = JsonConvert.DeserializeObject<ExpandoObject>(@"{
		""IndexId"": ""dummyindex"",
		""Id"": ""0c2d48bd-6842-4f15-b7f2-57fa259b0642"",
		""UserId"": ""dummy_user_1"",
		""Country"": ""dummy_stan""
	}
");

var bulkResponse = client.Bulk(bu => bu
    .IndexMany(new[] { expandoObject }, (b, d) => b.Id((Id)d.Id))
);

The cast of d.Id to Id is required because d is a dynamic type and the runtime is unable to dispatch the call without it. Casting to string would also have worked, since that is the actual type of the value; casting to Id simply uses the implicit conversion from string to Id.
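
In the code from the question, the documents are typed as ExpandoObject rather than dynamic, so d.Id would not compile there. A minimal sketch of one way to read the key through the dictionary interface instead is shown below. It uses only the .NET base class library; the class ExpandoIdReader and the method GetIdOrNull are hypothetical names made up for this illustration and are not part of NEST.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

public static class ExpandoIdReader
{
    // Hypothetical helper (not part of NEST): returns the value stored under
    // the given key of an ExpandoObject as a string, or null if the key is absent.
    public static string GetIdOrNull(ExpandoObject document, string key = "Id")
    {
        // ExpandoObject implements IDictionary<string, object> explicitly,
        // so its members are reachable only through the interface.
        var members = (IDictionary<string, object>)document;
        return members.TryGetValue(key, out object value) ? value?.ToString() : null;
    }

    public static void Main()
    {
        dynamic document = new ExpandoObject();
        document.Id = "0c2d48bd-6842-4f15-b7f2-57fa259b0642";
        document.UserId = "dummy_user_1";

        // Reflection finds no public property named "Id" on the type itself...
        Console.WriteLine(typeof(ExpandoObject).GetProperty("Id") == null);

        // ...but the dictionary view exposes the value that was assigned above.
        Console.WriteLine(GetIdOrNull((ExpandoObject)document));
    }
}
```

With such a helper, the selector in the question's BulkIndexAsync could plausibly be written as (b, d) => b.Id(ExpandoIdReader.GetIdOrNull(d)), relying on the same implicit conversion from string to Id. This has not been tested against a cluster, and documents without an "Id" key would need separate handling.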

Miao · September 26, 2019, 7:28am

Thank you for your reply! I eventually figured this out yesterday, but didn't update my post to reflect this.

I did not know this. I will try this out. Thank you so much!

system · October 24, 2019, 7:28am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
