Having an issue with Nest and BulkDescriptor


(Doug Nelson) #1

I am having an issue with the following code, which I believe should work. I am trying to use the bulk API to perform partial updates on documents. Execution hangs on the bulkDescriptor.Update call: nothing errors, but it never returns, even after several minutes.

using Nest;
using System;
using System.Collections.Generic;

namespace Example
{
    class Program
    {
        static void Main(string[] args)
        {
            var esAnalyticClient = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));
            var esUpdateClient = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));
            var indexToUpdate = "testing";

            var pageSize = 1000;
            var currentPage = 0;

            while (true)
            {
                var searchResponse = esAnalyticClient.Search<dynamic>(s => s
                    .Type(string.Empty)
                    .Index("user_click")
                    .Query(q => q.Match(m => m.Field("indexName").Query(indexToUpdate)))
                    .Size(pageSize)
                    .From(currentPage * pageSize));

                if (searchResponse.Documents.Count == 0)
                    break;
                currentPage++;

                var bulkDescriptor = new BulkDescriptor();
                bulkDescriptor.Index(indexToUpdate);
                bulkDescriptor.Type("doc");


                foreach (var analyticDoc in searchResponse.Documents)
                {
                    bulkDescriptor.Update<dynamic, DocPartial>(s => s
                    .Id(analyticDoc.sourceId)
                    .Doc(new DocPartial { UserClickBoost = analyticDoc.UserClickBoost}));
                }

                var updateResponse = esUpdateClient
                   .Bulk(bulkDescriptor);

                var t = updateResponse.IsValid;

            }
        }
    }
}

I'm having a hard time figuring this one out. I'm using NEST 6.1.0.


(Russ Cam) #2

Hey @Doug_Nelson1, the issue here is the use of dynamic in the .Search<TDocument>(...) call in conjunction with passing analyticDoc.sourceId for the .Id(...) of the document to update in the .Update<TDocument, TPartialDocument>(...) call.

When dynamic is specified as the type for TDocument in the .Search<TDocument>(...) call, the documents returned in 6.x will actually be of type Nest.Json.Linq.JObject, the internalized version of Json.NET's JObject type. Because the document type is dynamic, any dereferencing of properties on the document will return dynamic values, so analyticDoc.sourceId will be dynamic, as will analyticDoc.UserClickBoost. The former causes a problem for NEST when trying to determine the underlying type for the id of the document to update, which is where it hangs. Ultimately, it's a constraint related to dynamic dispatch which we won't fix. It's similar to a previously reported issue and the issues linked to within it.
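To make the dynamic dispatch point concrete, here is a minimal sketch in plain C# with no NEST dependency. The Describe overloads and the ExpandoObject document are hypothetical stand-ins, not NEST types. It only illustrates how a dynamic argument defers overload resolution to runtime while a cast lets the compiler bind the call, and it does not reproduce the hang itself.

```csharp
using System;
using System.Dynamic;

public static class DynamicDispatchDemo
{
    // Hypothetical overloads standing in for an API that accepts several id types.
    public static string Describe(string id) => "string overload: " + id;
    public static string Describe(long id) => "long overload: " + id;

    public static void Main()
    {
        // ExpandoObject is used here as a stand-in for a document exposed as dynamic.
        dynamic doc = new ExpandoObject();
        doc.sourceId = "2";

        // The argument is dynamic, so the overload is chosen at runtime
        // and the result of the call is itself typed as dynamic.
        var late = Describe(doc.sourceId);

        // The cast gives the argument a static type, so the compiler
        // binds this call to Describe(string) at compile time.
        string early = Describe((string)doc.sourceId);

        Console.WriteLine(late);  // prints "string overload: 2"
        Console.WriteLine(early); // prints "string overload: 2"
    }
}
```

Both calls end up in the same overload here. The difference is when the binding happens, and that is the part that matters when the receiving API has to work out the type of the value it was given.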

The fix here is pretty straightforward, though, if you know the types for sourceId and UserClickBoost that you're dealing with (aside: should this property be camel cased?). Here's an example to demonstrate.

Assuming the following is indexed

PUT user_click
{
  "mappings": {
    "doc": {
      "properties": {
        "indexName": {
          "type": "keyword"
        },
        "sourceId": {
          "type": "keyword"
        },
        "userClickBoost": {
          "type": "double"
        }
      }
    }
  }
}

PUT user_click/doc/1
{
  "indexName": "testing",
  "sourceId": "2",
  "userClickBoost": 0.5
}

PUT testing/doc/2
{
  "userClickBoost": 0.2
}

Then you can do the following in NEST

using System;
using Elasticsearch.Net;
using Nest;

namespace Example
{
    class Program
    {
        private static void Main()
        {
            var pool = new SingleNodeConnectionPool(new Uri("https://localhost:9200"));
            var settings = new ConnectionSettings(pool);

            var esAnalyticClient = new ElasticClient(settings);
            var esUpdateClient = new ElasticClient(settings);
            var indexToUpdate = "testing";

            var pageSize = 1000;
            var currentPage = 0;

            while (true)
            {
                // Fetch one page of analytic documents; each document is exposed as dynamic.
                var searchResponse = esAnalyticClient.Search<dynamic>(s => s
                    .Type(string.Empty)
                    .Index("user_click")
                    .Query(q => q.Match(m => m.Field("indexName").Query(indexToUpdate)))
                    .Size(pageSize)
                    .From(currentPage * pageSize));

                if (searchResponse.Documents.Count == 0)
                    break;
                currentPage++;

                var bulkDescriptor = new BulkDescriptor();
                bulkDescriptor.Index(indexToUpdate);
                bulkDescriptor.Type("doc");

                foreach (var analyticDoc in searchResponse.Documents)
                {
                    // Cast the dynamic values to their known types before
                    // passing them to .Id(...) and the partial document.
                    bulkDescriptor.Update<dynamic, DocPartial>(s => s
                        .Id((string)analyticDoc.sourceId)
                        .Doc(new DocPartial { UserClickBoost = (double?)analyticDoc.userClickBoost })
                    );
                }

                var updateResponse = esUpdateClient
                    .Bulk(bulkDescriptor);

                var t = updateResponse.IsValid;
            }
        }
    }

    // Partial document containing only the field to update.
    public class DocPartial
    {
        public double? UserClickBoost { get; set; }
    }
}

By casting sourceId and userClickBoost to the known types, the problem with dynamic dispatch can be avoided.
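If the shape of the analytic documents is known up front, another option is to avoid dynamic altogether by searching with a typed POCO. The following is a sketch only, not something confirmed in this thread. UserClick and TypedBulkUpdate are hypothetical names, and the code assumes NEST 6.x with its default camel-case field name inference and the user_click mapping shown earlier. It has not been run against a cluster.

```csharp
using System;
using Nest;

// Hypothetical POCO mirroring the user_click mapping; with NEST's default
// inference, SourceId maps to "sourceId" and so on (an assumption).
public class UserClick
{
    public string IndexName { get; set; }
    public string SourceId { get; set; }
    public double? UserClickBoost { get; set; }
}

// Partial document containing only the field to update.
public class DocPartial
{
    public double? UserClickBoost { get; set; }
}

public static class TypedBulkUpdate
{
    public static void Main()
    {
        var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));
        var indexToUpdate = "testing";
        var pageSize = 1000;
        var currentPage = 0;

        while (true)
        {
            // Documents are deserialized into UserClick, so no value is dynamic.
            var searchResponse = client.Search<UserClick>(s => s
                .Index("user_click")
                .Type("doc")
                .Query(q => q.Match(m => m.Field(f => f.IndexName).Query(indexToUpdate)))
                .Size(pageSize)
                .From(currentPage * pageSize));

            if (searchResponse.Documents.Count == 0)
                break;
            currentPage++;

            var bulkDescriptor = new BulkDescriptor();
            foreach (var click in searchResponse.Documents)
            {
                // Index and type are set per operation so nothing is inferred
                // from the UserClick CLR type; the id is a plain string.
                bulkDescriptor.Update<UserClick, DocPartial>(u => u
                    .Index(indexToUpdate)
                    .Type("doc")
                    .Id(click.SourceId)
                    .Doc(new DocPartial { UserClickBoost = click.UserClickBoost }));
            }

            var updateResponse = client.Bulk(bulkDescriptor);
            if (!updateResponse.IsValid)
                Console.WriteLine(updateResponse.DebugInformation);
        }
    }
}
```

The trade-off is an extra class to maintain, in exchange for compile-time checking of property names and no casts at the call site.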


(Doug Nelson) #3

Thanks Russ that fix solved my issue.

