Mapper Attachments Plugin with .NET client

Hi all, I'm new to Elasticsearch and the Mapper Attachments plugin.
I am using both .NET clients together: Elasticsearch.Net and NEST.

I've created an index with a mapping using the following REST command:

POST /trkindex
{
    "mappings": {
        "trkdocument": {
            "properties": {
                "file": {
                    "type": "attachment",
                    "fields": {
                        "content": {
                            "type": "string",
                            "term_vector": "with_positions_offsets",
                            "store": true
                        },
                        "content_type": { "store": "yes" }
                    }
                }
            }
        }
    },
    "settings": { "index": { "number_of_shards": 1, "number_of_replicas": 1 } }
}

I've indexed some documents and searched for them by matching on content and content_type (always using Elasticsearch.Net + NEST).

Everything works as expected, except that in the .NET objects mapped to the ES type (TRKDocument), the fields of the file property (of type attachment) are null when they are set automatically by the plugin.
Here is the code snippet for the search:

    var a = new Nest.SearchRequest<TRKDocument>("trkindex")
    {
        Query = new Nest.MatchQuery
        {
            Query = "application",
            Field = "file.content_type"
        }
    };

    var result = client.Search<TRKDocument>(a);
    Debug.WriteLine(result.Documents.FirstOrDefault<TRKDocument>().File.ContentType);

The content type returned by the debug statement is null, but the document correctly matches the query (the query filters on content type as expected).
If I set content_type explicitly at index time, then it is returned.
I don't understand this behavior.
How can I get the full object populated with all the properties which are set automatically?

Thanks in advance

-Daniele-

Hey @richetdan, the mapper-attachments plugin does not modify the source document sent to Elasticsearch. The extracted content and metadata are indexed into the inverted index (based on your attachment type mapping configuration), but the original source is untouched, which is why the extracted values don't appear in result.Documents (which maps to _source).

In order to get the extracted values, you can specify the fields that you are interested in (they need to be stored, as content and content_type are in your mapping), then obtain the values of these fields from the .Hits<T> collection on the result. For example,

var searchResponse = Client.Search<Document>(s => s
	.Fields(f => f
		// fields you're interested in
		.Field(d => d.Attachment.Name)
		.Field(d => d.Attachment.Author)
		.Field(d => d.Attachment.Content)
		.Field(d => d.Attachment.ContentLength)
		.Field(d => d.Attachment.ContentType)
		.Field(d => d.Attachment.Date)
		.Field(d => d.Attachment.Keywords)
		.Field(d => d.Attachment.Language)
		.Field(d => d.Attachment.Title)
	)
	.Query(q => q
		.MatchAll()
	)
);

and then

var documents = new List<Document>();

foreach (var hit in searchResponse.Hits)
{
	var document = new Document { Attachment = new Nest.Attachment() };
	document.Attachment.Name = hit.Fields.ValueOf<Document, string>(d => d.Attachment.Name);
	document.Attachment.Author = hit.Fields.ValueOf<Document, string>(d => d.Attachment.Author);
	document.Attachment.Content = hit.Fields.ValueOf<Document, string>(d => d.Attachment.Content);
	document.Attachment.ContentLength = hit.Fields.ValueOf<Document, long?>(d => d.Attachment.ContentLength);
	document.Attachment.ContentType = hit.Fields.ValueOf<Document, string>(d => d.Attachment.ContentType);
	document.Attachment.Date = hit.Fields.ValueOf<Document, DateTime?>(d => d.Attachment.Date);
	document.Attachment.Keywords = hit.Fields.ValueOf<Document, string>(d => d.Attachment.Keywords);
	document.Attachment.Language = hit.Fields.ValueOf<Document, string>(d => d.Attachment.Language);
	document.Attachment.Title = hit.Fields.ValueOf<Document, string>(d => d.Attachment.Title);
	documents.Add(document);
}

In this example, I populate a collection of types from values in the .Hits<T> collection, but you may do something different.
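
For reference, here is a rough sketch of what a request like this looks like at the REST level, assuming Elasticsearch 2.x (where the search body accepts a `fields` array; from 5.0 onwards the parameter is called `stored_fields`) and reusing the `trkindex` / `trkdocument` / `file` names from the question. The choice of fields is only illustrative:

```json
POST /trkindex/trkdocument/_search
{
  "fields": ["file.content_type", "file.content"],
  "query": {
    "match": { "file.content_type": "application" }
  }
}
```

Each hit in the response should then carry a `fields` object next to (or instead of) `_source`, with every requested field returned as an array of values. That `fields` object is what `hit.Fields` exposes in NEST.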

An Attachment type was added in NEST 2.3.3 to make working with the mapper-attachments plugin easier; it's not included in the documentation yet, but take a look at the tests for it to see how to use it.

Thank you very much forloop, your answer perfectly covers my question.
You have even anticipated my next questions.

I also tried disabling storage of the "_source" field, and everything seems to work properly.
Is there any downside to this approach, apart from the fact that I will not be able to trigger a complete rebuild of the inverted index?

Does it make sense to use NEST instead of Elasticsearch.Net for searching documents?

-Daniele-

It is fairly common not to store the base64-encoded string of the document in the index in order to save space but, as you say, it does mean that you would not be able to rebuild the index from the current index's source documents. You may also want to store the path to where the original document can be obtained, e.g. on the file system, an S3 bucket, Azure Blob Storage, etc.
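
As a minimal sketch of that setup, assuming Elasticsearch 2.x mapping syntax and reusing the index and type names from the question, the mapping could disable `_source` and keep a stored pointer to the original file. The `path` field is a hypothetical name chosen for illustration; it is not something the plugin defines:

```json
POST /trkindex
{
  "mappings": {
    "trkdocument": {
      "_source": { "enabled": false },
      "properties": {
        "path": { "type": "string", "index": "not_analyzed", "store": true },
        "file": {
          "type": "attachment",
          "fields": {
            "content": { "type": "string", "store": true },
            "content_type": { "store": "yes" }
          }
        }
      }
    }
  }
}
```

With `_source` disabled, `path` has to be a stored field to be retrievable at all, and it is requested through the fields list like the attachment fields. Beyond reindexing, the Elasticsearch documentation also lists the update API and on-the-fly highlighting among the features that depend on `_source`, so it is worth checking whether you rely on either.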

As for NEST versus Elasticsearch.Net: completely up to you :smile: The advantage of using NEST is that all requests and responses are strongly typed, making them easier to work with, and you still have access to the low-level client via client.LowLevel whenever you want to drop lower.