I've indexed some documents and searched them by content and by content_type (always using Elasticsearch.Net + NEST).
Everything works as expected, except that in the .NET objects mapped to the ES type (TRKDocument), the fields of the file property (of type attachment) are null when they are set automatically by the plugin.
Here is the code snippet for the search:
    var a = new Nest.SearchRequest<TRKDocument>("trkindex")
    {
        Query = new Nest.MatchQuery
        {
            Query = "application",
            Field = "file.content_type",
        }
    };
    var result = client.Search<TRKDocument>(a);
    Debug.WriteLine(result.Documents.FirstOrDefault<TRKDocument>().File.ContentType);
The content type returned by the debug statement is null, even though the document correctly matches the query (the query filters on content type as expected).
If I set content_type explicitly at indexing time, then it is returned.
I don't understand this behavior.
How can I get the full object populated with all the properties which are set automatically?
Hey @richetdan, the mapper-attachments plugin does not modify the source document sent to Elasticsearch. The extracted content and metadata are indexed into the inverted index (based on your attachment type mapping configuration), but the original source is left untouched, which is why the extracted values don't appear in result.Documents (which maps to _source).
To get the extracted values, you can specify the fields that you are interested in on the search request, then read the values of those fields from the .Hits<T> collection on the result.
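As a rough sketch of that approach, assuming the NEST 2.x fluent API (where the request option is called `Fields`) and assuming the extracted `file.content_type` sub-field is mapped with `store` enabled so it can be returned, something along these lines should work. The `TRKDocument` class below is only a hypothetical stand-in for the real POCO, and the node URI is a placeholder.

```csharp
using System;
using System.Diagnostics;
using Nest;

// Hypothetical stand-in for the POCO from the question; the real class
// has more properties, including the attachment-typed File property.
public class TRKDocument
{
    public string Path { get; set; }
}

public static class ExtractedFieldSearch
{
    public static void Run()
    {
        // Assumption: a single local node; change the URI for your cluster.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"));
        var client = new ElasticClient(settings);

        var result = client.Search<TRKDocument>(s => s
            .Index("trkindex")
            // Ask for the extracted field explicitly, since it is not in _source.
            .Fields(f => f.Field("file.content_type"))
            .Query(q => q
                .Match(m => m
                    .Field("file.content_type")
                    .Query("application"))));

        foreach (var hit in result.Hits)
        {
            // Read the value from the hit's fields, not from result.Documents.
            var contentType = hit.Fields.Value<string>("file.content_type");
            Debug.WriteLine(contentType);
        }
    }
}
```

If the field is not stored in the mapping, the value will most likely come back empty, so it is worth checking the attachment mapping first.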
Thank you very much forloop, your answer perfectly covers my question.
You have even anticipated my next questions.
I also tried to disable storing of the "_source" field and everything seems to work properly.
Is there any downside to this approach, apart from the fact that I will not be able to trigger a complete rebuild of the inverted index?
Does it make sense to use NEST instead of Elasticsearch.Net for searching documents?
It is fairly common not to store the base64-encoded string of the document in the index in order to save space but, as you say, it does mean that you would not be able to rebuild the index from the current index's source documents. You may also want to store the path to where the original document can be obtained, e.g. on the file system, in an S3 bucket, in Azure Blob Storage, etc.
Completely up to you. The advantage of using NEST is that all requests and responses are strongly typed, which makes them easier to work with, and you still have access to the low-level client via client.LowLevel whenever you want to drop lower.