How do I map a meta.title field to search within it using NEST

I have an index created by FSCrawler from a document (either HTML or PDF).
This index has many (what I assume are) auto created fields.

My needs are simple: I need to search within the ID of the document (which will use the filename as the ID) and, MORE importantly, I also need to search within the title meta field (meta.title or meta.raw.title).

I'm using Entity Framework and the NEST API, so I assume I need to create a mapping so I can search within the mapped field.

I'm currently just testing this ability before going further, so I just have a basic search and a mapping class. The index is named "data", with type "_doc".

var settings = new ConnectionSettings(new Uri("http://localhost:9200")).DefaultIndex("data").DefaultTypeName("_doc");
var client = new ElasticClient(settings);
var searchResponse = client.Search<Data>(s => s
    .Query(q => q
        .Match(m => m
            .Field(f => f.Content)
            .Query("mySearchText")
        )
    )
);

This search works fine on the Content field (it's mapped in my class),
and this is my simple mapping class...

[ElasticsearchType(IdProperty = "_Id")]
public class Data
{
    public string _Id { get; set; }
    public string Content { get; set; }
    public string Title { get; set; }
}

Above I tried grabbing the IdProperty to see if I could search within it, and it did not work.
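
For the ID part of the question, one option that does not depend on any POCO mapping is an ids query, which looks documents up by their _id. A minimal sketch of the request body, assuming it is sent to GET /data/_search and that the ID is the full filename, as in the sample document in this thread. Note that it matches whole IDs only, not partial text:

```json
{
  "query": {
    "ids": {
      "values": ["inter-office memo.pdf"]
    }
  }
}
```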

I need some help. I tried the NEST documentation, but I think I need a real-world example. I've been searching for days and this is holding me up.

Maybe I can modify the indexing settings in FSCrawler to index this meta field differently so it's easier to map? Any help would be MUCH appreciated. TIA!

Here's a sample document as JSON:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 11,
"successful": 11,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 1,
"hits": [
{
"_index": "data",
"_type": "_doc",
"_id": "inter-office memo.pdf",
"_score": 1,
"_source": {
"content": " Inter-Office memo Fecha: 10/10/2007 ",
"meta": {
"author": "Eric",
"title": "Fecha: 10/10/2007",
"date": "2007-11-10T08:45:29.000+0000",
"language": "es",
"format": "application/pdf; version=1.4",
"creator_tool": "Acrobat PDFMaker 8.1 for Word",
"created": "2007-11-10T08:45:27.000+0000",
"raw": {
"date": "2007-11-10T04:45:29Z",
"pdf:PDFVersion": "1.4",
"pdf:docinfo:title": "Fecha: 10/10/2007",
"xmp:CreatorTool": "Acrobat PDFMaker 8.1 for Word",
"Company": "Microsoft Corporation",
"access_permission:modify_annotations": "true",
"access_permission:can_print_degraded": "true",
"dc:creator": "Eric",
"dcterms:created": "2007-11-10T04:45:27Z",
"Last-Modified": "2007-11-10T04:45:29Z",
"dcterms:modified": "2007-11-10T04:45:29Z",
"dc:format": "application/pdf; version=1.4",
"title": "Fecha: 10/10/2007",
"xmpMM:DocumentID": "uuid:89fe70a8-cdbf-4e1e-a799-8c08081e0f77",
"Last-Save-Date": "2007-11-10T04:45:29Z",
"pdf:docinfo:creator_tool": "Acrobat PDFMaker 8.1 for Word",
"access_permission:fill_in_form": "true",
"pdf:docinfo:modified": "2007-11-10T04:45:29Z",
"meta:save-date": "2007-11-10T04:45:29Z",
"pdf:encrypted": "false",
"dc:title": "Fecha: 10/10/2007",
"modified": "2007-11-10T04:45:29Z",
"pdf:docinfo:custom:SourceModified": "D:20071110044442",
"Content-Type": "application/pdf",
"pdf:docinfo:creator": "Eric",
"X-Parsed-By": "org.apache.tika.parser.pdf.PDFParser",
"creator": "Eric",
"meta:author": "Eric",
"meta:creation-date": "2007-11-10T04:45:27Z",
"created": "2007-11-10T04:45:27Z",
"access_permission:extract_for_accessibility": "true",
"access_permission:assemble_document": "true",
"xmpTPg:NPages": "1",
"Creation-Date": "2007-11-10T04:45:27Z",
"resourceName": "inter-office memo.pdf",
"access_permission:extract_content": "true",
"pdf:docinfo:custom:Company": "Microsoft Corporation",
"access_permission:can_print": "true",
"SourceModified": "D:20071110044442",
"Author": "Eric",
"producer": "Acrobat Distiller 8.1.0 (Windows)",
"access_permission:can_modify": "true",
"pdf:docinfo:producer": "Acrobat Distiller 8.1.0 (Windows)",
"pdf:docinfo:created": "2007-11-10T04:45:27Z"
}
},
"file": {
"extension": "pdf",
"content_type": "application/pdf",
"created": "2019-02-24T06:10:14.851+0000",
"last_modified": "2007-11-10T08:45:30.000+0000",
"last_accessed": "2019-02-24T06:10:14.851+0000",
"indexing_date": "2019-02-25T01:00:37.101+0000",
"filesize": 12396,
"filename": "inter-office memo.pdf",
"url": "file://\\tmp\\es\\inter-office memo.pdf"
},
"path": {
"root": "3390d1be31e78ad623165b095e7dc7",
"virtual": "/inter-office memo.pdf",
"real": "\\tmp\\es\\inter-office memo.pdf"
}
}
}
]
}
}

Anyone?
Even if the sample is not using C# code?

Not sure I understand but here is a query:

GET /_search
{
    "query": {
        "match" : {
            "meta.title" : "fecha"
        }
    }
}

I'm not sure I understand the question though.

Thanks,

Basically I'm trying to query ES in .NET via the NEST high-level client or even the Elasticsearch.Net low-level client.

I'm trying to convert that query into C# in a full working test sample, which must also include the mapping class (I assume).

Creating the mapping class for that meta field is my main problem.

Thanks!

Then I don't think I can help "without using C# code". Which I don't know :slight_smile: ...

Note that from FSCrawler point of view, you can disable extracting raw metadata and just map all the "standard" attributes. So you don't have to think about meta.raw as it won't be produced.

HTH
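
A minimal sketch of what that could look like in the job's _settings.json, assuming an FSCrawler 2.x job named data (the job name is a guess based on the index name) and that the option is spelled fs.raw_metadata as in the FSCrawler documentation. Check the docs for your version before relying on it:

```json
{
  "name": "data",
  "fs": {
    "raw_metadata": false
  }
}
```

With raw metadata disabled, documents indexed afterwards should carry only the standard meta fields such as meta.title; documents that are already indexed would need to be re-crawled.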


:grinning:

I think this will be my solution!
I will look into this option.

I just hope I can extract the TITLE of the document, which is in the metadata, while indexing; it only lives in that field.

I will look into an FSCrawler solution, or even a Logstash solution, next.
For the moment I will continue with my application development without a "Document Title Search" option until it's solved.

Thanks @dadoonet

Yes. The meta.title field is based on all possible combinations of the raw metadata.

So if a metadata field is something like dc:title or title, its content is written into meta.title. That's what Tika does behind the scenes for "standard" fields.

To access multi-fields in Elasticsearch with NEST, you can use the .Suffix("suffix") extension method. For example, given the following POCOs:

public class Person 
{
	public IEnumerable<Tag> Tags {get;set;}
}

public class Tag 
{
	public int Id {get;set;}	
	public string Title {get;set;}
	public DateTime Date {get;set;}
}

where Tags on Person is mapped as a nested datatype,

var client = new ElasticClient();

var searchResponse = client.Search<Person>(s => s
	.From(0)
	.Size(15)
	.Query(q => q
		.Nested(n => n
			.Path(p => p.Tags)
			.Query(nq => nq
				.Match(m => m
					.Field(f => f.Tags.First().Title.Suffix("raw"))
					.Query("this is my match query")
				)
			)
		)
	)
);

will produce the following query:

{
  "from": 0,
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "match": {
          "tags.title.raw": {
            "query": "this is my match query"
          }
        }
      }
    }
  },
  "size": 15
}

Notice that the field resolved from the expression f => f.Tags.First().Title.Suffix("raw") is "tags.title.raw".

Thanks!

I will be testing this and getting back to you if I can get it to work :slight_smile:

This looks like what I was looking for.

Well I tried and tried, but still get 0 results in search.
NOTE*** I'm using Elasticsearch 6.6.1 and .Net Framework (not net core).

Here is my code derived from your code samples.

 public class Data
    {
        public IEnumerable<Tag> Meta { get; set; }
        public string Content { get; set; }
    }

    public class Tag
    {
        public string Title { get; set; }
    }

where Meta on Data is mapped as a nested datatype,

var searchResponse = client.Search<Data>(s => s
        .From(0)
        .Size(15)
        .Query(q => q
                .Nested(n => n
                        .Path(p => p.Meta)
                        .Query(nq => nq
                                .Match(m => m
                                    .Field(f => f.Meta.First().Title)
                                    .Query("this is my match query")
                                )
                        )
                )
        )
);

            TextBox1.Text = searchResponse.Documents.Count.ToString();

The elasticsearch.log showed a failed attempt to create this Query:

org.elasticsearch.transport.RemoteTransportException: [HOME-PC][127.0.0.1:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.index.query.QueryShardException: failed to create query: {
  "nested" : {
    "query" : {
      "match" : {
        "meta.title" : {
          "query" : "Fecha",
          "operator" : "OR",
          "prefix_length" : 0,
          "max_expansions" : 50,
          "fuzzy_transpositions" : true,
          "lenient" : false,
          "zero_terms_query" : "NONE",
          "auto_generate_synonyms_phrase_query" : true,
          "boost" : 1.0
        }
      }
    },
    "path" : "meta",
    "ignore_unmapped" : false,
    "score_mode" : "avg",
    "boost" : 1.0
  }
}

I did notice this error...

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.IllegalStateException: [nested] nested object under path [meta] is not of nested type

Also tried variants using the Field meta.raw.title and meta.raw.dc:title to similar results.

This indicates that meta is not mapped as a nested datatype. Can you show what the mapping in the target index looks like? You can call the Get Mapping API to retrieve this.

Sure, but it has over 21,000 characters and goes over the 7,000 characters allowed in this post, So I will post a link to a .TXT file containing the mapping.

My Mapping.txt Link

meta is mapped as an object datatype, not a nested one, so the query does not need to be wrapped in a nested query. Something like the following would work:

var client = new ElasticClient();

var searchResponse = client.Search<Person>(s => s
	.From(0)
	.Size(15)
	.Query(q => q
		.Match(m => m
		    .Field("meta.title")
		    .Query("this is my match query")
		)
	)
);
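
The difference shows up in the index mapping. A hedged sketch of how an object-mapped meta field typically appears in the Get Mapping output, trimmed to one sub-field and with the text type assumed for illustration: there is a properties block but no "type": "nested" entry, so meta.title can be queried directly.

```json
{
  "meta": {
    "properties": {
      "title": { "type": "text" }
    }
  }
}
```

If meta were nested, the same block would carry "type": "nested" alongside properties, and only then would the nested query wrapper be required.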

If I'm not mistaken (I'm away from my workstation), in order to use meta.title as a field I must define it in a class, and I don't know how to do that. I will try using quotes, "meta.title"; I don't think I've tried that...

I think I tried defining meta in my Data class (the Person class in your sample) and using meta.Suffix("title"), but it did not work.

I will post my tests a bit later tonight when I'm back at my workstation.

THANKS FOR ALL YOUR HELP!

You would just need to define a POCO that has a Meta property of another POCO type, which in turn has a Title property. Something like:

public class MyDocument
{
    public Meta Meta { get; set; }
}

public class Meta
{
    public string Title { get; set; }
}

Then use this type in Search<T>()

var client = new ElasticClient();

var searchResponse = client.Search<MyDocument>(s => s
	.Query(q => q
		.Match(m => m
		    .Field(f => f.Meta.Title)
		    .Query("this is my match query on meta.title")
		)
	)
);
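
For reference, NEST's default field name inference camel-cases each property name in the expression, so f => f.Meta.Title resolves to meta.title. Under that default (no custom field name inferrer or property attributes, which is an assumption about the client settings), the request body sent should be roughly:

```json
{
  "query": {
    "match": {
      "meta.title": {
        "query": "this is my match query on meta.title"
      }
    }
  }
}
```

This is why the POCO property names only need to mirror the JSON structure of _source; no separate mapping step is required just to query an existing field.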

YES!
This worked perfectly!

Thanks a million, this should allow me to search on other meta fields I need.

Thanks again for taking the time to help me on this!
:clap::clap::clap::clap:

Thanks again!

Since I have your attention: what if I wanted to search the keyword field (file.filename) in the same index and data?

For some reason this solution does not work for the (file.) object like it does for the (meta.) object.

 public class Data
{
    public string Content { get; set; }
    public Metas Meta { get; set; }
    public Files File { get; set; }
}

public class Metas
{
    public string Title { get; set; }
    public string Author { get; set; }
    public string Language { get; set; }
}
public class Files
{
    public string Filename { get; set; }
}      



var searchResponse = client.Search<Data>(s => s
    .From(0)
    .Size(15)
    .Query(q => q
        .Match(m => m
            .Field(f => f.File.Filename)
            .Query("test")
        )
    )
);

If you have the time... Thanks anyway for solving the meta query.

Should I open a new thread for this?
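
One likely explanation, offered as a guess since the thread ends here: if file.filename is mapped as keyword, as the post says, its value is indexed as a single unanalyzed term, so a match query for test only hits a file whose whole name is exactly test. A sketch of a request body (for GET /data/_search) that looks for the text anywhere in the filename instead, assuming the index and field names above:

```json
{
  "query": {
    "wildcard": {
      "file.filename": {
        "value": "*test*"
      }
    }
  }
}
```

Keyword matching is case-sensitive, and a leading wildcard can be slow on large indices. In NEST the equivalent would be a Wildcard query in place of Match.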
