Query matching characters at any position across several fields


(Michał Gąsior) #1

I'm pretty new to Elasticsearch and I'm having some difficulty building the right query in NEST. My data model looks like this:

[ElasticsearchType(Name = "examination")]
public class Project
{
    public Guid Id { get; set; }
    public bool IsOpen { get; set; }

    public string ExternalCode { get; set; }
    public Owner Owner { get; set; }
}

public class Owner
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

My problem is that I'd like to write a query that searches ExternalCode, FirstName and LastName for the search phrase at any position (e.g. when I type "LE", I want to find not only "Leonard" from the first name but also "Apple" from the last name and "56LO3LE56" from the external code property).

I tried adding the [Keyword] attribute to the mapping in NEST and using "match_phrase_prefix" as described in "Search while type" in the documentation, but it doesn't work. Should I use the RegExp query here?

{
  "query": { 
	"bool": { 
	  "should": [
		{ "match_phrase_prefix": { "externalCode":   "LE"        }},
		{ "match_phrase_prefix": { "owner.firstName":   "LE"        }},
		{ "match_phrase_prefix": { "owner.lastName":   "LE"        }}
	  ],
	  "filter": [
		{ "term":  { "isOpen": "true" }}
	  ]
	}
  }
}

(Michał Gąsior) #3

This is my current query. I'm still working on combining all of the fields into one search:

{
  "query": { 
	"bool": { 
	  "must": [
		{ "regexp": { "externalCode": { "value": ".*LE.*" } } }
	  ],
	  "filter": [
		{ "term":  { "isOpen": "true" }}
	  ]
	}
  }
}

I also tried the wildcard:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-wildcard-query.html

{
	"query": {
		"wildcard" : { "externalCode" : "*LE*" }
	}
}

The problem remains with full values: for example, "Leonard" doesn't work with either the RegExp or the wildcard query...
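
One possible explanation, offered as an assumption because the thread does not show the field mapping: regexp and wildcard are term-level queries, so the pattern is compared as-is against the indexed terms, and that comparison is case-sensitive. The Python sketch below only imitates this with the standard `re` module (Lucene regular expressions are anchored to the whole term, which `re.fullmatch` approximates). `matches_term` is a hypothetical helper, not an Elasticsearch API, and the lowercased list assumes a default text field whose analyzer lowercases its tokens.

```python
import re

def matches_term(pattern: str, term: str) -> bool:
    """Imitate an anchored, case-sensitive regexp query against one indexed term."""
    return re.fullmatch(pattern, term) is not None

# Terms as a keyword field would store them (unchanged) ...
keyword_terms = ["Leonard", "Apple", "56LO3LE56"]
# ... and as a default text field would typically store them (lowercased).
text_terms = [t.lower() for t in keyword_terms]

upper_hits_keyword = [t for t in keyword_terms if matches_term(".*LE.*", t)]
upper_hits_text = [t for t in text_terms if matches_term(".*LE.*", t)]
lower_hits_text = [t for t in text_terms if matches_term(".*le.*", t)]

print(upper_hits_keyword)  # only the value that really contains an uppercase "LE"
print(upper_hits_text)     # nothing: the stored terms are lowercase
print(lower_hits_text)     # all three once the pattern is lowercase too
```

If this is the cause, lowercasing the pattern before sending it would be one thing to try, although the leading-wildcard cost remains either way.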


(David Pilato) #4
{
	"query": {
		"wildcard" : { "externalCode" : "*LE*" }
	}
}

One of the worst things you can do with an inverted index.

You should think about the indexing process rather than the search process. Basically, you should just pass whatever the user enters to the search API, and a user will never use wildcards. At least on Google, I guess no one is using wildcards, right?

If you have to index Leonard and want to search for le or LE or Le, then you should actually index Leonard as:

  • le
  • leo
  • leon
  • ...
  • leonard

Then when the user enters le, le will match le which is in the inverted index.

Everything I described can be done with edge n-gram based analyzers.

Full guide here: https://www.elastic.co/guide/en/elasticsearch/guide/2.x/_ngrams_for_partial_matching.html
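
As a rough illustration of the token list above, here is a minimal Python sketch of what a lowercasing edge n-gram analyzer would emit for one word. It is not Elasticsearch code: `edge_ngrams` is a hypothetical helper, and the `min_gram`/`max_gram` defaults are assumptions chosen so the output starts at "le" like the list in this post.

```python
def edge_ngrams(text: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Return the lowercased prefixes of `text` with lengths min_gram..max_gram."""
    token = text.lower()
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]

leonard = edge_ngrams("Leonard")
print(leonard)

# A lowercased "le" typed by the user is an exact term in the index for "Leonard" ...
print("le" in leonard)
# ... but edge n-grams are prefixes only, so the "le" at the end of "Apple" is not indexed.
print("le" in edge_ngrams("Apple"))
```

Because edge n-grams cover prefixes only, matching at any position inside a value would need full n-grams rather than edge n-grams, at the price of a larger index.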

HTH


(Michał Gąsior) #6

I found a solution based on your hints and a StackOverflow question. Unfortunately my solution is NEST-only, but I think it's not a big deal to work out the equivalent REST requests from it.

const string ELASTIC_SEARCH_SERVER_URI = @"http://localhost:9200";
const string INDEX_NAME = "my_projects";
const string DEFAULT_INDEX_NAME = "something_index";

var uri = new Uri(ELASTIC_SEARCH_SERVER_URI);

// Project documents go to INDEX_NAME; everything else falls back to the default index.
var settings = new ConnectionSettings(uri)
	.DefaultIndex(DEFAULT_INDEX_NAME)
	.InferMappingFor<Project>(d => d
		.IndexName(INDEX_NAME)
	);
	
var client = new ElasticClient(settings);

client.CreateIndex(INDEX_NAME, descriptor => descriptor
	.Mappings(ms => ms
		.Map<Project>(m => m.AutoMap())
	)
	.Settings(s => s
		.Analysis(a => a
			.Analyzers(analyzer => analyzer
				.Custom("substring_analyzer", analyzerDescriptor => analyzerDescriptor
					.Tokenizer("keyword")
					.Filters("lowercase", "substring")
				)
			)
			.TokenFilters(tf => tf
				.NGram("substring", filterDescriptor => filterDescriptor
					.MinGram(1)
					.MaxGram(9)
				)
			)
		)
	)
);

ISearchResponse<Project> result = _client.Search<Project>(request => request
	.From(0).Size(100)
	.Query(q => q.Term(p => p.ExternalCode, "LE"))
	.PostFilter(bm => bm.Bool(b => b.Must(GetTerms(query)))));

var foundProjects = result.Documents;
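
To see which terms the substring_analyzer configured above would place in the index, the Python sketch below imitates its chain (keyword tokenizer, lowercase filter, n-gram filter with sizes 1 to 9). It is a simulation under those assumptions, not NEST or Elasticsearch code, and `substring_tokens` is a hypothetical name.

```python
def substring_tokens(value: str, min_gram: int = 1, max_gram: int = 9) -> set[str]:
    """Imitate keyword tokenizer + lowercase + ngram filter on one field value."""
    token = value.lower()  # the keyword tokenizer keeps the whole value as one token
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams

for value in ["Leonard", "Apple", "56LO3LE56"]:
    grams = substring_tokens(value)
    # A term query is not analyzed, so "LE" and "le" are different lookups.
    print(value, "le" in grams, "LE" in grams)
```

Two things follow from the simulation and are worth verifying against the real index: every stored gram is lowercase, so search text passed to a Term query unchanged would probably need lowercasing first, and no gram is longer than 9 characters, so a search string of 10 or more characters cannot equal any single indexed term.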

And the term building method:

private Func<QueryContainerDescriptor<Project>, QueryContainer>[] GetTerms(ProjectsQuery query)
{
	var terms = new List<Func<QueryContainerDescriptor<Project>, QueryContainer>>();

	if (query.IsOpen != null)
	{
		terms.Add(bm => bm.Term(p => p.IsOpen, query.IsOpen));
	}

	if (query.IsDeployed != null)
	{
		terms.Add(bm => bm.Term(p => p.IsDeployed, query.IsDeployed));
	}

	if (!terms.Any())
	{
		terms.Add(bm => bm.MatchAll());
	}

	return terms.ToArray();
}

And the model:

[ElasticsearchType(Name = "project")]
public class Project
{
	public Guid Id { get; set; }
	public bool IsOpen { get; set; }
	public bool IsDeployed { get; set; }
            
	[Text(Analyzer = "substring_analyzer")]
	public string ExternalCode { get; set; }
	public Owner Owner { get; set; }
}

public class Owner
{
	public string FirstName { get; set; }
	public string LastName { get; set; }
}

It works, but I'm not sure whether the filters are being built properly. One last question: how do I make the Term query also search Owner.FirstName and Owner.LastName?

UPDATE: This is my current solution for a multi-field term search:

ISearchResponse<Project> result = _client.Search<Project>(request => request
	.From(0).Size(5).Query(q => q
		.Bool(b => b
			.Should(
				s => s.Term(p => p.ExternalCode, query.SearchPhrase),
				s => s.Term(p => p.Owner.FirstName, query.SearchPhrase),
				s => s.Term(p => p.Owner.LastName, query.SearchPhrase)
				)
			))
	.PostFilter(bm => bm.Bool(b => b.Must(GetTerms(query)))));

I modified the Owner class by adding the same [Text(Analyzer = "substring_analyzer")] attribute to FirstName and LastName. The remaining problem is how to search by both the first name and the surname at once. It works as "search-while-you-type", but when I type the whole first name and the beginning of the last name, it breaks.
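
One reading of why this breaks: the whole phrase (say "Leonard App") is looked up as a single term, yet no single field holds both names, and the indexed grams stop at 9 characters. A possible way around it, sketched here in Python as an assumption rather than a tested fix, is to split the phrase on whitespace and require every word to match at least one of the fields. The field names are taken from the JSON query earlier in this thread; `build_query` is a hypothetical helper that only builds the request body.

```python
import json

FIELDS = ["externalCode", "owner.firstName", "owner.lastName"]

def build_query(phrase: str, fields=FIELDS) -> dict:
    """Require every whitespace-separated word to match at least one field."""
    words = phrase.lower().split()  # lowercase to line up with the lowercased grams
    per_word = [
        {"bool": {"should": [{"term": {field: word}} for field in fields],
                  "minimum_should_match": 1}}
        for word in words
    ]
    return {"query": {"bool": {"must": per_word}}}

body = build_query("Leonard App")
print(json.dumps(body, indent=2))
```

The same shape should translate to NEST as a Bool query whose Must clauses are each a Bool with three Should Term clauses. Individual words longer than 9 characters would still find nothing with the current MaxGram setting.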

