ElasticSearch Exact Word Issue

tarlok · June 2, 2016, 7:59am

Hi,

I am not able to configure Elasticsearch the way I need. My text is "This is apple", and when a user searches for "apple" it returns 1 result. I want exact-sentence search: no result should be returned until the user searches for the full sentence "This is apple". I tried filter, match, terms and analyzer settings but was not able to resolve this. Please help me.

Thanks
Tarlok

cbuescher · June 2, 2016, 10:53am

Hi,

I think your best option is to index those fields using the keyword analyzer. Otherwise, if you use something like a phrase query and search for "this is an apple", you will also get documents containing "and this is an apple too". So for exact phrase matches, the keyword analyzer (or e.g. keyword tokenizer + lowercase filter) is the way to go.
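
As an illustration of the "keyword tokenizer + lowercase filter" option, here is a minimal sketch of an index-creation body in Elasticsearch 2.x syntax. The analyzer name `keyword_lowercase`, the type `tweet` and the field `Message` are borrowed from later posts in this thread and are only placeholders; adapt them to your own index.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "tweet": {
      "properties": {
        "Message": {
          "type": "string",
          "analyzer": "keyword_lowercase"
        }
      }
    }
  }
}
```

The `keyword` tokenizer emits the whole field value as a single token, so only a query for the complete sentence can match it. A `standard` tokenizer in the same position would split the value into words, and a search for "apple" alone would match again.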

tarlok · June 2, 2016, 12:25pm

Hi Christoph,

Thanks for the update.
I tried your suggestion but I am still facing the same issue. Below is my code:
_Query.Query(x => x.QueryString(q => q.Query(query).Analyzer(DefaultAnalyzers.keyword)));

and the configuration is:
{
  "query": {
    "query_string": {
      "query": "apple AND CustomerID:7f78c15a-4279-4572-8890-e1e99dfdb264",
      "analyzer": "keyword"
    }
  }
}

Could you please suggest what is wrong in the above code?

Thanks
Tarlok

cbuescher · June 2, 2016, 12:32pm

If you set the analyzer on your query only, this will only affect the query. In addition, you need to add the analyzer to the field mapping for your document as explained here.

tarlok · June 3, 2016, 3:45pm

Hi Christoph,

Thanks for the update.

I have understood that I need to set index to "not_analyzed". I am using PlainElastic.Net for this. Could you please suggest where I need to set this in the mapping class? Below is the StringMap code:

public class StringMap<T> : PropertyBase<T, StringMap<T>>
{

    public StringMap<T> TermVector(TermVector termVector)
    {
        RegisterCustomJsonMap("'term_vector': {0}", termVector.AsString().Quotate());
        return this;
    }


    /// <summary>
    /// Defines if norms should be omitted or not. 
    /// </summary>
    public StringMap<T> OmitNorms(bool omitNorms = false)
    {
        RegisterCustomJsonMap("'omit_norms': {0}", omitNorms.AsString());
        return this;
    }

    /// <summary>
    /// Defines if term freq and positions should be omitted.
    /// </summary>
    public StringMap<T> OmitTermFreqAndPositions(bool omitTermFreqAndPositions = false)
    {
        RegisterCustomJsonMap("'omit_term_freq_and_positions': {0}", omitTermFreqAndPositions.AsString());
        return this;
    }
    
    /// <summary>
    /// The analyzer used to analyze the text contents when analyzed during indexing and when searching using a query string. Defaults to the globally configured analyzer.
    /// see: http://www.elasticsearch.org/guide/reference/index-modules/analysis/
    /// </summary>
    public StringMap<T> Analyzer(string analyzer)
    {
        RegisterCustomJsonMap("'analyzer': {0}", analyzer.Quotate());
        return this;
    }

    /// <summary>
    /// The analyzer used to analyze the text contents when analyzed during indexing and when searching using a query string. Defaults to the globally configured analyzer.
    /// see: http://www.elasticsearch.org/guide/reference/index-modules/analysis/
    /// </summary>
    public StringMap<T> Analyzer(DefaultAnalyzers analyzer)
    {
        return Analyzer(analyzer.AsString());
    }        

    /// <summary>
    /// The analyzer used to analyze the text contents when analyzed during indexing.
    /// see: http://www.elasticsearch.org/guide/reference/index-modules/analysis/ 
    /// </summary>       
    public StringMap<T> IndexAnalyzer(string analyzer)
    {
        RegisterCustomJsonMap("'index_analyzer': {0}", analyzer.Quotate());
        return this;
    }

    /// <summary>
    /// The analyzer used to analyze the text contents when analyzed during indexing.
    /// see: http://www.elasticsearch.org/guide/reference/index-modules/analysis/ 
    /// </summary>       
    public StringMap<T> IndexAnalyzer(DefaultAnalyzers analyzer)
    {
        return IndexAnalyzer(analyzer.AsString());
    }

    /// <summary>
    /// The analyzer used to analyze the field when part of a query string.
    /// see: http://www.elasticsearch.org/guide/reference/index-modules/analysis/
    /// </summary>
    public StringMap<T> SearchAnalyzer(string analyzer)
    {
        RegisterCustomJsonMap("'search_analyzer': {0}", analyzer.Quotate());
        return this;
    }

    /// <summary>
    /// The analyzer used to analyze the field when part of a query string.
    /// see: http://www.elasticsearch.org/guide/reference/index-modules/analysis/
    /// </summary>
    public StringMap<T> SearchAnalyzer(DefaultAnalyzers analyzer)
    {
        return SearchAnalyzer(analyzer.AsString());
    }



    protected override string GetElasticFieldType(Type fieldType)
    {
        return "string";
    }
}

Please suggest.

Thanks
Tarlok

tarlok · June 7, 2016, 3:39pm

Hi Christoph,

Below is the default mapping:

{
  "twitter" : {
    "mappings" : {
      "tweet" : {
        "properties" : {
          "Date" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "Message" : {
            "type" : "string"
          },
          "User" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

My property class is:

namespace PlainSample
{
    public class Tweet
    {
        [Index("not_analyzed")]
        public string User { get; set; }
        [Index("not_analyzed")]
        public string Message { get; set; }
        [Index("not_analyzed")]
        public DateTime Date { get; set; }
    }

    public class IndexAttribute : System.Attribute
    {
        public string AnalyzationMethod { get; set; }
        public IndexAttribute(string analyzationMethod)
        {
            AnalyzationMethod = analyzationMethod;
        }
    }
}

Index setting is:

var indexSettings = new IndexSettingsBuilder()
    .NumberOfShards(8)
    .NumberOfReplicas(1)
    .Analysis(analysis => analysis
        .Analyzer(analyzer => analyzer
            .Custom("keyword_lowercase", custom => custom
                .Tokenizer(DefaultTokenizers.standard)
                .Filter(DefaultTokenFilters.lowercase))));

client.CreateIndex(new IndexCommand(index: "twitter"), indexSettings);

We are not able to set the property attribute "not_analyzed" as shown below:
"Message" : {
  "type" : "string",
  "index" : "not_analyzed"
},

Please help me. This is urgent for me.

Thanks
Tarlok

forloop · June 8, 2016, 4:13am

If you're using PlainElastic.Net, I would recommend asking the authors of that library about how to change the analyzer on a field mapping. It is not a library that Elastic maintains.

With NEST, the official high-level .NET client for Elasticsearch, the following would create an index with the desired mappings (using NEST 2.3.2 targeting Elasticsearch 2.x):

void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));

    // map Tweet type to index "twitter". This will be the default index used
    // when the type in the search is Tweet and no index is specified on the request
    var connectionSettings = new ConnectionSettings(pool)
            .MapDefaultTypeIndices(d => d.Add(typeof(Tweet), "twitter"));
                
    var client = new ElasticClient(connectionSettings);

    var createIndexResponse = client.CreateIndex("twitter", c => c
        .Settings(s => s
            .NumberOfShards(8)
            .NumberOfReplicas(1)
        )
        .Mappings(m => m
            .Map<Tweet>(tm => tm
                // infer mapping for all properties
                .AutoMap()
                // now override any inferred mappings that we want to change
                .Properties(p => p
                    // set "user" field as a multi_field:
                    // - "user" will be indexed using the standard analyzer
                    // - "user.keyword" will be indexed using the keyword analyzer
                    // - "user.raw" will be indexed without analysis (not_analyzed)
                    .String(s => s
                        .Name(n => n.User)
                        .Fields(f => f
                            .String(sf => sf
                                .Name("keyword")
                                .Analyzer("keyword")
                            )
                            .String(sf => sf
                                .Name("raw")
                                .NotAnalyzed()
                            )
                        )
                    )
                    // similarly, set "message" field as a multi_field:
                    .String(s => s
                        .Name(n => n.Message)
                        .Fields(f => f
                            .String(sf => sf
                                .Name("keyword")
                                .Analyzer("keyword")
                            )
                            .String(sf => sf
                                .Name("raw")
                                .NotAnalyzed()
                            )
                        )
                    )
                )
            )
        )
    );

    // index these three documents
    var tweets = new[] {
        new Tweet
        {
            User = "forloop",
            Message = "This is an apple",
            Date = new DateTime(2016, 6, 8, 13, 52, 0)
        },
        new Tweet
        {
            User = "mpdreamz",
            Message = "This is an apple in my hand",
            Date = new DateTime(2016, 6, 8, 13, 50, 0)
        },
        new Tweet
        {
            User = "gregmarzouka",
            Message = "This is not an apple",
            Date = new DateTime(2016, 6, 8, 13, 54, 0)
        },
    };

    var bulkResponse = client.Bulk(b => b
        .IndexMany(tweets, (i, d) => i.Document(d))
        .Refresh()
    );

    // match query on "message.keyword" field
    var searchResponse = client.Search<Tweet>(s => s
        .Query(q => q
            .Match(m => m
                .Field(f => f.Message.Suffix("keyword"))
                .Query("This is an apple")
            )
        )
    );
    
    Console.WriteLine("Found {0} matching document(s)", searchResponse.Total);
    Console.WriteLine("Matching document users: {0}", string.Join(",", searchResponse.Documents.Select(d => d.User)));
}

public class Tweet
{
    public string User { get; set; }
    public string Message { get; set; }
    public DateTime Date { get; set; }
}

The result of running the match query is

Found 1 matching document(s)
Matching document users: forloop

as expected.
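
For anyone who needs the raw mapping rather than the NEST calls, the `message` multi_field above should correspond roughly to the following Elasticsearch 2.x mapping JSON. This is a sketch, assuming NEST's default behaviour of camel-casing property names and naming the type `tweet`; the `user` field is mapped the same way and is left out for brevity.

```json
{
  "mappings": {
    "tweet": {
      "properties": {
        "message": {
          "type": "string",
          "fields": {
            "keyword": {
              "type": "string",
              "analyzer": "keyword"
            },
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
```

Any client that lets you send a raw mapping body at index creation could use this directly; fetching the index mapping after running the NEST code is the way to confirm the exact output.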

forloop · June 8, 2016, 4:13am

You may not want to map fields as multi_fields as in the above example, in which case you can simply do the following to index User and Message using the keyword analyzer:

var createIndexResponse = client.CreateIndex("twitter", c => c
    .Settings(s => s
        .NumberOfShards(8)
        .NumberOfReplicas(1)
    )
    .Mappings(m => m
        .Map<Tweet>(tm => tm
            .AutoMap()
            .Properties(p => p
                .String(s => s
                    .Name(n => n.User)
                    .Analyzer("keyword")
                )
                .String(s => s
                    .Name(n => n.Message)
                    .Analyzer("keyword")
                )
            )
        )
    )
);

Check out the auto mapping documentation for more details and examples.

tarlok · June 9, 2016, 12:39pm

Hi Russ,

Thanks for your help. I am now able to create the mapping settings with PlainElastic.Net, and exact word search is working fine. I want to enable both exact word search and full-text search.
e.g. my data field is: This is apple.
If the user searches for "apple" there should not be any result, but if the user searches for "apple*" there must be one record (full-text search). If the user searches for "This is apple" there must be 1 record (exact word search).

These two features work fine independently: full-text with an "analyzed" index and exact word with a "not_analyzed" index. Now I want to activate both features together. I have found the multi-field concept in Elasticsearch. Would multi-fields resolve my problem? Could you please help me make full-text and exact word search work simultaneously?

Thanks
Tarlok

forloop · June 14, 2016, 10:45am

Take a look at bool queries to combine multiple queries together.
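
As a rough sketch of what such a combination could look like, assuming the multi_field mapping from the earlier NEST example (an analyzed `message` field with a not_analyzed `message.raw` sub-field; substitute your own field names):

```json
{
  "query": {
    "bool": {
      "should": [
        { "term": { "message.raw": "This is apple" } },
        { "wildcard": { "message": { "value": "apple*" } } }
      ]
    }
  }
}
```

With only `should` clauses present, a document has to match at least one of them. Wildcard values are not analyzed, so against a field indexed with the standard analyzer the pattern needs to be lowercase. Note that a wildcard value of plain "apple" would still match the indexed token `apple`, so the application would have to decide which clause to send for a given user input.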

tarlok · June 14, 2016, 12:24pm

Hi Russ,

Thanks for your help. I already did this using bool queries. Once again, thanks a lot for your help.

Thanks
Tarlok

tarlok · June 15, 2016, 3:24pm

Hi Russ,

I am facing a new issue: "must" is not working together with a wildcard search.

Below is my query:
{
  "query": {
    "bool": {
      "should": [{
        "wildcard": {
          "Manufacturer": {
            "value": "apple"
          }
        }
      }]
    }
  }
}
This is working fine and displaying all data containing apple. This is for all users.

Now I am adding CustomerID to the query and want to display the data of a particular user:
{
  "query": {
    "bool": {
      "must": [{
        "wildcard": {
          "CustomerID": {
            "value": "7f78c15a-4279-4572-8890-e1e99dfdb264"
          }
        }
      }],
      "should": [{
        "wildcard": {
          "Manufacturer": {
            "value": "apple"
          }
        }
      }]
    }
  }
}

It starts displaying all data of that particular customer and does not consider "apple" in the search. Please help me.

Thanks
Tarlok
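
A possible explanation, not confirmed anywhere in this thread: when a bool query contains a `must` clause, its `should` clauses become optional by default and only influence scoring, so every document of that customer matches. A minimal sketch of one way to require the Manufacturer clause as well, reusing the field names and values from the query above, is to set `minimum_should_match`:

```json
{
  "query": {
    "bool": {
      "must": [{
        "wildcard": {
          "CustomerID": {
            "value": "7f78c15a-4279-4572-8890-e1e99dfdb264"
          }
        }
      }],
      "should": [{
        "wildcard": {
          "Manufacturer": {
            "value": "apple"
          }
        }
      }],
      "minimum_should_match": 1
    }
  }
}
```

Moving the Manufacturer clause into the `must` array would have the same filtering effect when there is only one such clause; `minimum_should_match` is the more flexible choice if several alternative `should` clauses are added later.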
