Retrieve Search Results Based on the First Letter in a Specific Field

maryam_abdullah · June 14, 2019, 12:11pm

Hey,

Is it possible to search based on the first letter in a specified field of documents?

For example, when the query is the letter 'A', all documents whose titles start with 'A' are returned.

Thanks.

wangqinghuan · June 15, 2019, 12:38pm

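Use a wildcard query. Here is an example (wildcard patterns are not analyzed, so the pattern is lowercase to match the lowercased terms that a text field indexes by default):
GET /_search
{
  "query": {
    "wildcard": {
      "title": {
        "value": "a*"
      }
    }
  }
}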

dadoonet · June 16, 2019, 4:39am

Note that this can be slow.
Another option is the prefix query: see "Prefix query" in the Elasticsearch Guide [8.11].

If you can wait for more letters to be entered, like AB or ABC, before launching the search, I'd use an edge ngram based analyzer strategy. See "Edge NGram Tokenizer" in the Elasticsearch Guide [7.1].

Even better and faster, I'd use a completion suggester. See "Completion Suggester" in the Elasticsearch Guide [7.1].

maryam_abdullah · June 17, 2019, 5:27am

@dadoonet No, the scenario isn't based on waiting for more letters.
I've tried both the prefix query and the wildcard query, and noticed that if the field contains multiple words, every title containing any word that starts with "A" is retrieved, not only the titles whose first word starts with "A".

Any thoughts about how this can be solved?
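
This behaviour follows from how text fields are analyzed: the title is split into word tokens at index time, and prefix and wildcard queries are matched against each token separately. A quick way to see the tokens is the _analyze API. This sketch assumes the built-in standard analyzer rather than the custom analyzers used later in this thread:

```console
GET _analyze
{
  "analyzer": "standard",
  "text": "Foo Articles"
}
```

The response lists two tokens, foo and articles, so a prefix of a matches the second word even though the title starts with F.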

dadoonet · June 17, 2019, 6:31am

Could you provide a full recreation script as described in About the Elasticsearch category? It will help to better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

maryam_abdullah · June 17, 2019, 6:35am

Okay.

Here's the code for creating the index with its settings and mappings:

var response = client.CreateIndex(index, c => c
                 .Mappings(ms => ms
                     .Map<Document>(m => m
                         .Properties(ps => ps
                             .Text(n => n
                                 .Name(e => e.Id))
                             .Keyword(s => s
                                 .Name(n => n.Client))
                             .Text(s => s
                                 .Name(n => n.Title)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Summary)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Content)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Tags)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Url))
                             .Text(s => s
                                 .Name(n => n.Image))
                             .Date(s => s
                                 .Name(n => n.Date))
                             .Number(s => s
                                 .Name(n => n.Importance))
                             .Text(s => s
                                 .Name(n => n.Terms)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.MainTerms)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Meta)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Type)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Info))
                             .Text(s => s
                                 .Name(n => n.Relations)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.RelationsSummary)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             )
                         ))
                 .Settings(ss => ss
                     .NumberOfShards(5)
                     .Analysis(an => an
                         .TokenFilters(tf => tf
                             .Stop("english_stop", ts => ts
                                 .StopWords("_english_"))
                             .KeywordMarker("english_keywords", km => km.KeywordsPath(keywordsFile))
                             .Stemmer("english_stemmer", st => st.Language("english"))
                             .Stemmer("english_possessive_stemmer", st => st.Language("possessive_english"))
                         )
                         .Analyzers(ns => ns
                             .Custom("english_analyzer", cm => cm
                                 .Tokenizer("standard")
                                 .Filters("english_possessive_stemmer", "lowercase", "english_stop", "english_keywords", "english_stemmer")
                             )
                             .Custom("english_standard", cm => cm
                                 .Tokenizer("standard")
                                 .Filters("english_possessive_stemmer", "lowercase")
                             ))
                     )
                 )
             );

The following is the search query:

var response = elasticClient.Search<Document>(s => s
                .From(offset)
                .Size(size)
                .Index(name)
                .Sort(so => sortByDate
                    ? so.Descending(a => a.Date).Field(f => f.Field("_score").Order(SortOrder.Descending))
                    : so.Field(f => f.Field("_score").Order(SortOrder.Descending)))
                .Query(q => q
                    // Earlier attempts, kept for reference:
                    // .Prefix(p => p.Field(f => f.Title).Boost(20).Value("a"))
                    // .Wildcard(w => w.Field(f => f.Title).Value("a*"))
                    .SpanFirst(sf => sf
                        .Match(mm => mm.SpanTerm(st => st.Field(f => f.Title).Value("a")))
                        .End(1)
                        .Boost(100))
                )
            );

I also tried span_first, but it didn't return any results.

dadoonet · June 17, 2019, 6:58am

Could you provide something I can test in Kibana dev console?

maryam_abdullah · June 17, 2019, 7:06am

I will try to rewrite it in curl.

maryam_abdullah · June 17, 2019, 7:26am

PUT en
{
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "title": {
        "type": "text",
        "analyzer": "english_analyzer",
        "fields": {
          "exact": {
            "type": "text",
            "analyzer": "english_standard"
          }
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 5,
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "english_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer"
          ]
        },
        "english_standard": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase"
          ]
        }
      }
    }
  }
}

PUT en/_doc/1
{
  "id": "de22b1fd-8383-11e9-80bf-0cc47a868a49",
  "title": "Articles"
}

PUT en/_doc/2
{
  "id": "33efc673-86ab-11e9-80bf-0cc47a868a49",
  "title": "WhatsApp: Adverts coming to messaging app next year, Facebook reveals"
}

GET /_search
{
  "query": {
    "span_first": {
      "match": {
        "span_term": { "title": "a" }
      },
      "end": 1
    }
  }
}
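
A possible reason the span_first query above finds nothing: span_term looks up the exact indexed term a rather than terms that begin with a, and none of the sample titles contains that term. One workaround, sketched here assuming Elasticsearch 7.x and the en index with a lowercase title field, is to wrap a prefix query in span_multi so that span_first can test the position of the prefix match:

```console
GET en/_search
{
  "query": {
    "span_first": {
      "match": {
        "span_multi": {
          "match": {
            "prefix": { "title": { "value": "a" } }
          }
        }
      },
      "end": 1
    }
  }
}
```

With "end": 1, only a match at the first token position qualifies, so this should return "Articles" but not the WhatsApp title. Two caveats apply. span_multi expands the prefix into one clause per matching term, so a single-letter prefix can fail with a too-many-clauses error on a large index. And if the field uses a stop filter such as english_stop, a title whose first word is a stop word is likely to be missed, because the removed word still occupies the first position.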

dadoonet · June 17, 2019, 8:59am

Could you add a sample document to your script?

maryam_abdullah · June 17, 2019, 9:20am

[
  {
    "id": "33efc673-86ab-11e9-80bf-0cc47a868a49",
    "title": "WhatsApp: Adverts coming to messaging app next year, Facebook reveals"
  },
  {
    "id": "f5b77d15-86a9-11e9-80bf-0cc47a868a49",
    "title": "New York City terror attack: what we know so far"
  },
  {
    "id": "de22b1fd-8383-11e9-80bf-0cc47a868a49",
    "title": "Articles"
  }
]

dadoonet · June 17, 2019, 11:38am

Is that one document? Could you please add it to your script so anyone can just copy and paste and run it from Kibana?

maryam_abdullah · June 17, 2019, 11:50am

No, these are three documents.

Yes, I will add them to the above code.

dadoonet · June 17, 2019, 12:50pm

Please read again the instructions I gave earlier. We should be able to just copy and paste the script in Kibana and run it. Here this won't work.

maryam_abdullah · June 18, 2019, 5:20am

I updated it.

dadoonet · June 20, 2019, 1:18pm

I updated it a bit. Not sure if this is what you are looking for though.

DELETE en
PUT en
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "simple"
      }
    }
  }
}

PUT en/_doc/1
{
  "title": "Articles"
}
PUT en/_doc/2
{
  "title": "Foo Articles"
}

GET /_search
{
  "query": {
    "prefix": {
      "title": "a"
    }
  }
}

maryam_abdullah · June 20, 2019, 3:06pm

I'm getting an error while executing the index creation script:
Failed to parse mapping [properties]: Root mapping definition has unsupported parameters: [title : {analyzer=simple, type=text}]
So using the simple analyzer makes the prefix search return only documents that start with the letter 'a', and in this case it returns only the doc with id 1. Am I right?
The thing is I need to have 2 different analyzers as I stated earlier.

dadoonet · June 20, 2019, 5:33pm

"The thing is I need to have 2 different analyzers as I stated earlier."

Sure. You can have multiple subfields generated at index time with a different analyzer to support another use case.

See "fields" in the Elasticsearch Guide [8.11].

dadoonet · June 20, 2019, 5:33pm

The mapping error is probably because you are not using 7.x.

maryam_abdullah · June 21, 2019, 6:44am

Well, the prefix query returned both documents.
I need only the document with id = 1 to be returned, not both.

What I meant is that the position of the words matters, and the match needs to be made against the first word of the title field.
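
One way to anchor the match to the start of the whole title, sketched here under assumptions rather than taken from the thread: add a keyword sub-field, which indexes the entire title as a single term, and run the prefix query against that sub-field. The sub-field name raw and the normalizer name lowercase_normalizer are made up for this example. The typeless mapping syntax assumes Elasticsearch 7.x, since 6.x expects a type name under mappings.

```console
DELETE en
PUT en
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "simple",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}

PUT en/_doc/1
{
  "title": "Articles"
}

PUT en/_doc/2
{
  "title": "Foo Articles"
}

GET en/_search
{
  "query": {
    "prefix": {
      "title.raw": "a"
    }
  }
}
```

Because title.raw holds "articles" and "foo articles" as whole terms, the prefix a should match only document 1. The analyzed title field, and any other sub-fields with their own analyzers, remain available for full-text search.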
