Retrieve Search Results Based on the First Letter in a Specific Field

Hey,

Is it possible to search based on the first letter in a specified field of documents?

For example, when the query is the letter 'A', all documents whose titles start with 'A' are returned.

Thanks.

Use a wildcard query. Here is an example:
GET /_search
{
  "query": {
    "wildcard": {
      "title": {
        "value": "a*"
      }
    }
  }
}

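Note that wildcard queries can be slow. The pattern is also not analyzed, so it should be lowercase when the field's analyzer lowercases the indexed terms.
Another option is the prefix query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html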
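A prefix query for the same case might look like the sketch below. It assumes a `title` text field whose analyzer lowercases terms, hence the lowercase value; like the wildcard pattern, the prefix value is not analyzed.

```
GET /_search
{
  "query": {
    "prefix": {
      "title": {
        "value": "a"
      }
    }
  }
}
```

On a `text` field this matches any indexed term starting with `a`, not only the first word of the title.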

If you can wait for more letters to be entered, like AB or ABC, before launching the search, I'd use an edge n-gram based analyzer. See https://www.elastic.co/guide/en/elasticsearch/reference/7.1/analysis-edgengram-tokenizer.html

Even better (and faster), I'd use a completion suggester: https://www.elastic.co/guide/en/elasticsearch/reference/7.1/search-suggesters-completion.html
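A minimal completion suggester sketch, assuming 7.x and a hypothetical `title_suggest` field of type `completion` that is filled with the title at index time. The suggester matches the prefix against the start of the whole analyzed input, so "Foo Articles" would not be suggested for `a`. Note that it returns suggestions (5 by default, raised here with `size`) rather than regular search hits.

```
DELETE en
PUT en
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "title_suggest": { "type": "completion" }
    }
  }
}

PUT en/_doc/1?refresh
{
  "title": "Articles",
  "title_suggest": "Articles"
}

POST en/_search
{
  "suggest": {
    "title-suggestions": {
      "prefix": "a",
      "completion": {
        "field": "title_suggest",
        "size": 10
      }
    }
  }
}
```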

@dadoonet No, the scenario doesn't involve waiting for more letters.
I've tried both the prefix query and the wildcard query, and noticed that if the field contains multiple words, every title containing a word that starts with "A" (for example) is retrieved, rather than only titles whose first word starts with "A".

Any thoughts about how this can be solved?
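One way to see why: a `text` field is split into separate terms at index time, and prefix and wildcard queries are matched against each term individually. The `_analyze` API shows the terms produced for one of the titles. This sketch uses the built-in `standard` analyzer rather than the custom analyzers from this thread.

```
GET _analyze
{
  "analyzer": "standard",
  "text": "WhatsApp: Adverts coming to messaging app next year, Facebook reveals"
}
```

The response lists `adverts` as its own term at position 1 (and `app` further on), so a prefix of `a` matches this title even though it starts with "WhatsApp".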

Could you provide a full recreation script, as described in About the Elasticsearch category? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

Okay.

Here's the code for creating the index with its settings and mappings:

var response = client.CreateIndex(index, c => c
                 .Mappings(ms => ms
                     .Map<Document>(m => m
                         .Properties(ps => ps
                             .Text(n => n
                                 .Name(e => e.Id))
                             .Keyword(s => s
                                 .Name(n => n.Client))
                             .Text(s => s
                                 .Name(n => n.Title)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Summary)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Content)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Tags)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Url))
                             .Text(s => s
                                 .Name(n => n.Image))
                             .Date(s => s
                                 .Name(n => n.Date))
                             .Number(s => s
                                 .Name(n => n.Importance))
                             .Text(s => s
                                 .Name(n => n.Terms)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.MainTerms)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Meta)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Type)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.Info))
                             .Text(s => s
                                 .Name(n => n.Relations)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             .Text(s => s
                                 .Name(n => n.RelationsSummary)
                                 .Analyzer("english_analyzer")
                                 .Fields(f => f.Text(t => t.Analyzer("english_standard").Name("exact"))))
                             )
                         ))
                 .Settings(ss => ss
                     .NumberOfShards(5)
                     .Analysis(an => an
                         .TokenFilters(tf => tf
                             .Stop("english_stop", ts => ts
                                 .StopWords("_english_"))
                             .KeywordMarker("english_keywords", km => km.KeywordsPath(keywordsFile))
                             .Stemmer("english_stemmer", st => st.Language("english"))
                             .Stemmer("english_possessive_stemmer", st => st.Language("possessive_english"))
                         )
                         .Analyzers(ns => ns
                             .Custom("english_analyzer", cm => cm
                                 .Tokenizer("standard")
                                 .Filters("english_possessive_stemmer", "lowercase", "english_stop", "english_keywords", "english_stemmer")
                             )
                             .Custom("english_standard", cm => cm
                                 .Tokenizer("standard")
                                 .Filters("english_possessive_stemmer", "lowercase")
                             ))
                     )
                 )
             );

The following is the search query:

var response = elasticClient.Search<Document>(s => s
                .From(offset)
                .Size(size)
                .Index(name)
                .Sort(so => sortByDate
                    ? so.Descending(a => a.Date).Field(f => f.Field("_score").Order(SortOrder.Descending))
                    : so.Field(f => f.Field("_score").Order(SortOrder.Descending)))
                .Query(q => q
                    // Alternatives tried earlier:
                    // .Prefix(p => p.Field(f => f.Title).Boost(20).Value("a"))
                    // .Wildcard(w => w.Field(f => f.Title).Value("a*"))
                    .SpanFirst(sf => sf
                        .Match(mm => mm.SpanTerm(st => st.Field(f => f.Title).Value("a")))
                        .End(1)
                        .Boost(100)))
            );

I tried span_first as well, but it didn't return any results.
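A likely reason the span_first attempt returns nothing: span_term is not analyzed and looks for the exact term `a`, which is never indexed (the titles produce terms such as `articl` or `articles`). Wrapping a prefix query in span_multi lets span_first match any term starting with `a` at the first position. This is a sketch, assuming the lowercase `title` field and its `exact` subfield from the mapping above (NEST camel-cases property names by default, so `Title` is indexed as `title`). It targets `title.exact` because `english_analyzer` removes stop words such as "a" and "the", which leaves the first remaining word at position 1 instead of 0.

```
GET en/_search
{
  "query": {
    "span_first": {
      "match": {
        "span_multi": {
          "match": {
            "prefix": {
              "title.exact": {
                "value": "a"
              }
            }
          }
        }
      },
      "end": 1
    }
  }
}
```

`"end": 1` only accepts spans that end at position 1, i.e. the first term. If many distinct terms start with the prefix, span_multi can exceed the boolean clause limit (1024 by default).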

Could you provide something I can test in Kibana dev console?

I will try to rewrite it in curl.

PUT en
{
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "title": {
        "type": "text",
        "analyzer": "english_analyzer",
        "fields": {
          "exact": {
            "type": "text",
            "analyzer": "english_standard"
          }
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 5,
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "english_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer"
          ]
        },
        "english_standard": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase"
          ]
        }
      }
    }
  }
}

PUT en/_doc/1
{
  "id": "de22b1fd-8383-11e9-80bf-0cc47a868a49",
  "title": "Articles"
}

PUT en/_doc/2
{
  "id": "33efc673-86ab-11e9-80bf-0cc47a868a49",
  "title": "WhatsApp: Adverts coming to messaging app next year, Facebook reveals"
}

GET en/_search
{
  "query": {
    "span_first": {
      "match": {
        "span_term": { "title": "a" }
      },
      "end": 1
    }
  }
}

Could you add a sample document to your script?

[
  {
    "id": "33efc673-86ab-11e9-80bf-0cc47a868a49",
    "title": "WhatsApp: Adverts coming to messaging app next year, Facebook reveals"
  },
  {
    "id": "f5b77d15-86a9-11e9-80bf-0cc47a868a49",
    "title": "New York City terror attack: what we know so far"
  },
  {
    "id": "de22b1fd-8383-11e9-80bf-0cc47a868a49",
    "title": "Articles"
  }
]

Is that one document? Could you please add it to your script so anyone can just copy and paste and run it from Kibana?

No, these are three documents.

Yes, I will add them to the above code.

Please reread the instructions I gave earlier. We should be able to just copy and paste the script into Kibana and run it. As it stands, this won't work.

I updated it.

I updated it a bit. Not sure if this is what you are looking for though.

DELETE en
PUT en
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "simple"
      }
    }
  }
}

PUT en/_doc/1
{
  "title": "Articles"
}
PUT en/_doc/2
{
  "title": "Foo Articles"
}

GET /_search
{
  "query": {
    "prefix": {
      "title": "a"
    }
  }
}

I'm getting an error while executing the index creation script.
Failed to parse mapping [properties]: Root mapping definition has unsupported parameters: [title : {analyzer=simple, type=text}]
So using the simple analyzer makes the prefix query return only documents whose title starts with the letter 'a'; in this case it returns only the document with id 1, am I right?
The thing is I need to have 2 different analyzers as I stated earlier.

Sure. You can have multiple subfields generated at index time with a different analyzer to support another use case.

See https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
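For the first-letter case, one option along those lines is a `keyword` subfield with a lowercase normalizer: the whole title is indexed as a single term, so a prefix query on it can only match the beginning of the title. This is a sketch assuming 7.x. The subfield name `raw` and the normalizer name `lowercase_normalizer` are hypothetical, and the built-in `english` analyzer stands in for the custom analyzers used earlier.

```
DELETE en
PUT en
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}

PUT en/_doc/1?refresh
{
  "title": "Articles"
}
PUT en/_doc/2?refresh
{
  "title": "Foo Articles"
}

GET en/_search
{
  "query": {
    "prefix": {
      "title.raw": "a"
    }
  }
}
```

With this setup only document 1 should match, while full-text queries can keep using the analyzed `title` field.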

That's probably because you are not using 7.x.
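On 6.x the mapping has to be nested under a type name. Without it, `properties` is read as the type name and `title` as an unknown root parameter, which is what the "Root mapping definition has unsupported parameters" error reports. The same index creation for 6.x might look like this, assuming a recent 6.x release where `_doc` is accepted as the type name:

```
PUT en
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "simple"
        }
      }
    }
  }
}
```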

Well, the prefix query returned both documents.
I need the document with id = 1 to be returned only and not both documents.

What I meant is that the position of the words matters: the matching needs to be based on the first word of the title field.