Question on `index_phrases` behavior

I'm looking at using [index_phrases](https://www.elastic.co/guide/en/elasticsearch/reference/master/index-phrases.html) instead of explicitly creating a subfield with shingles, but I'm a little confused about its behavior. Searching through the docs, I found a promising link titled "_faster_phrase_queries_with_literal_index_phrases_literal", but it 404s. Here's what I am trying to do:

```
#explore index_phrases behavior
DELETE en_docs
PUT en_docs
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "index_phrases": true
      }
    }
  }
}

POST en_docs/_doc/a
{
    "title": "James Charles wore the dress"
}
POST en_docs/_doc/b
{
    "title": "Charles James made the dress"
}
# Both docs have same score.
GET en_docs/_search
{
  "query": {
    "query_string": {
      "type": "most_fields",
      "query": "Charles James"
    }
  }
}
# Both docs have same score
GET en_docs/_search
{
  "query": {
    "match": {
      "title": "Charles James"
    }
  }
}

# Only doc b matches
GET en_docs/_search
{
  "query": {
    "query_string": {
      "query": "\"Charles James\""
    }
  }
}
# Only doc b matches
GET en_docs/_search
{
  "query": {
    "match_phrase": {
      "title": "Charles James"
    }
  }
}
```

I want to query on Charles James without quotes and have both docs returned, but with doc 'b' (the one containing the exact phrase) ranked higher. I was hoping the first query_string query with most_fields would do that for me, since that's what would happen if I had created a subfield with 2-word shingles.
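For comparison, here is a rough sketch of the explicit-subfield setup described above, written as Python dicts that serialize to the request bodies. The names `bigram_filter`, `bigram_analyzer` and the `title.shingles` subfield are hypothetical; the `shingle` filter settings and the `multi_match` `most_fields` type are standard Elasticsearch options, but treat this as an untested sketch rather than a recommended mapping.

```python
import json

# Hypothetical names: "bigram_filter", "bigram_analyzer" and the "shingles" subfield.
# Index body: a "title" field plus a subfield that indexes only 2-word shingles.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "bigram_filter": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 2,
                    "output_unigrams": False,  # keep only the bigrams
                }
            },
            "analyzer": {
                "bigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "bigram_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "shingles": {"type": "text", "analyzer": "bigram_analyzer"}
                },
            }
        }
    },
}

# Search body: most_fields combines the scores from "title" and "title.shingles",
# so a doc sharing the bigram "charles james" with the query scores higher.
search_body = {
    "query": {
        "multi_match": {
            "query": "Charles James",
            "type": "most_fields",
            "fields": ["title", "title.shingles"],
        }
    }
}

print(json.dumps(index_body, indent=2))   # body for PUT en_docs
print(json.dumps(search_body, indent=2))  # body for GET en_docs/_search
```

Here the query text is analyzed separately for each field, so only the `title.shingles` clause can reward doc b for containing "Charles James" in that order.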

Is the use case for index_phrases just for when you want to run match_phrase and wish it to run faster at the expense of a larger index?

Hey Loren!

Maybe I'm misreading things, but wouldn't a `bool` query with a `must` clause containing the `match` query and a `should` clause containing a `match_phrase` query be what you are after?

The `match_phrase` query internally targets the hidden index-phrases field, even though you specify `title` as the field name. See https://github.com/elastic/elasticsearch/blob/7.0/server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java#L649-L652
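A minimal sketch of that `bool` query body, built in Python so it can be serialized and sent to `GET en_docs/_search`. The helper name `phrase_boosted_query` is hypothetical. Because a `must` clause is present, the `should` clause is optional: it does not filter anything out and only adds score for documents that also match the phrase.

```python
import json


def phrase_boosted_query(field, text):
    """Require the terms (match) and boost docs that contain them as a phrase."""
    return {
        "query": {
            "bool": {
                # Every hit must match the individual terms.
                "must": [{"match": {field: text}}],
                # Hits that also match the exact phrase get extra score.
                "should": [{"match_phrase": {field: text}}],
            }
        }
    }


body = phrase_boosted_query("title", "Charles James")
print(json.dumps(body, indent=2))
```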

--Alex

Thank you Alex!

Yes, I can do something like that. I think I was expecting my search term to be analyzed into bigrams and matched against that hidden index-phrases field as part of the query_string query. That's what I'd get with an explicit bigram subfield, but it's really a different use case: searching on an explicit phrase versus boosting results that share arbitrary bigrams with the query term.
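To make the "shared bigrams" idea concrete, here is a small pure-Python simulation of how the two titles and the query break into 2-word shingles, roughly what a 2-word shingle subfield (or the hidden index-phrases field) would hold. The `analyze` function is an assumption that only approximates the standard analyzer by lowercasing and splitting on word characters; it is not Elasticsearch code.

```python
import re


def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, split into word tokens.
    return re.findall(r"\w+", text.lower())


def bigrams(tokens):
    # Adjacent token pairs, like a shingle filter with min/max size 2 and no unigrams.
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def shared_bigrams(query, title):
    # Bigrams the query and the title have in common.
    return sorted(set(bigrams(analyze(query))) & set(bigrams(analyze(title))))


docs = {
    "a": "James Charles wore the dress",
    "b": "Charles James made the dress",
}

for doc_id, title in docs.items():
    print(doc_id, shared_bigrams("Charles James", title))
```

Both titles contain the unigrams "charles" and "james", which is why the plain `match` query scores them equally; only doc b shares the bigram "charles james" with the query.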

Using the lovely search profiler in Kibana, I was able to see that a query like

```
{
  "query": {
    "query_string": {
      "query": "\"Charles James\" dress"
    }
  }
}
```

does make use of `title._index_phrase` under the hood via a TermQuery, so that will suit my needs after all.

Thanks again for your help.

