I'm looking at using [index_phrases](https://www.elastic.co/guide/en/elasticsearch/reference/master/index-phrases.html) instead of explicitly creating a subfield with shingles, but I'm a little confused about the behavior. Searching through the docs, I found a promising link titled "_faster_phrase_queries_with_literal_index_phrases_literal", but it 404s. Here's what I am trying to do:
# Explore index_phrases behavior
DELETE en_docs
PUT en_docs
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "index_phrases": true
      }
    }
  }
}
POST en_docs/_doc/a
{
  "title": "James Charles wore the dress"
}
POST en_docs/_doc/b
{
  "title": "Charles James made the dress"
}
# Both docs have the same score
GET en_docs/_search
{
  "query": {
    "query_string": {
      "type": "most_fields",
      "query": "Charles James"
    }
  }
}
# Both docs have the same score
GET en_docs/_search
{
  "query": {
    "match": {
      "title": "Charles James"
    }
  }
}
# Only doc b matches
GET en_docs/_search
{
  "query": {
    "query_string": {
      "query": "\"Charles James\""
    }
  }
}
# Only doc b matches
GET en_docs/_search
{
  "query": {
    "match_phrase": {
      "title": "Charles James"
    }
  }
}
I want to query on Charles James without quotes and have both docs returned, but with doc 'b' (the one containing the exact phrase) ranked higher. I was hoping that the first `query_string` query with `most_fields` would do that for me, as that's what would happen if I had created a subfield with 2-word shingles.
Is the use case for `index_phrases` just for when you want to run `match_phrase` and wish it to run faster at the expense of a larger index?
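
For comparison, here is a rough sketch of the shingle-subfield setup I would otherwise use. The index name `en_docs_shingles`, the filter `two_word_shingles`, the analyzer `shingle_analyzer`, and the subfield `title.shingles` are placeholder names I made up for illustration, and the settings are untuned assumptions rather than a recommended configuration.

```
# Hypothetical shingle-based alternative (all names below are placeholders)
PUT en_docs_shingles
{
  "settings": {
    "analysis": {
      "filter": {
        "two_word_shingles": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "two_word_shingles"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "shingles": {
            "type": "text",
            "analyzer": "shingle_analyzer"
          }
        }
      }
    }
  }
}

# Unquoted query against both the plain field and the shingle subfield
GET en_docs_shingles/_search
{
  "query": {
    "multi_match": {
      "query": "Charles James",
      "type": "most_fields",
      "fields": ["title", "title.shingles"]
    }
  }
}
```

The idea is that both docs match on `title`, while only the doc whose title contains the two words in that order also matches on `title.shingles` and so picks up an extra score contribution. That is the ranking behavior I was hoping to get from `index_phrases` without maintaining the extra analyzer myself.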