How to add "position_increment_gap" in a field created by a Crawler Index?

I have an index created by a crawler, and it creates a title field. When I try to run a match_phrase_prefix search, I get the error "failed to create query: field:[title] was indexed without position data; cannot run PhraseQuery". I read that I need to add "position_increment_gap", but when I try to add it to the mappings, I get this error:

"type": "resource_already_exists_exception",
        "reason": "index [search-index1/BCnmIxvbQD-SUdKfU_IRpg] already exists".

My question is: how do I add that attribute to the mapping if the index was created by a crawler?

I need match_phrase_prefix to check for exact matches on the title. Only a plain "match" query works.

Hi @freddyrb. The default mapping of the title field in a crawler index should support match_phrase_prefix queries. Can you try running a simple query and see if it works?

GET search-test/_search
{
  "query": {
    "match_phrase_prefix": {
      "title": "<your query>"
    }
  },
  "_source": ["title"]
}

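position_increment_gap only applies to multi-valued (array) text fields: it sets how far apart the positions of consecutive values are, so that a phrase query doesn't match across two values. It doesn't add position data to a field that was indexed without it; that is controlled by index_options.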
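For illustration, here is a minimal sketch adapted from the position_increment_gap example in the Elasticsearch mapping docs; my-gap-test and names are throwaway, hypothetical names. With the gap set to 0, the phrase "Abraham Lincoln" matches across the two array values; with the default gap of 100 it would not. Nothing here adds positions to a field, which is why it doesn't help with the crawler error.

```console
# my-gap-test is a hypothetical throwaway index
PUT my-gap-test
{
  "mappings": {
    "properties": {
      "names": {
        "type": "text",
        "position_increment_gap": 0
      }
    }
  }
}

# One document whose names field holds two values
PUT my-gap-test/_doc/1?refresh
{
  "names": ["John Abraham", "Lincoln Smith"]
}

# Matches only because the gap between the two values is 0
GET my-gap-test/_search
{
  "query": {
    "match_phrase": {
      "names": "Abraham Lincoln"
    }
  }
}
```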

If this doesn't help, can you share your search query so that we can help with troubleshooting?

Thanks @Jedr_Blaszyk for the answer. The query is:

GET search-testindex1/_search
{
  "query": {
    "match_phrase_prefix": {
      "title": "nuxeo drive"
    }
  },
  "_source": ["title"]
}

The problem appears when I send more than one word. I want to find the documents whose title contains the phrase "nuxeo drive"; I don't want Elasticsearch to search for "nuxeo" and "drive" separately. That's when I get the error.

Here is the relevant part of my query:

"query": {
    "bool": {
      "filter": {
        "terms": {
          "indexedContentType.keyword": [
            "Documents,TSKB Article,blog,forum"
          ]
        }
      },
      "must": {
        "bool": {
          "should": [
            {
              "match_phrase_prefix": {
                "title": {
                  "query": "Documentation Docs Home Getting Starte"
                }
              }
            },
            {
              "match_phrase_prefix": {
                "subject": {
                  "query": "Documentation Docs Home Getting Starte"
                }
              }
            },

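Aha! Now, looking at the example you provided, I understand the issue. The top-level title field in your crawler index is indexed without positions, which is why match_phrase_prefix fails as soon as the query has more than one word (a single word effectively becomes a prefix query, which doesn't need positions). However, the default template mapping of search-testindex1 should already include a number of title subfields that are indexed differently, to suit a range of queries.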
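If you want to see which subfields the crawler created for title and how each one is indexed, the get field mapping API returns the field's definition, including its fields block (search-testindex1 is the index name from your query). Look at index_options: freqs means no positions are stored, and a subfield that lists no index_options uses the text default, positions.

```console
# Shows title and its subfields as stored in the index mapping
GET search-testindex1/_mapping/field/title
```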

The title.stem subfield is indexed with position data, so you can try running:

GET search-testindex1/_search
{
  "query": {
    "match_phrase_prefix": {
      "title.stem": "nuxeo drive"
    }
  },
  "_source": ["title"]
}

Let me know if it helps!

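Also, you can always extend your existing mapping with a subfield or a new field. The resource_already_exists_exception you got most likely means the mapping was sent to PUT search-index1, which tries to create the index; the mapping of an existing index is updated through the _mapping endpoint instead. Here is an example of how to add a subfield called phrase to the title property (the parent's index_options and analyzer are kept as they are in the crawler's mapping, because Elasticsearch rejects changes to those parameters on an existing field):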

PUT search-testindex1/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "phrase": {
          "type": "text",
          "index_options": "positions",
          "analyzer": "iq_text_base"
        }
      },
      "index_options": "freqs",
      "analyzer": "iq_text_base"
    }
  }
}
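After that mapping update, only documents indexed from then on get the new title.phrase subfield. A minimal sketch for back-filling the existing documents in place and then querying the subfield, assuming the mapping update above succeeded. conflicts=proceed makes the update keep going instead of aborting on version conflicts, e.g. if the crawler updates a document while it runs.

```console
# Re-index every existing document in place so title.phrase gets populated
POST search-testindex1/_update_by_query?conflicts=proceed

# Phrase prefix query against the new subfield, which stores positions
GET search-testindex1/_search
{
  "query": {
    "match_phrase_prefix": {
      "title.phrase": "nuxeo drive"
    }
  },
  "_source": ["title"]
}
```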

Thanks @Jedr_Blaszyk, I'll look into your suggestions. However, my use case is a single query across multiple indices that all have a title field. All of them (API-type indices) return results, except the crawler index, which throws the error.

Any suggestion?

my use case is a query with multiple indexes that have the field title

OK, that complicates things a bit, but I see two ways to support your use case.

Solution 1: filter on the index inside a bool query. Easier to manage, with a slight query-time performance hit.

Example query:

GET index1,index2,index3,crawler_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": {
              "multi_match": {
                "query": "{query}",
                "type": "phrase_prefix",
                "fields": ["title"]
              }
            },
            "must_not": {
              "terms": {
                "_index": ["crawler_index"]
              }
            }
          }
        },
        {
          "bool": {
            "must": {
              "multi_match": {
                "query": "{query}",
                "type": "phrase_prefix",
                "fields": ["title.stem"]
              }
            },
            "filter": {
              "terms": {
                "_index": ["crawler_index"]
              }
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

Solution 2: update the mappings of the other indices to add the same subfield, e.g. title.stem (indexed with position data), and reindex the data. It's a bit more work for you, but it results in slightly better query-time performance. That way, you would be able to run a query like:

GET index1,index2,index3,crawler_index/_search
{
  "query": {
    "match_phrase_prefix": {
      "title.stem": "{query}"
    }
  }
}
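A minimal sketch of the Solution 2 mapping change for one of the other indices, assuming (hypothetically) that index1's title is a plain text field with default settings; if it has its own analyzer or other parameters, repeat them unchanged, or the update is rejected. Text fields store positions by default, so no index_options setting is needed. The english analyzer is an assumption standing in for a stemming analyzer roughly comparable to the crawler's stem subfield. _update_by_query re-indexes the existing documents in place (a _reindex into a new index works too). Repeat for index2 and index3.

```console
# Hypothetical: index1 has a plain text title field with default settings
PUT index1/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "stem": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
  }
}

# Populate title.stem for documents that were indexed before the change
POST index1/_update_by_query?conflicts=proceed
```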

Hope this helps!

Thanks, I will check those options.
