Error on query_suggestion endpoint (App Search engine based on an Elasticsearch index)

Hi everybody!

I am going to build a search solution for my company's website using App Search. Since we use a third-party web crawler, we decided to use an App Search engine based on an Elasticsearch index, so that we can use ingest pipelines while indexing our websites.

When I try to get query suggestions using

<ENTERPRISE_SEARCH>/api/as/v1/engines/<ENGINE>/query_suggestion
{
  "query": "Grou",
  "types": {
    "documents": {
      "fields": [
        "headlines",
        "title"
      ]
    }
  }
}

expecting, for example, the suggestion "group" (multiple documents contain this word in those fields), I get the following response:

{
    "errors": [
        "Types documents fields contains invalid values: headlines and title"
    ]
}

Since "types" is optional in the request body, I tried the same request with only the query in the body and got the following response:

{
    "errors": [
        "Query cannot list suggestions for engine without string fields"
    ]
}
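For anyone who wants to reproduce both requests from a script, here is a minimal Python sketch using only the standard library. It assumes the endpoint accepts a POST with a JSON body and a Bearer search key; build_query_suggestion_request is a hypothetical helper, and the host, engine name and key in the example are hypothetical stand-ins for <ENTERPRISE_SEARCH>, <ENGINE> and your own key. The request is only built here, not sent.

```python
import json
import urllib.request


def build_query_suggestion_request(base_url, engine, api_key, query, fields=None):
    """Build (but do not send) a query_suggestion request.

    Assumption: POST with a JSON body and Bearer auth with a search key.
    """
    body = {"query": query}
    if fields is not None:
        # "types" is optional; when present it restricts suggestions to these fields
        body["types"] = {"documents": {"fields": list(fields)}}
    url = f"{base_url.rstrip('/')}/api/as/v1/engines/{engine}/query_suggestion"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host, engine and key; send with urllib.request.urlopen(req)
    with_types = build_query_suggestion_request(
        "http://localhost:3002", "my-engine", "search-xxxx", "Grou", ["headlines", "title"]
    )
    query_only = build_query_suggestion_request(
        "http://localhost:3002", "my-engine", "search-xxxx", "Grou"
    )
    for req in (with_types, query_only):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```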

This makes no sense to me, since the mapping for the "title" field, for example, is:

"title": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "analyzer": "delimiter_analyzer"
          },
          "joined": {
            "type": "text",
            "analyzer": "joined_analyzer"
          },
          "prefix": {
            "type": "text",
            "analyzer": "prefix_analyzer"
          },
          "stem": {
            "type": "text",
            "analyzer": "german_stem_analyzer"
          }
        }
}

The engine's schema page also shows the field type as "text" (grayed out, since the engine is based on an Elasticsearch index).

I hope somebody can reproduce this or help me; it seems to be an error at a deeper level, and I can't find a similar case.

Thanks!
David

P.S. This is my first forum post, so please don't be too harsh with criticism 🙂

Hi @adesso-david,

How did you create the index, and what mappings and settings did you use?

Hi @Irina_Truong,

I am using an index template for the mapping. The template looks like this:

"template": {
          "settings": {
            "index": {
              "analysis": {
                "filter": {
                  "delimiter_filter": {
                    "catenate_all": "true",
                    "type": "word_delimiter"
                  },
                  "space_remover": {
                    "pattern": """\s""",
                    "type": "pattern_replace",
                    "replacement": ""
                  },
                  "german_stemmer": {
                    "type": "stemmer",
                    "language": "light_german"
                  },
                  "prefix_filter": {
                    "type": "edge_ngram",
                    "min_gram": "3",
                    "max_gram": "15"
                  }
                },
                "analyzer": {
                  "prefix_analyzer": {
                    "filter": [
                      "prefix_filter"
                    ],
                    "type": "custom",
                    "tokenizer": "standard"
                  },
                  "german_stem_analyzer": {
                    "filter": [
                      "lowercase",
                      "german_stemmer"
                    ],
                    "type": "custom",
                    "tokenizer": "standard"
                  },
                  "joined_analyzer": {
                    "filter": [
                      "lowercase",
                      "asciifolding",
                      "space_remover"
                    ],
                    "type": "custom",
                    "tokenizer": "join_tokenizer"
                  },
                  "delimiter_analyzer": {
                    "filter": [
                      "delimiter_filter",
                      "lowercase",
                      "asciifolding"
                    ],
                    "type": "custom",
                    "tokenizer": "whitespace"
                  }
                },
                "tokenizer": {
                  "join_tokenizer": {
                    "pattern": """(?=(^|\s)(\w+\s\w+))""",
                    "type": "pattern",
                    "group": "2"
                  }
                }
              },
              "default_pipeline": "adesso-norconex-xml-pipeline"
            }
          },
          "mappings": {
            "_routing": {
              "required": false
            },
            "numeric_detection": false,
            "dynamic_date_formats": [
              "strict_date_optional_time",
              "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z"
            ],
            "_source": {
              "excludes": [],
              "includes": [],
              "enabled": true
            },
            "dynamic": true,
            "dynamic_templates": [],
            "date_detection": true,
            "properties": {
              "teaserSubline_stored_only": {
                "type": "text",
                "fields": {
                  "delimiter": {
                    "analyzer": "delimiter_analyzer",
                    "type": "text"
                  },
                  "joined": {
                    "analyzer": "joined_analyzer",
                    "type": "text"
                  },
                  "prefix": {
                    "analyzer": "prefix_analyzer",
                    "type": "text"
                  },
                  "stem": {
                    "analyzer": "german_stem_analyzer",
                    "type": "text"
                  }
                }
              },
              "link": {
                "type": "keyword"
              },
              "teaserPicture474x474_stored_only": {
                "type": "keyword"
              },
              "colorSchemeCssClass_stored_only": {
                "type": "keyword"
              },
              "display_content": {
                "type": "text",
                "fields": {
                  "delimiter": {
                    "analyzer": "delimiter_analyzer",
                    "type": "text"
                  },
                  "joined": {
                    "analyzer": "joined_analyzer",
                    "type": "text"
                  },
                  "prefix": {
                    "analyzer": "prefix_analyzer",
                    "type": "text"
                  },
                  "stem": {
                    "analyzer": "german_stem_analyzer",
                    "type": "text"
                  }
                }
              },
              "title": {
                "type": "text",
                "fields": {
                  "delimiter": {
                    "analyzer": "delimiter_analyzer",
                    "type": "text"
                  },
                  "joined": {
                    "analyzer": "joined_analyzer",
                    "type": "text"
                  },
                  "prefix": {
                    "analyzer": "prefix_analyzer",
                    "type": "text"
                  },
                  "stem": {
                    "analyzer": "german_stem_analyzer",
                    "type": "text"
                  }
                }
              },
              "content": {
                "type": "text",
                "fields": {
                  "delimiter": {
                    "analyzer": "delimiter_analyzer",
                    "type": "text"
                  },
                  "joined": {
                    "analyzer": "joined_analyzer",
                    "type": "text"
                  },
                  "prefix": {
                    "analyzer": "prefix_analyzer",
                    "type": "text"
                  },
                  "stem": {
                    "analyzer": "german_stem_analyzer",
                    "type": "text"
                  }
                }
              },
              "source_url": {
                "type": "keyword"
              },
              "content_type_multi_keyword": {
                "type": "keyword"
              },
              "@timestamp": {
                "type": "date"
              },
              "teaserPicture948x948_stored_only": {
                "type": "keyword"
              },
              "site_multi_keyword": {
                "type": "keyword"
              },
              "teaserPicture750x500_stored_only": {
                "type": "keyword"
              },
              "headlines": {
                "type": "text",
                "fields": {
                  "delimiter": {
                    "analyzer": "delimiter_analyzer",
                    "type": "text"
                  },
                  "joined": {
                    "analyzer": "joined_analyzer",
                    "type": "text"
                  },
                  "prefix": {
                    "analyzer": "prefix_analyzer",
                    "type": "text"
                  },
                  "stem": {
                    "analyzer": "german_stem_analyzer",
                    "type": "text"
                  }
                }
              },
              "branche_id": {
                "type": "keyword"
              },
              "topic": {
                "type": "keyword"
              },
              "mime_type_multi_keyword": {
                "type": "keyword"
              },
              "id": {
                "type": "keyword"
              },
              "teaserPicture1500x1000_stored_only": {
                "type": "keyword"
              },
              "language_multi_keyword": {
                "type": "keyword"
              },
              "branche_name": {
                "type": "keyword"
              },
              "crawldate": {
                "type": "date"
              }
            }
          }
        }
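As a rough illustration of what the "joined" subfields index, the following Python sketch approximates joined_analyzer: join_tokenizer emits overlapping pairs of adjacent words (capture group 2 of its lookahead pattern), which are then lowercased and stripped of whitespace by space_remover. This is an approximation only: it uses Python's re module instead of Java regex (re.ASCII mirrors Java's ASCII-only \w by default), it skips asciifolding, and joined_tokens is a hypothetical helper name.

```python
import re

# Same pattern as "join_tokenizer"; re.ASCII mirrors Java's default ASCII-only \w and \s
JOIN_PATTERN = re.compile(r"(?=(^|\s)(\w+\s\w+))", re.ASCII)


def joined_tokens(text):
    """Approximate joined_analyzer: word pairs, lowercased, whitespace removed.

    asciifolding is not reproduced here.
    """
    # Tokenizer step: one token per match, taken from capture group 2
    pairs = [m.group(2) for m in JOIN_PATTERN.finditer(text)]
    # "lowercase" filter, then "space_remover" (pattern_replace of \s with "")
    return [re.sub(r"\s", "", p.lower()) for p in pairs]


if __name__ == "__main__":
    print(joined_tokens("Adesso Group Company"))
```

With this approximation, "Adesso Group Company" yields "adessogroup" and "groupcompany", and a single-word value yields no tokens at all, because the pattern always needs two words.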

These are the pipeline's processors, in case they are the cause of the error:

[
  {
    "script": {
      "source": "for (int i=0; i<ctx.keys.length; i++){\n    ctx[ctx.keys[i]] = ctx.values[i]\n}"
    }
  },
  {
    "split": {
      "field": "headlines",
      "separator": "\\n"
    }
  },
  {
    "date": {
      "field": "date_l",
      "formats": [
        "UNIX_MS"
      ]
    }
  },
  {
    "remove": {
      "field": [
        "values",
        "keys",
        "date_l",
        "date_date"
      ],
      "ignore_missing": true
    }
  },
  {
    "pipeline": {
      "name": "fix-adesso-link"
    }
  },
  {
    "set": {
      "field": "id",
      "copy_from": "_id"
    }
  }
]

Since the documents arrive with two parallel array fields,

keys= [key1, key2, key3]
values= [value1, value2, value3]

I use a script processor to turn them into proper document fields.
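To check sample documents offline, the script, split and remove steps can be mirrored in Python. This is a hypothetical helper (flatten_keys_values), not something Elasticsearch provides; it skips the date processor, the nested "fix-adesso-link" pipeline and the id copy, and it splits "headlines" on newlines, which is what the separator "\\n" (a regex for a newline) matches. It rejects lists of different lengths, whereas the Painless loop would only fail when "values" is shorter than "keys".

```python
def flatten_keys_values(doc):
    """Mirror the script + split + remove processors for offline testing."""
    keys, values = doc["keys"], doc["values"]
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    out = dict(doc)
    # script processor: ctx[ctx.keys[i]] = ctx.values[i]
    for key, value in zip(keys, values):
        out[key] = value
    # split processor on "headlines" (newline separator)
    if isinstance(out.get("headlines"), str):
        out["headlines"] = out["headlines"].split("\n")
    # remove processor (only the keys/values part)
    out.pop("keys", None)
    out.pop("values", None)
    return out


if __name__ == "__main__":
    sample = {"keys": ["title", "headlines"], "values": ["Group", "First\nSecond"]}
    print(flatten_keys_values(sample))
```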

Thank you @adesso-david, this was very helpful. I was able to reproduce the problem.

Our documentation says that suggestions work on text fields, but as things stand right now, that is not the full story: suggestions only work if the text field also has a subfield of type keyword.

I added this subfield to your title and headlines mappings, and that enabled the suggestions:

"enum": {
  "type": "keyword"
},

I'm going to file an Enterprise Search issue to figure out whether we need to update our documentation or fix the implementation.

For now, you can use the workaround of the keyword subfield.
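If the template has many text fields, the workaround can also be applied to the mapping programmatically before the template is updated. A minimal Python sketch, assuming the template's "properties" object is loaded as a dict; add_keyword_subfield is a hypothetical helper, and "enum" is simply the subfield name used above. It only touches the fields you name: a keyword subfield on very long fields such as "content" may exceed Lucene's maximum term length (32766 bytes) unless ignore_above is set. Because index templates are applied when an index is created, the change affects indices created after the template is updated.

```python
import copy


def add_keyword_subfield(properties, field_names, subfield="enum"):
    """Return a copy of "properties" with a keyword subfield on each named text field."""
    updated = copy.deepcopy(properties)
    for name in field_names:
        field = updated.get(name)
        if field is None or field.get("type") != "text":
            raise ValueError(f"{name!r} is not a text field in this mapping")
        # Keep existing subfields (delimiter, joined, prefix, stem) and add the keyword one
        field.setdefault("fields", {})[subfield] = {"type": "keyword"}
    return updated


if __name__ == "__main__":
    props = {"title": {"type": "text", "fields": {}}, "headlines": {"type": "text"}}
    print(add_keyword_subfield(props, ["title", "headlines"]))
```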

Thank you for reporting the problem.


Thank you very much @Irina_Truong!
Suggestion generation works properly now.