Setting up index for maximum search ability (with front end typeahead)


#1

Most of my experience with Elasticsearch has been storing time-based data that comes from logs we track on the network. We are trying to expand its use as a database for a web-based application being written. I am just looking for tips on what I should study to learn how to do this a little better. Here is the current issue I am working on:

Data being indexed (coming from a different application)
example:

sites = [
    {
        sys_id: "13c04e140f92b500d55ae498b1050e8a",
        name: "Receivables Performance Management, LLC"
    },
    {
        sys_id: "13c04e140f92b500d55ae498b1050e8b",
        name: "ROCHESTER NY"
    },
    {
        sys_id: "13c04e140f92b500d55ae498b1050e8c",
        name: "ROSELAND NJ"
    },
    {
        sys_id: "17c04e140f92b500d55ae498b1050e8a",
        name: "LAYTON UT"
    }
]

In an effort to make the data sortable and searchable, I have chosen to map this as a "nested" type with a multi-field, using the keyword sub-field purely for sorting. Here are the current mapping and settings:

PUT /sev_sites
{
  "settings": {
    "analysis": {
      "analyzer": {
        "site_analyzer": {
          "type": "custom",
          "tokenizer": "site_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "site_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 20
        }
      }
    }
  },
  "mappings": {
    "object": {
      "properties": {
        "site": {
          "type": "nested",
          "properties": {
            "sys_id": {
              "type": "text"
            },
            "name": {
              "type": "text",
              "analyzer": "site_analyzer",
              "fields":{
                "raw": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}
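(As a sanity check on what the custom analyzer actually emits, the _analyze API can be run against the index; for "ROSELAND NJ" with min_gram 3 it should produce lowercased tokens such as "ros", "rose", "osel", and so on:)

GET /sev_sites/_analyze
{
  "analyzer": "site_analyzer",
  "text": "ROSELAND NJ"
}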

So now for the problem: I am trying to set up a query that returns partial matches. Can someone explain the "proper" way to set up this index, and which query type is best for returning partial matches on the given data set? I set up the analyzer using the Elasticsearch documentation, but a simple "match" query still only returns exact matches (duh!). I have been playing with the fuzzy and regexp queries, which seem to be working well. If anyone could tell me whether I am setting this up in a logical way and/or what would work better, I would really appreciate it.

Also, I am still trying to get the sorting to work correctly; if anyone could help me sort by the "raw" keyword sub-field, I would definitely appreciate it.
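(A sketch of what I expect the sort to look like, assuming the documents are actually indexed under the nested "site" object as the mapping above defines, and using the 5.x nested_path syntax:)

GET /sev_sites/_search
{
  "sort": [
    {
      "site.name.raw": {
        "order": "asc",
        "nested_path": "site"
      }
    }
  ]
}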

Current testing of the fuzzy search is working "ok"...

GET /sev_sites/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "roseland",
        "boost": 1.0,
        "fuzziness": 2,
        "prefix_length": 2,
        "max_expansions": 100
      }
    }
  }
}

Returns:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 3.6795852,
    "hits": [
      {
        "_index": "sev_sites",
        "_type": "object",
        "_id": "AV4qoQ8DsabpF_YTeXzo",
        "_score": 3.6795852,
        "_source": {
          "sys_id": "dec04e140f92b500d55ae498b1050e2c",
          "name": "ROCKLAND"
        }
      },
      {
        "_index": "sev_sites",
        "_type": "object",
        "_id": "AV4qoQXDsabpF_YTeXvT",
        "_score": 3.5934165,
        "_source": {
          "sys_id": "13c04e140f92b500d55ae498b1050e8c",
          "name": "ROSELAND NJ"
        }
      }
    ]
  }
}

However, this query does not return what I had hoped:

GET /sev_sites/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "rose",
        "boost": 1.0,
        "fuzziness": 2,
        "prefix_length": 2,
        "max_expansions": 100
      }
    }
  }
}

Returns:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.4354112,
    "hits": [
      {
        "_index": "sev_sites",
        "_type": "object",
        "_id": "AV4qoQ77sabpF_YTeXzn",
        "_score": 1.4354112,
        "_source": {
          "sys_id": "dec04e140f92b500d55ae498b1050e2b",
          "name": "ROCK TAVERN NY"
        }
      }
    ]
  }
}

(Ivan Brusic) #2

The issue might be the combination of using a fuzzy query with ngram
tokens. Have you tried either using a simple match query against the
ngram'd field, or applying an analyzer that does not produce ngrams?
It is easy enough to change the query, or to add another multi-field
to test, if reindexing is not an issue.
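For example, a plain match query should already hit the partial ngram tokens (a sketch; note that since the mapping does not set a separate search_analyzer, the query string "rose" is itself ngrammed at search time as well):

GET /sev_sites/_search
{
  "query": {
    "match": {
      "name": "rose"
    }
  }
}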


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.