Match terms with characters in a sequence

I'm interested in building an autocomplete-style interface where someone can enter characters and the results will be the terms that have those characters in that order, but not necessarily contiguous. This is functionality that I've seen in code editors where, for instance, you can type in apmou and it will match "app/models/user.rb" and "app/models/user_account.rb", for instance. This might be too specialized a search, but I'm curious if there is a way to pull it off with ES.

@emarthinsen
I will try to work with code editor example you mentioned. Basic idea is

  1. Split path in the document on / and for each part create edge ngrams
  2. Since we need ordered match, we need to use phrase query.
  3. Create multiple phrases to match parts at different levels (apps vs models vs users).

Index mapping

{
  "settings": {
    "index": {
      "max_ngram_diff": 10,
      "analysis": {
        "tokenizer": {
          "path_tokenizer": {
            "type": "simple_pattern_split",
            "pattern": "/"
          }
        },
        "filter": {
          "1_10_edge_ngram": {
            "type": "edge_ngram",
            "min_gram": "1",
            "max_gram": "10"
          }
        },
        "analyzer": {
          "my_analyzer": {
            "filter": [
              "lowercase",
              "1_10_edge_ngram"
            ],
            "tokenizer": "path_tokenizer"
          },
          "my_search_analyzer": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "path_tokenizer"
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "path_str": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "my_search_analyzer"
      }
    }
  }
}

Java code to generate query

    private static void gen_query(String searchStr) {
        int len = searchStr.length();
        StringBuilder query = new StringBuilder();
        query.append("{\"query\": {\"bool\": {\"should\": [");
        for (int i =0; i < Math.pow(2,len -1); i++) {
            StringBuilder pattern = new StringBuilder();
            for (int j = 0; j < len; j++) {
                pattern.append(searchStr.charAt(j));
                if (j < len -1) {
                    if ((i & (1 << (j))) != 0) {
                        pattern.append("/");
                    }
                }
            }
            query.append("{\"match_phrase\": {\"path_str\": \"" + pattern + "\"}},");
        }
        query.setCharAt(query.length() -1 , ' ');
        query.append("]}},\"sort\": [\"_doc\"]}");
        System.out.println(query);
    }

@Vinayak_Sapre Very interesting idea using a phrase query. I'm having a little trouble following your Java sample. You lost me a bit with your loop iterations. Are you trying to take the search term and break it down into every permutation of where the slashes could be?

@emarthinsen

Yes correct.

If you search for "apmou", slash can be at 0 to (length-1) places, where _ is in "a_p_m_o_u". At each location you have boolean choice, add or don't add slash. So you can represent all permutations as 0000, 1000, 0100, .... and there will be 2^(length-1) permutations.

The outer for loop goes over permutations. Inner constructs string, add character and check if you need slash after it.

if ((i & (1 << (j))) != 0) { checks if you want slash at j-th location in i-th permutation.

Hope this helps.

Btw I noticed I forgot to add lowercase filter. I will update my previous post.

@emarthinsen
Just curious, did the approach work for you? If not, can you post your findings. If it did, how is the performance. Query must run at typing speed.

Hi @Vinayak_Sapre. So sorry, I haven't had a chance to test it. I got pulled in another direction. Hoping to test it out soon though.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.