Longest match in string/array


(Prarabdh Joshi) #1

I am planning to use ES for storing tracebacks of exceptions such as:
[a/b/c/d/func/line, e/f/g/h/func/line, i/j/k/l/func/line]

Now, for any incoming traceback, I want to rank the results by how much of the call chain each document matches. Order matters, and matching should start from the end of the chain, because it doesn't help if the top half of the call chain matches and the rest doesn't.

So I have thought of something and I wanted to find out if there is something better I can use.

  1. Store each traceback as a string and use a match_phrase query like:
    "a/b/c/d/func/line, e/f/g/h/func/line, i/j/k/l/func/line"
    If that doesn't return anything, look for:
    "e/f/g/h/func/line, i/j/k/l/func/line"
    and so on.
    This should make sure that the docs with the longest call chain match are at the top.

    But this will be computationally expensive on ES.
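
The fallback sequence described above can be sketched in Python as follows. This is only an illustration: the field name `trace` and both helper names are assumptions, not something given in the post.

```python
def fallback_phrases(frames):
    """Return phrases from the full call chain down to the last frame only."""
    # Each step drops one more frame from the top of the call chain.
    return [", ".join(frames[start:]) for start in range(len(frames))]


def match_phrase_query(phrase, field="trace"):
    # "trace" is an assumed field name; substitute the real one.
    return {"query": {"match_phrase": {field: phrase}}}


frames = ["a/b/c/d/func/line", "e/f/g/h/func/line", "i/j/k/l/func/line"]
# One request body per fallback step, longest phrase first.
queries = [match_phrase_query(p) for p in fallback_phrases(frames)]
```

A client would send these one at a time and stop at the first one that returns hits, so the worst case is one request per frame in the traceback.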

I saw this thread suggesting to implement my own similarity model.

I am still a newbie to ES so I think that will take me a lot of time.

Is there anything out of the box that can help me?
Or maybe reduce the computation on the ES side with something like stopping on the first match?


(Adrien Grand) #2

I don't think a custom similarity would help. I think I would do it the following way:

At index time, make sure to map trace as a keyword field and pre-process the values to store every suffix. For instance, a/b/c/func/line would be indexed as:

{
  "trace": [ "line", "func/line", "c/func/line", "b/c/func/line", "a/b/c/func/line" ]
}
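
A minimal sketch of that index-time pre-processing, in Python. The helper name `suffixes` is hypothetical, and the mapping shown assumes typeless mappings (Elasticsearch 7 or later); adapt both to your setup.

```python
def suffixes(path, sep="/"):
    """Return every suffix of a path, shortest first."""
    parts = path.split(sep)
    # Walk the start index from the last component back to the first.
    return [sep.join(parts[i:]) for i in range(len(parts) - 1, -1, -1)]


# Assumed mapping body: "trace" as a keyword field, so values are not analyzed.
mapping = {"mappings": {"properties": {"trace": {"type": "keyword"}}}}

# Document body to index for a single frame.
doc = {"trace": suffixes("a/b/c/func/line")}
```

Because the field is a keyword, each suffix is stored as one exact term, which is what lets a plain term query match it later.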

Then at search time, do the same splitting. For instance, if your query is "d/c/func/line":

GET _search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "trace": "func/line" } },
        { "term": { "trace": "c/func/line" } },
        { "term": { "trace": "d/c/func/line" } }
      ]
    }
  }
}
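
The search-time splitting can be generated the same way. The sketch below is one possible approach in Python; `build_query` and its `min_parts` parameter are hypothetical names. `min_parts` defaults to 2 so that the output reproduces the request above, which leaves out the bare "line" term.

```python
def suffixes(path, sep="/"):
    """Return every suffix of a path, shortest first."""
    parts = path.split(sep)
    return [sep.join(parts[i:]) for i in range(len(parts) - 1, -1, -1)]


def build_query(path, field="trace", min_parts=2, sep="/"):
    """Build a bool/should query with one term clause per suffix."""
    clauses = [
        {"term": {field: suffix}}
        for suffix in suffixes(path, sep)
        # Skip suffixes with fewer than min_parts components.
        if suffix.count(sep) + 1 >= min_parts
    ]
    return {"query": {"bool": {"should": clauses}}}


body = build_query("d/c/func/line")
```

A document that shares a longer suffix with the query also matches every shorter clause, so it matches more `should` clauses and would be expected to score higher than a document that shares only a short suffix.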

