Longest match in string/array

prarabdh9909 · August 23, 2017, 7:09pm

I am planning to use ES for storing tracebacks of exceptions such as:
[a/b/c/d/func/line, e/f/g/h/func/line, i/j/k/l/func/line]

Now, for any incoming traceback, I want to rank the results by how much of the call chain each document matches. Order matters, and matching should start from the end of the chain, because a match on only the top half of the call chain is not useful if the rest differs.

So I have thought of something and I wanted to find out if there is something better I can use.

Store tracebacks as a string and use a match_phrase query like:
"a/b/c/d/func/line, e/f/g/h/func/line, i/j/k/l/func/line"
If that doesn't return anything, look for:
"e/f/g/h/func/line, i/j/k/l/func/line"
and so on.
This should make sure that docs with the longest call chain match are at the top.
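
For illustration, the fallback sequence described above could be generated along these lines. This is only a sketch: the function name `fallback_phrases` and the field name `traceback` are hypothetical, and each body would still have to be sent to ES one at a time until one returns hits.

```python
def fallback_phrases(frames, sep=", "):
    """Return phrases that drop leading frames one at a time, longest first."""
    return [sep.join(frames[i:]) for i in range(len(frames))]


def match_phrase_bodies(frames, field="traceback"):
    """Build one match_phrase request body per fallback phrase.

    `field` is a hypothetical field name; use whatever the mapping defines.
    """
    return [
        {"query": {"match_phrase": {field: phrase}}}
        for phrase in fallback_phrases(frames)
    ]


frames = ["a/b/c/d/func/line", "e/f/g/h/func/line", "i/j/k/l/func/line"]
bodies = match_phrase_bodies(frames)
```

A traceback with N frames can need up to N round trips this way, which is where the cost comes from.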

But this will be computationally expensive on ES.

I saw this thread suggesting to implement my own similarity model.

I am still a newbie to ES so I think that will take me a lot of time.

Is there anything out of the box that can help me?
Or maybe reduce the computation on the ES side with something like stopping on the first match?

jpountz · August 23, 2017, 9:22pm

I don't think a custom similarity would help. I think I would do it the following way:

At index time, make sure to map trace as a keyword field and pre-process values to store every suffix. For instance, a/b/c/func/line would be indexed as:

{
  "trace": [ "line", "func/line", "c/func/line", "b/c/func/line", "a/b/c/func/line" ]
}
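
A minimal sketch of that index-time pre-processing, assuming it happens client-side in Python before the document is sent to ES. The helper name `path_suffixes` is made up for illustration and is not part of any ES client.

```python
def path_suffixes(path, sep="/"):
    """Return every suffix of a separator-delimited path, shortest first."""
    parts = path.split(sep)
    # Walk from the last component back to the first, joining the tail each time.
    return [sep.join(parts[i:]) for i in range(len(parts) - 1, -1, -1)]


# Document body with the "trace" field holding all suffixes as keyword values.
doc = {"trace": path_suffixes("a/b/c/func/line")}
```

A full traceback with several frames could be handled by concatenating the suffix lists of each frame, or by splitting on frames instead of path components. Which one fits depends on how the call chain should be compared.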

Then at search time, do the same splitting. For instance, if your query is "d/c/func/line":

GET _search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "trace": "func/line" } },
        { "term": { "trace": "c/func/line" } },
        { "term": { "trace": "d/c/func/line" } }
      ]
    }
  }
}
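
The request body above could be built programmatically along these lines. This is a sketch under assumptions: the helper `suffix_query` and its `min_parts` parameter are hypothetical, with `min_parts=2` chosen only to mirror the example, which starts at "func/line" rather than the bare "line".

```python
import json


def suffix_query(path, field="trace", min_parts=2, sep="/"):
    """Build a bool/should query with one term clause per suffix of `path`.

    Suffixes shorter than `min_parts` components are skipped (an assumption
    made here to match the example, not a requirement of ES).
    """
    parts = path.split(sep)
    start = max(len(parts) - min_parts, 0)
    clauses = [
        {"term": {field: sep.join(parts[i:])}}
        for i in range(start, -1, -1)
    ]
    return {"query": {"bool": {"should": clauses}}}


body = suffix_query("d/c/func/line")
print(json.dumps(body, indent=2))
```

Because each matching should clause contributes to the score, documents that share a longer suffix with the query match more clauses and should tend to rank higher, all in a single request.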

system · September 20, 2017, 9:23pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
