Querying tokens at the same position

jeko · March 26, 2018, 5:07am

Hello,

I built an analyzer plugin that tokenizes XML. It generates one token per XML attribute, and all tokens for an XML node are set at the same position.

I need a clean way to search for XML nodes that match multiple criteria (for example, given <w a=1 b=2 c=3>, I want to find all nodes that have a=1 AND b=2).
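To pin down the requirement: a plain document-level AND (for example a bool query with two term clauses) would also match a document where a=1 and b=2 sit on different nodes. What is wanted is an AND per position. The sketch below models that semantics in plain Python over a hypothetical list of (term, position) pairs. It is not Elasticsearch code, just an illustration of the matching rule.

```python
from collections import defaultdict


def positions_matching_all(tokens, required):
    """Return the sorted positions at which every required term occurs.

    `tokens` is a hypothetical list of (term, position) pairs, where all
    tokens produced for one XML node share that node's position.
    """
    terms_at = defaultdict(set)
    for term, position in tokens:
        terms_at[position].add(term)
    required = set(required)
    return sorted(p for p, terms in terms_at.items() if required <= terms)


tokens = [
    ("a=1", 0), ("b=2", 0), ("c=3", 0),  # <w a=1 b=2 c=3>
    ("a=1", 1),                          # <w a=1>
    ("b=2", 2),                          # <w b=2>
]

# Only the first node carries both attributes, even though the document
# as a whole contains a=1 and b=2 several times.
print(positions_matching_all(tokens, ["a=1", "b=2"]))  # [0]
```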

I found an ugly solution by experimenting with the span API (setting slop to -1, see below), and I wonder whether there is a better one.

Now, in more detail:

POST /_analyze

{"analyzer":"annotation", "text":"<w lemma=be>am</w>"}

Will output:

{
  "tokens": [
    {
      "token": "am",
      "start_offset": 0,
      "end_offset": 18,
      "type": "word",
      "position": 0
    },
    {
      "token": "lemma=be",
      "start_offset": 0,
      "end_offset": 18,
      "type": "attr",
      "position": 0
    }
  ]
}
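For readers unfamiliar with stacked tokens: in Lucene, a token filter usually places a token at the same position as the previous one by giving it a position increment of 0 (via PositionIncrementAttribute). The Python sketch below only simulates that idea to reproduce the output above; the function name and the (term, type, increment) stream are hypothetical and not taken from the plugin, and the offsets simply span the whole single-node input.

```python
def analyze_node(raw, word, attrs):
    """Simulate the token stream for one XML node (hypothetical helper)."""
    # (term, type, position_increment): the word token advances the position
    # by 1; attribute tokens use increment 0 so they stack on the same position.
    stream = [(word, "word", 1)] + [(f"{k}={v}", "attr", 0) for k, v in attrs]
    tokens, position = [], -1  # positions start at -1, so the first token lands on 0
    for term, kind, increment in stream:
        position += increment
        tokens.append({
            "token": term,
            "start_offset": 0,
            "end_offset": len(raw),  # every token covers the whole node
            "type": kind,
            "position": position,
        })
    return {"tokens": tokens}


result = analyze_node("<w lemma=be>am</w>", "am", [("lemma", "be")])
print([(t["token"], t["position"]) for t in result["tokens"]])
# [('am', 0), ('lemma=be', 0)]
```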

I need a way to retrieve a document if it contains a node with the word am that has the lemma=be attribute.

Note: am and lemma=be are at the same position.

I couldn't find a way to achieve this with the query language, but I got something working with the span_near query. It is somewhat hacky: the trick was to set "slop" to -1 and "in_order" to false.

GET /corpus/segment/_search

{
    "query": {
        "span_near" : {
            "clauses" : [
                { "span_term" : { "sr": "am" } },
                { "span_term" : { "sr": "lemma=be" } }
            ],
            "slop" : -1,
            "in_order" : false
        }
    }
}
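One possible reading of why -1 works, assuming Lucene's unordered near-span matching accepts a candidate when (max end position - min start position - total span length) <= slop. That formula is our understanding of Lucene's NearSpansUnordered, not something confirmed in this thread, so check it against the Lucene version you run. The Python sketch below just evaluates that assumed formula for single-term spans, each occupying [position, position + 1).

```python
def unordered_match_width(spans):
    """Width of a candidate match under the ASSUMED unordered-span formula.

    `spans` holds one (start, end) pair per span_near clause; a candidate
    is accepted when the returned width is <= slop.
    """
    max_end = max(end for _, end in spans)
    min_start = min(start for start, _ in spans)
    total_length = sum(end - start for start, end in spans)
    return (max_end - min_start) - total_length


def term_span(position):
    # A single term occupies [position, position + 1).
    return (position, position + 1)


same = unordered_match_width([term_span(0), term_span(0)])
adjacent = unordered_match_width([term_span(0), term_span(1)])
three_mixed = unordered_match_width([term_span(0), term_span(0), term_span(1)])
three_same = unordered_match_width([term_span(0)] * 3)

print(same, adjacent, three_mixed, three_same)  # -1 0 -1 -2
```

Under this assumption, slop -1 with two clauses accepts only tokens at exactly the same position, while slop 0 would also accept adjacent tokens. With three clauses (say the word plus a=1 and b=2), slop -1 would also accept two stacked tokens plus one at the next position; strict stacking of n terms would need slop 1 - n, and whether Elasticsearch accepts values below -1 is untested here.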

If anyone has experience with, or advice on, achieving this more cleanly, it would be much appreciated.

Thanks!
JC

system · April 23, 2018, 5:07am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
