Search with custom analyzer returns no results

We have FIX protocol log messages that look like "8=FIX.4.4|9=67|35=5|49=OMNIMKT001|56=GEMINIMKT|34=357821|52=20181109-23:00:59.800|10=056|". We would like to search for strings such as "35=5". To do this, I created a custom analyzer in my proof of concept:

PUT /tokenizer-test-2018.11.14/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": "\\|"
        }
      }
    }
  }
}

I populated the index with a couple of messages:

POST /tokenizer-test-2018.11.14/_doc
{
  "@timestamp": "2018-11-4T11:26:45 -0800",
  "message": "8=FIX.4.4|9=67|35=5|49=OMNIMKT001|56=GEMINIMKT|34=357821|52=20181109-23:00:59.800|10=056|"
}

POST /tokenizer-test-2018.11.14/_doc
{
  "@timestamp": "2018-11-4T11:26:45 -0800",
  "message": "8=FIX.4.4|9=67|35=0|49=OMNIMKT001|56=GEMINIMKT|34=357821|52=20181109-23:00:59.800|10=056|"
}

I used the _analyze API to confirm that tokenization works as expected, with 35=5 emitted as a single token.
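
The exact request was not posted; a check along these lines, reusing the index and analyzer names from above, is one way to do it. If the tokenizer behaves as intended, the response should list 35=5 as a token of its own.

```
GET /tokenizer-test-2018.11.14/_analyze
{
  "analyzer": "my_analyzer",
  "text": "8=FIX.4.4|9=67|35=5|49=OMNIMKT001|56=GEMINIMKT|34=357821|52=20181109-23:00:59.800|10=056|"
}
```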

However, when I search:

GET /tokenizer-test-2018.11.14/_search
{
  "query": {
    "match": {
      "message": {
        "query": "35=5",
        "analyzer": "my_analyzer"
      }
    }
  }
}

I get 0 hits. When I remove the analyzer from the search, I get both documents: the one with 35=5, which we want, and the one with 35=0, which we don't.
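
A likely explanation, assuming the index was created exactly as shown above (analysis settings but no explicit mapping for message): the field is mapped dynamically as text with the default standard analyzer, so my_analyzer is never applied at index time. The query-side token 35=5 then has no counterpart in the index, while the default analysis of the query yields 35 and 5, which match both documents. One way to check this is to analyze the text with the field's own analyzer instead of naming my_analyzer:

```
GET /tokenizer-test-2018.11.14/_analyze
{
  "field": "message",
  "text": "35=5"
}
```

If this returns the separate tokens 35 and 5 rather than 35=5, the field is not using the custom analyzer.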

The message looks like a list of key-value pairs. Why not parse the data into separate fields at ingest time, e.g. using the Logstash kv filter or the ingest pipeline kv processor? This would let you search as well as aggregate over the data.
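
As a rough sketch of the ingest pipeline option, the simulate request below runs a kv processor over one of the sample messages. The target field name fix is a hypothetical choice, not something from this thread; field_split is a regular expression, so the pipe is escaped.

```
POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "Sketch: split a FIX message into key-value fields",
    "processors": [
      {
        "kv": {
          "field": "message",
          "field_split": "\\|",
          "value_split": "=",
          "target_field": "fix"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "8=FIX.4.4|9=67|35=5|49=OMNIMKT001|56=GEMINIMKT|34=357821|52=20181109-23:00:59.800|10=056|"
      }
    }
  ]
}
```

With a pipeline like this, tag 35 would be expected to end up in a field such as fix.35, which could then be queried or aggregated directly.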

That's on our roadmap. We just don't have the resources to do it right now.

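Since asking my question, I got what I needed by creating an index template before ingesting the data. The template maps the message field to my_analyzer, so the analyzer is applied at index time and not only at search time: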

PUT /_template/tokenizer-test-template/
{
  "index_patterns": "tokenizer-test-*",
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "message": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": "\\|"
        }
      }
    }
  }
}
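
To check the template, something like the following could be run against a fresh index whose name matches tokenizer-test-*. The index name tokenizer-test-2018.11.15 and the timestamp value are made up for illustration. Because message is mapped with my_analyzer, the match query should not need an explicit analyzer: by default the field's analyzer is also used at search time.

```
PUT /tokenizer-test-2018.11.15

POST /tokenizer-test-2018.11.15/_doc?refresh=true
{
  "@timestamp": "2018-11-14T11:26:45-08:00",
  "message": "8=FIX.4.4|9=67|35=5|49=OMNIMKT001|56=GEMINIMKT|34=357821|52=20181109-23:00:59.800|10=056|"
}

GET /tokenizer-test-2018.11.15/_search
{
  "query": {
    "match": {
      "message": "35=5"
    }
  }
}
```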
