Field_masking_span - Not working as expected with span queries

Hi,

I am using field_masking_span to find spans across two different fields, a stemmed field (text) and an unstemmed field (text.unstemmed).

In particular, I am searching for the word loss in the text field, within 9 positions of the exact phrase economic loss in the text.unstemmed field. The analyzer on text leaves loss unchanged (it stems to loss), so the first clause should not narrow the results, but it does.
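One way to check that assumption is the _analyze API, which returns the tokens and positions a field's analyzer produces. A minimal sketch, assuming the same $ES_USER, $BASE and $INDEX variables as the queries in this post; the analyze_field helper name is hypothetical and only there for illustration:

```shell
# Hypothetical helper: show how a field's analyzer tokenizes a piece of text.
# Usage: analyze_field <field> <text>
# Note: <text> is pasted into the JSON body as-is, so it must not contain double quotes.
analyze_field() {
  curl -s $ES_USER -XPOST "$BASE/$INDEX/_analyze?pretty" \
    -H "Content-Type: application/json" \
    -d "{\"field\": \"$1\", \"text\": \"$2\"}"
}
```

Running analyze_field text "economic loss" and analyze_field text.unstemmed "economic loss" and comparing the token and position values for loss should show whether the two fields really agree on that term.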

Can the field_masking_span be used in the way I am using it below? Am I doing something incorrect here?

Below is the command that is NOT working. It returns no results against my search index.

    curl -s $ES_USER -XPOST "$BASE/$INDEX/_search?pretty" -H "Content-Type: application/json" -d '{
      "docvalue_fields": [
        "docket_exact",
        "date_filed"
      ],
      "from": 0,
      "query": {
        "bool": {
          "filter": [
            {
              "span_near": {
                "in_order": false,
                "slop": 9,
                "clauses": [{
                    "span_term": {
                      "text": "loss"
                    }
                  },{
                    "field_masking_span": {
                      "query": {
                        "span_near": {
                          "in_order": true,
                          "clauses": [{
                              "span_term": {
                                "text.unstemmed": "economic"
                              }
                            },{
                              "span_term": {
                                "text.unstemmed": "loss"
                              }
                            }
                          ],
                          "slop": 0
                        }
                      },
                      "field": "text"
                    }
                  }
                ]
              }
            },
            {
              "range": {
                "date_filed": {
                  "gte": "2021-04-01T00:00:00"
                }
              }
            }
          ]
        }
      },
      "size": 1
    }'

However, if I remove the first loss clause from the outer span_near, the query returns results. Removing it should make no difference, because loss is also required in the text.unstemmed field.

    curl -s $ES_USER -XPOST "$BASE/$INDEX/_search?pretty" -H "Content-Type: application/json" -d '{
      "docvalue_fields": [
        "documenttitle_exact",
        "docket_exact",
        "court_exact",
        "date_filed"
      ],
      "from": 0,
      "query": {
        "bool": {
          "filter": [
            {
              "span_near": {
                "in_order": false,
                "clauses": [
                  {
                    "field_masking_span": {
                      "query": {
                        "span_near": {
                          "in_order": true,
                          "clauses": [
                            {
                              "span_term": {
                                "text.unstemmed": "economic"
                              }
                            },
                            {
                              "span_term": {
                                "text.unstemmed": "loss"
                              }
                            }
                          ],
                          "slop": 0
                        }
                      },
                      "field": "text"
                    }
                  }
                ],
                "slop": 9
              }
            },
            {
              "range": {
                "date_filed": {
                  "gte": "2021-04-01T00:00:00"
                }
              }
            }
          ]
        }
      },
      "size": 1
    }' 2>/dev/null

I think I identified the issue. I was indexing multiple values into the same field, which may have messed up the term vectors, and with them the term positions that span queries match on. If anyone runs into this, please double-check how your content was indexed, especially for multi-value fields.
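For anyone debugging the same symptom, the _termvectors API can show what was actually indexed: it lists each term's positions per field, so you can compare where loss sits in text and in text.unstemmed for a single document. A minimal sketch, assuming the typeless endpoint (Elasticsearch 7 or later) and the same $ES_USER, $BASE and $INDEX variables as the queries above; the term_positions helper name and the document id you pass to it are hypothetical:

```shell
# Hypothetical helper: print the positions of every term in both fields for one document.
# Usage: term_positions <document id>
term_positions() {
  curl -s $ES_USER -XGET "$BASE/$INDEX/_termvectors/$1?pretty" \
    -H "Content-Type: application/json" -d '{
      "fields": ["text", "text.unstemmed"],
      "positions": true,
      "offsets": false,
      "payloads": false,
      "term_statistics": false,
      "field_statistics": false
    }'
}
```

When a field holds several values, each additional value starts after a position gap (position_increment_gap, 100 by default for text fields), so positions in later values can be much larger than the raw text suggests. That is worth keeping in mind when reading the output.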
