Field names starting with `_` (underscore) are not matched by the * wildcard

Hi,

The * wildcard does not match fields starting with _ in the query_string query.

Example:

Create an index with a simple mapping:

{
    "mappings": {
        "properties": {
            "_a": {
                "type": "keyword"
            }
        }
    }
}

When I run this query:

{
   "query": {
      "query_string": {
         "query": "value",
         "fields": [
            "*"
         ]
      }
   }
}

It does not match the document. However, this query works:

{
   "query": {
      "query_string": {
         "query": "value",
         "fields": [
            "_*"
         ]
      }
   }
}

Is this behavior expected?
I didn't find any documentation advising against field names starting with _.

Thanks,
Ravi Teja Meka

Hi, as per the documentation for default_field in the query_string query (ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html):

Defaults to the index.query.default_field index setting, which has a default value of *. The * value extracts all fields that are eligible to term queries and filters the metadata fields. All extracted fields are then combined to build a query if no prefix is specified.

And by this:

https://github.com/elastic/elasticsearch/blob/e0b3ea041671e7600e8a1b76491f91041940a386/server/src/main/java/org/elasticsearch/index/search/QueryParserHelper.java#L148

Does this mean that fields starting with _ are treated as metadata fields?

Should users avoid indexing fields whose names start with _?
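
To make the observation concrete, here is a toy Python model of the field resolution as it appears to behave in this thread. It is not Elasticsearch's code: the function name `resolve_fields`, the `title` field, and the rule that only the bare `*` pattern skips names starting with `_` are assumptions drawn from the two queries above.

```python
from fnmatch import fnmatchcase


def resolve_fields(pattern, mapped_fields):
    """Toy model: expand a `fields` pattern against the mapped field names."""
    # Assumption: only the match-all pattern "*" filters out names that
    # start with "_" (treated as metadata fields).
    match_all = pattern == "*"
    resolved = []
    for name in mapped_fields:
        if not fnmatchcase(name, pattern):
            continue
        if match_all and name.startswith("_"):
            continue
        resolved.append(name)
    return resolved


# "_a" comes from the mapping above; "title" is a hypothetical second field.
mapped = ["_a", "title"]
print(resolve_fields("*", mapped))
print(resolve_fields("_*", mapped))
```

Under that assumption, `*` never expands to `_a`, while the explicit `_*` pattern does, which matches what the two queries show.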

We do indeed use the _ prefix for metadata fields like _index, _id, ...

So I would not use this prefix.

Thanks @dadoonet .
But for our use case, it seems unavoidable to index field names starting with an underscore.

As an alternative approach, we are thinking of wrapping all the user-defined fields in a top-level object field, like this:

PUT sample_index
{
    "mappings": {
        "properties": {
            "user_defined": {
                "type": "object",
                "properties": {
                    "_a": {
                        "type": "date"
                    },
                    "_b": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

I see that user_defined.* also matches fields with the _ prefix:

{
   "query": {
      "query_string": {
         "query": "value",
         "lenient": true,
         "fields": [
            "user_defined.*"
         ]
      }
   }
}

Do you think we can go ahead with this approach?
Do you see any query- or index-related limitations?
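
For reference, a minimal Python sketch of how the application side of this wrapping might look. The helper names `wrap_fields` and `wrapped_query` and the sample values are hypothetical; only the `user_defined` wrapper, the `_a`/`_b` field names, and the query shape come from the examples above.

```python
import json


def wrap_fields(doc, wrapper="user_defined"):
    """Nest every top-level field of `doc` under a single object field."""
    return {wrapper: dict(doc)}


def wrapped_query(text, wrapper="user_defined"):
    """Build a query_string query restricted to the wrapped fields."""
    return {
        "query": {
            "query_string": {
                "query": text,
                # lenient, as in the query above, so that text which cannot be
                # parsed for a field's type (e.g. the date field) is ignored
                "lenient": True,
                "fields": [wrapper + ".*"],
            }
        }
    }


# Sample values are made up for illustration.
doc = wrap_fields({"_a": "2020-01-01", "_b": "value"})
print(json.dumps(doc))
print(json.dumps(wrapped_query("value")))
```

With this layout the full field names become `user_defined._a` and `user_defined._b`, so they no longer start with an underscore.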

Why this?

Another approach could be to use the ingest rename processor to rename your fields at index time, if there is no way to control this from your application...
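
As a rough sketch of the rename idea, the Python below builds an ingest pipeline body with one `rename` processor per underscore-prefixed field. The helper name `rename_pipeline`, the `app_` target prefix, and the description text are assumptions for illustration; the resulting JSON would be sent as the body of a `PUT _ingest/pipeline/<pipeline-name>` request.

```python
import json


def rename_pipeline(field_names, new_prefix="app_"):
    """Build a pipeline body that renames `_x` fields to `<new_prefix>x`."""
    processors = []
    for name in field_names:
        if not name.startswith("_"):
            continue  # leave fields without the underscore prefix untouched
        processors.append({
            "rename": {
                "field": name,
                "target_field": new_prefix + name.lstrip("_"),
                # skip documents that do not contain the field
                "ignore_missing": True,
            }
        })
    return {
        "description": "Rename underscore-prefixed fields (illustrative)",
        "processors": processors,
    }


print(json.dumps(rename_pipeline(["_a", "_b"]), indent=2))
```

Note that the rename processor fails if the target field already exists in the document, so the chosen prefix must not collide with any other field name.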

That could work indeed.

Thanks!
In our application, users can define any field name that starts with a letter. Due to application constraints, we can only use the _ prefix for the application's generated/calculated fields.
These fields also need to participate in full-text search.