API key and document-level security: Query not limiting response

Hi!

I was reading up a bit on API keys and document level security. Here are the relevant guides that I read:

Now I created an API key like this:

POST /_security/api_key
{
  "name": "limited_api_key",
  "expiration": "182d", 
  "role_descriptors": { 
    "limited_access": {
      "cluster": ["all"],
      "index": [
        {
          "names": ["my-custom-index"],
          "privileges": ["read"],
          "query": {
            "match": {
              "project": {
                "query": "my_project"
              }
            }
          }
        }
      ]
    }
  }
}

I would have expected that this API key grants only read access to the index my-custom-index for documents whose project field is equal to my_project. However, I can also retrieve documents (with the elasticsearch_dsl python library) which have project fields other than my_project.

Things that I tried to rule out:

  • The API key is used. I can cross-check that by trying to access a different index (which fails as expected) or writing a document (which also fails as expected).
  • The query should be valid. I ran the query alone on the development console with GET... and retrieved only frames which had project == my_project.

Am I misunderstanding the query concept in the API key request?

What type of license are you using? (If you are unsure, run GET /_license.)

We are currently on a free trial cloud via elastic.co. The license response that I get:

{
  "license" : {
    "status" : "active",
    "uid" : "<manually removed>",
    "type" : "platinum",
    "issue_date" : "<manually removed>",
    "issue_date_in_millis" : <manually removed>,
    "expiry_date" : "2022-06-30T00:00:00.000Z",
    "expiry_date_in_millis" : 1656547200000,
    "max_nodes" : 100000,
    "issued_to" : "Elastic Cloud",
    "issuer" : "API",
    "start_date_in_millis" : <manually removed>
  }
}

@TimV what minimum license does the document level security require? From the docs, I would have expected that even no subscription is required?

Hi @sgasse, couldn't help but poke in (interesting topic)...
and welcome to the community...

Document and field level security is a Platinum feature (see here), but yes, you should be OK to trial this.

So, good news: I just tested, and for me it worked as expected in Cloud with a Platinum license.

So I think we just need to dig in a bit more.

Perhaps we should try a term query, which can only match exact terms. It's possible that your match query, being a full-text query, is matching more than you think, since Dev Tools only returns the top 10 results.

  "query": {
    "term": {
      "project.keyword": {
        "value": "my_project"
      }
    }
  }
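For anyone scripting this, here is a rough sketch of how the same request body could be assembled in Python. The helper name `build_limited_key_body` is made up for illustration, and `project.keyword` is an assumption that only holds if the index uses the default dynamic mapping for that field. The sketch only builds the JSON; it does not talk to a cluster.

```python
import json


def build_limited_key_body(index, field, value, name="limited_api_key", expiration="182d"):
    """Assemble a create-API-key body limiting reads to docs where `field` == `value`.

    Hypothetical helper for illustration: it only builds the request body.
    """
    return {
        "name": name,
        "expiration": expiration,
        "role_descriptors": {
            "limited_access": {
                "cluster": ["all"],
                "index": [
                    {
                        "names": [index],
                        "privileges": ["read"],
                        # term query: exact comparison, the value is not analyzed
                        "query": {"term": {field: {"value": value}}},
                    }
                ],
            }
        },
    }


# Assumes the default mapping, i.e. a keyword sub-field named "project.keyword".
body = build_limited_key_body("my-custom-index", "project.keyword", "my_project")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of POST /_security/api_key in Dev Tools.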

Here is my example.
In my example I have many apps, identified by the field cloudfoundry.app.name. That field is mapped as a keyword; if it were using the default mapping, I would use cloudfoundry.app.name.keyword instead.

POST /_security/api_key
{
  "name": "my-api-key",
  "expiration": "1d",
  "role_descriptors": {
    "role-a": {
      "cluster": [
        "all"
      ],
      "index": [
        {
          "names": [
            "filebeat-*"
          ],
          "privileges": [
            "read"
          ],
          "query": {
            "term": {
              "cloudfoundry.app.name": {
                "value": "spring-music"
              }
            }
          }
        }
      ]
    }
  }
}

Result

{
  "id" : "sadfasdfkBsdfsdfl0p6VzfPd",
  "name" : "my-api-key",
  "expiration" : 1620139846091,
  "api_key" : "asdadadsasdRde14awKNAgRZA"
}

Then I base64-encoded the id and api_key, joined by a colon:

echo -n "sadfasdfkBsdfsdfl0p6VzfPd:asdadadsasdRde14awKNAgRZA" | base64
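If you are calling from Python rather than the shell, the same encoding step might look roughly like this. The function name `api_key_header` and the credentials are placeholders, not real values.

```python
import base64


def api_key_header(key_id, api_key):
    """Return the Authorization header value for an API key: 'ApiKey base64(id:api_key)'."""
    token = base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")
    return f"ApiKey {token}"


# Placeholder credentials for illustration only.
header = api_key_header("my-key-id", "my-key-secret")
print(header)
```

The resulting string goes into the Authorization header, as in the curl calls in this post.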

Then I ran some queries:

curl -H "Authorization: ApiKey sadfsdfsadfsadf2VnpmUGQ6Ukl6WTN6RmhSZGUxNGF3S05BZ1JaQQ==" -H "Content-Type: application/json" -d '{"size":3,"query":{"term":{"cloudfoundry.app.name":{"value":"spring-music"}}}}' https://myelasticurl.es.us-west1.gcp.cloud.es.io/filebeat-7.11.2-2021.04.28-000011/_search | jq

Worked as expected: I got results for spring-music.

Then I specifically tried to get results that do not match the role query, i.e. I searched for a different app:

sbrown$ curl -H "Authorization: ApiKey asdfsadfasdfasdfMHA2VnpmUGQ6Ukl6WTN6RmhSZGUxNGF3S05BZ1JaQQ==" -H "Content-Type: application/json" -d '{"size":3,"query":{"term":{"cloudfoundry.app.name":{"value":"scheduler-200ms"}}}}' https://myelasticurl.es.us-west1.gcp.cloud.es.io/filebeat-7.11.2-2021.04.28-000011/_search | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   242  100   161  100    81    540    271 --:--:-- --:--:-- --:--:--   812
{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
ceres-2:filebeat-7.11.2-linux-x86_64 sbrown$ 

Then I tried a match_all and got just the results I would expect:

ceres-2:filebeat-7.11.2-linux-x86_64 sbrown$ curl -H "Authorization: ApiKey sadfsadfsadfVnpmUGQ6Ukl6WTN6RmhSZGUxNGF3S05BZ1JaQQ==" -H "Content-Type: application/json" -d '{"size":3,"query":{"match_all":{}}}' https://myelasticurl.es.us-west1.gcp.cloud.es.io/filebeat-7.11.2-2021.04.28-000011/_search | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6256  100  6221  100    35  19440    109 --:--:-- --:--:-- --:--:-- 19489
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "filebeat-7.11.2-2021.04.28-000011",
        "_type": "_doc",
        "_id": "MLbKGXkB_INU45ioej7I",
        "_score": 1,
        "_source": {
          "agent": {
            "hostname": "e8b823dc-4c45-4a00-7db7-95a4",
            "name": "e8b823dc-4c45-4a00-7db7-95a4",
            "id": "3e72667d-dbb0-46f1-a3cf-5460ee786dd9",
            "type": "filebeat",
            "ephemeral_id": "b3cf31bd-349c-42f6-b81c-b1ddaaab1cd9",
            "version": "7.11.2"
          },
          "message": "2021-04-28 18:42:49.910  INFO 20 --- [nio-8080-exec-1] o.c.samples.music.web.AlbumController    : Deleting album 7a41e2d3-689c-4f4f-a028-62ff5e30f10b",
          "input": {
            "type": "cloudfoundry"
          },
          "@timestamp": "2021-04-28T18:42:49.910Z",
          "ecs": {
            "version": "1.6.0"
          },
          "stream": "stdout",
          "host": {
            "name": "e8b823dc-4c45-4a00-7db7-95a4"
          },
          "cloudfoundry": {
            "app": {
              "name": "spring-music",
              "id": "ec3e9ff9-ff03-412e-98a9-40ed8d537241"
            },
...
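Rather than eyeballing the output, a small check along these lines could flag any hit that slipped past the role query. This is only a sketch: `get_path` and `leaked_hits` are made-up helper names, the sample response is hand-written rather than real output, and it assumes `_source` holds nested objects (as in the output above) rather than flattened dotted keys.

```python
def get_path(source, dotted):
    """Walk a nested _source dict along a dotted field path; return None if missing."""
    current = source
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def leaked_hits(response, field, allowed):
    """Return the hits whose `field` value differs from the single allowed value."""
    return [
        hit
        for hit in response["hits"]["hits"]
        if get_path(hit["_source"], field) != allowed
    ]


# Hand-written sample shaped like a search response (not real output).
sample = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"cloudfoundry": {"app": {"name": "spring-music"}}}},
            {"_id": "2", "_source": {"cloudfoundry": {"app": {"name": "scheduler-200ms"}}}},
        ]
    }
}

leaks = leaked_hits(sample, "cloudfoundry.app.name", "spring-music")
print([hit["_id"] for hit in leaks])
```

With a correctly restricted API key, running this against a match_all response should give an empty list.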

Hope This Helps...

Hey @stephenb, thanks a lot for digging into this! I think your analysis sounds right: I might just not have noticed that my match query is fuzzy and thus returns more documents than it should.

Right now I unfortunately cannot confirm since our trial ran out and I did not have time to check it beforehand.

One small suggestion though: I stumbled upon the term query before in my search, since I know that Elasticsearch uses some fuzzy searching to enable finding hits even for pieces of text or small typos. However when I found the documentation, it said:

Warning
Avoid using the term query for text fields.

By default, Elasticsearch changes the values of text fields as part of analysis. This can make finding exact matches for text field values difficult.

To search text field values, use the match query instead.

Especially the part about 'Elasticsearch changes the values of text fields as part of analysis' led me to believe that a term query is not the right thing for document level security. Though if I understand correctly now, what you really want to say is that:

  • Elasticsearch changes text fields as part of the analysis. This is a feature and enables finding more, better results.
  • Using a term query for text limits Elasticsearch to exact matches down to having bit-equal strings.
  • For most use-cases, querying text should thus be done with a match query.
  • For document-level security, we should use term queries for text specifically because it is not fuzzy.

Is that correct?

Hi @sgasse

Good question and often a little bit of an area of confusion.

Long story short, text and keyword are two fundamentally different concepts in Elasticsearch.

Think of a keyword as a field that is left as is. A term query is an exact-match query, and it is extremely fast because it uses the inverted index.

text fields get analyzed and broken down into tokens, stems, etc., and that is how Elasticsearch supports all the great text searching capabilities.

So with that in mind, a term query is exactly the right type for document level security, but it should be applied to fields of data type keyword, not text. Term queries are very fast and are used for exact matching and filtering.

The warning above is saying: don't use a term query against a text field.
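To make the difference concrete, here is a toy sketch in plain Python. It is not how Elasticsearch is implemented: `toy_analyze` is a crude stand-in for a real analyzer (lowercase, then split on anything that is not a letter, digit or underscore), and the document values are invented. It only illustrates why a match query, which by default hits when any query token matches, can let through more documents than an exact term comparison.

```python
import re


def toy_analyze(text):
    """Crude stand-in for text analysis: lowercase, then split into word tokens."""
    return re.findall(r"\w+", text.lower())


def toy_match(field_value, query_text):
    """Mimic a match query on a text field: hit if ANY query token is among the field's tokens."""
    return bool(set(toy_analyze(query_text)) & set(toy_analyze(field_value)))


def toy_term(field_value, query_value):
    """Mimic a term query on a keyword field: hit only if the stored value is identical."""
    return field_value == query_value


# Invented example values.
docs = ["project-a", "project-b", "other"]

print([d for d in docs if toy_match(d, "project-a")])  # ['project-a', 'project-b']
print([d for d in docs if toy_term(d, "project-a")])   # ['project-a']
```

In the toy version, "project-a" and "project-b" share the token "project", so the match-style filter lets both through, while the term-style filter keeps only the exact value.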

What sometimes adds to the confusion is that if you don't manually create a mapping for a string field, both a text field and a keyword sub-field are created for you automatically.

Mappings and field types are very important. Perhaps do a little more reading here and here.
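As a rough illustration, the default dynamic mapping for a string field typically looks like the first dict below: a text field with a keyword sub-field. The helper `exact_match_field` is a made-up name, only handles top-level fields, and simply picks which field name a term query should target given the `properties` part of a mapping.

```python
# Typical result of dynamic mapping for a string field named "project".
default_mapping = {
    "project": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }
}

# An explicit mapping where the field itself is a keyword.
explicit_mapping = {"project": {"type": "keyword"}}


def exact_match_field(properties, field):
    """Pick the field name to use in a term query, or None if no keyword variant exists."""
    spec = properties.get(field, {})
    if spec.get("type") == "keyword":
        return field
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"
    return None


print(exact_match_field(default_mapping, "project"))   # project.keyword
print(exact_match_field(explicit_mapping, "project"))  # project
```

You can see the actual mapping of your index with GET my-custom-index/_mapping.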
