"cannot poll for user changes since security index [.security] does not exist" prevents authorization

I'm new to Elasticsearch and Shield. I'm required to work with version 2.4.6 and cannot currently upgrade to a newer version.

I installed Shield, started a single-node Elasticsearch 'cluster' with basic authentication, added a user (via bin/shield/esusers) with 'admin' privileges, and attempted to execute a number of cluster-based queries.

The queries produce errors in the log output, along with the message "cannot poll for user changes since security index [.security] does not exist". I believe these issues should not be ignored, because I believe those errors and the missing .security index are preventing authorization from occurring when role-based PKI authentication (https://www.elastic.co/guide/en/shield/current/pki-realm.html) is employed rather than basic auth.

The basic auth details are shown below (configuration changes, command lines executed, & log output). I'm hoping someone can help identify why the .security index is not created and how to correct the situation.

Elasticsearch version 2.4.6, Shield version: 2.4.6

./elasticsearch-2.4.6/config/elasticsearch.yml
action.auto_create_index: ".security,.security*,.monitoring*,.waches,.triggered_watches,.watcher-history*,.ml*"

./elasticsearch-2.4.6/config/logging.yml
shield.authc: TRACE
shield.authz: TRACE
shield.transport.tracer: TRACE

./elasticsearch-2.4.6/config/shield/logging.yml
logger:
  shield.audit.logfile: TRACE, access_log
additivity:
  shield.audit.logfile: true

Start up Elasticsearch with Shield

/scratch/es/elasticsearch/bin/elasticsearch -Dnetwork.host=myHost.xx.xxxx.com --cluster.name kv-es-cluster --node.name myHost

....
[2019-02-19 07:51:48,396][INFO ][node ] [myHost] initialized
[2019-02-19 07:51:48,399][INFO ][node ] [myHost] starting ...
[2019-02-19 07:51:48,721][TRACE][shield.authz.store ] [myHost] attempting to read roles file located at [/scratch/es/elasticsearch/config/shield/roles.yml]
[2019-02-19 07:51:48,761][TRACE][shield.authc.esusers ] [myHost] reading users file [/scratch/es/elasticsearch/config/shield/users]...
[2019-02-19 07:51:48,761][WARN ][shield.authc.esusers ] [myHost] no users found in users file [/scratch/es/elasticsearch/config/shield/users]. use bin/shield/esusers to add users and role mappings
[2019-02-19 07:51:48,762][DEBUG][shield.authc.esusers ] [myHost] realm [esusers] has no users
[2019-02-19 07:51:48,764][TRACE][shield.authc.esusers ] [myHost] reading users_roles file [/scratch/es/elasticsearch/config/shield/users_roles]...
[2019-02-19 07:51:48,765][WARN ][shield.authc.esusers ] [myHost] no entries found in users_roles file [/scratch/es/elasticsearch/config/shield/users_roles]. use bin/shield/esusers to add users and role mappings
....
[2019-02-19 07:51:51,956][DEBUG][shield.authc.esnative ] [myHost] native users store waiting until gateway has recovered from disk
[2019-02-19 07:51:51,957][DEBUG][shield.authz.store ] [myHost] native roles store waiting until gateway has recovered from disk
....
[2019-02-19 07:51:52,034][DEBUG][shield.authc.esnative ] [myHost] security index [.security] does not exist, so service can start
[2019-02-19 07:51:52,034][DEBUG][shield.authz.store ] [myHost] security index [.security] does not exist, so service can start
[2019-02-19 07:51:52,037][TRACE][shield.authc.esnative ] [myHost] cannot poll for user changes since security index [.security] does not exist
....
After a minute or two, the log output repeatedly displays:
....
[2019-02-19 07:52:22,041][TRACE][shield.authc.esnative ] [myHost] cannot poll for user changes since security index [.security] does not exist
[2019-02-19 07:52:22,043][TRACE][shield.authz.store ] [myHost] cannot poll for role changes since security index [.security] does not exist
....
Add root user

elasticsearch/bin/shield/esusers useradd root -r admin -p myPasswd

elasticsearch/bin/shield/esusers list
root : admin

....
[2019-02-19 07:54:43,847][INFO ][shield.authc.esusers ] [myHost] users file [/scratch/es/elasticsearch/config/shield/users] changed. updating users... )
[2019-02-19 07:54:43,848][TRACE][shield.authc.esusers ] [myHost] reading users file [/scratch/es/elasticsearch/config/shield/users]...
[2019-02-19 07:54:43,849][TRACE][shield.authc.esusers ] [myHost] invalidating cache for all users in realm [default_file]
[2019-02-19 07:54:43,849][INFO ][shield.authc.esusers ] [myHost] users_roles file [/scratch/es/elasticsearch/config/shield/users_roles] changed. updating users roles...
[2019-02-19 07:54:43,849][TRACE][shield.authc.esusers ] [myHost] reading users_roles file [/scratch/es/elasticsearch/config/shield/users_roles]...
[2019-02-19 07:54:43,850][TRACE][shield.authc.esusers ] [myHost] invalidating cache for all users in realm [default_file]
....

Query the cluster

curl -u root:myPasswd -X GET 'http://myHost:9200/_cat'

Although the query produces expected output, the log shows a stack trace, along with the message indicating 'could not retrieve user[root] because of non-existent .security index'

....
[2019-02-19 07:57:15,509][DEBUG][shield.authc.esnative ] [myHost] user not found in cache, proceeding with normal authentication
[2019-02-19 07:57:15,516][TRACE][shield.authc.esnative ] [myHost] could not retrieve user [root] because security index does not exist

[.security] IndexNotFoundException[no such index]
at org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:151)
....
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

[2019-02-19 07:57:15,531][DEBUG][shield.authc.esusers ] [myHost] user not found in cache, proceeding with normal authentication
[2019-02-19 07:57:15,631][DEBUG][shield.authc.esusers ] [myHost] authenticated user [root], with roles [[admin]]
....
[2019-02-19 07:57:22,045][TRACE][shield.authc.esnative ] [myHost] cannot poll for user changes since security index [.security] does not exist
[2019-02-19 07:57:22,046][TRACE][shield.authz.store ] [myHost] cannot poll for role changes since security index [.security] does not exist
....

Other queries produce similar results (correct output but stack trace in the log because the .security index does not exist).

Any thoughts on what I'm doing wrong in the configuration? Or why the .security index is not created and how to fix this issue?

Thanks,
Brian

This is a very old version. Can you please format your post (use the preview panel to check that it produces the correct output)? It's really hard to even go through it right now.

Ioannis Kakavas wrote:

This is a very old version.

Sorry, I'm required to use that version.

For what it's worth, google search results seem to indicate that this issue (or something similar) has affected others who are using later versions as well. Unfortunately, none of those posting seem to provide a definite answer.

Can you please format your post (use the preview panel to check that it produces the correct output)? It's really hard to even go through it right now.

I apologize. It's the first time using your tool. I've edited the original message. I removed the auto-bolding the tool added and tried to clean up the stack trace a bit.

Please let me know if the changes are more readable now, or if I should make more changes.

Thanks,
Brian

The issue is that the .security index is not created, as seen in the logs. Did you go through the documentation at https://www.elastic.co/guide/en/shield/current/getting-started.html?

Specifically, did you set action.auto_create_index: .security as described in step 3?

Authentication with root:myPasswd works as this is performed against the file realm which is node specific and doesn't need a .security index.

Ioannis Kakavas wrote:

The issue is that the .security index is not created, as seen in the logs. Did you go through the documentation at https://www.elastic.co/guide/en/shield/current/getting-started.html?

Yes, multiple times.

As I indicated in my original message, I added the 'action.auto_create_index' entry to the elasticsearch.yml file. I've tried specifying both '.security' for that property (as you suggest) and the value ".security,.security*,.monitoring*,.waches,.triggered_watches,.watcher-history*,.ml*"; that is,

action.auto_create_index: ".security,.security*,.monitoring*,.waches,.triggered_watches,.watcher-history*,.ml*"

Authentication with root:myPasswd works as this is performed against the file realm which is node specific and doesn't need a .security index.

Understood. But no matter what value I specify for action.auto_create_index in elasticsearch.yml, the .security index is never created.

Do you have any thoughts on why that might be?

For what it's worth, the elasticsearch.yml config file is in the default directory (/elasticsearch-2.4.6/config/elasticsearch.yml). So I believe that the value of the action.auto_create_index property is being picked up by the Elasticsearch server when it is started. It seems that the value of that property may be ignored by Elasticsearch though. Is that possible?

Can you suggest a logger that can be set to TRACE or DEBUG that might show whether or not the elasticsearch.yml config file is being read, and whether or not the action.auto_create_index property has been set?
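Besides a logger, one way to see what the node actually loaded might be the nodes info API, which returns each node's settings. The sketch below is only that, a sketch: the host and the root:myPasswd credentials are the placeholders used earlier in this thread, and the helper names (show_node_settings, mentions_auto_create_index) are made up for illustration.

```shell
# Sketch: ask the node which settings it loaded from elasticsearch.yml.
# ES_URL and ES_AUTH are placeholders; adjust them for the real node.
ES_URL="${ES_URL:-http://myHost:9200}"
ES_AUTH="${ES_AUTH:-root:myPasswd}"

# Hypothetical helper: succeeds if the JSON read on stdin mentions auto_create_index.
mentions_auto_create_index() {
  grep -q 'auto_create_index'
}

# Hypothetical helper: fetch the settings section of the nodes info API.
show_node_settings() {
  curl -s -u "$ES_AUTH" -X GET "$ES_URL/_nodes/settings?pretty"
}

# Usage against a running node:
#   show_node_settings | mentions_auto_create_index && echo "setting was loaded"
```

If the setting shows up in that output, the file is being read, and the question becomes what the setting is supposed to do rather than whether it was picked up.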

Thanks,
Brian

Hi @btmurphy,

Could you please check if the security-index-template has been created and present on your setup?
curl -XGET <ES>:<PORT>/_template/security-index-template
If the template does not exist, there might have been some issue adding the template and in turn creation of the .security index.

Thanks and Regards,
Yogesh Gaikwad

Yogesh_Gaikwad wrote:

Could you please check if the security-index-template has been created and present on your setup?
If the template does not exist, there might have been some issue adding the template and in turn creation of the .security index.

Sorry for the late reply. I didn't see this until just now.

Yes, right after my last posting yesterday, I did verify that the security-index-template does indeed exist. Additionally, if I manually create the .security index after starting up ES, then both the stack traces and the error messages reported yesterday go away.

This tells me that either I've got some sort of format/syntax error in the 'action.auto_create_index: .security' property setting in elasticsearch.yml, or Elasticsearch is not reading the value of that property, or Elasticsearch is reading that property but simply not creating the .security index at startup as the ES documentation seems to indicate it should do.

So I guess a question that would be good for the Elasticsearch team to answer is whether the .security index is supposed to be automatically created when an ES cluster is initially deployed and the value of the 'action.auto_create_index' property is set to either '.security' or to true?

If possible, could someone from the ES team verify whether the above interpretation of the ES doc (step 3 of https://www.elastic.co/guide/en/shield/current/getting-started.html) is correct or not?

Although manually creating the .security index might be a reasonable workaround for the initial problem reported in the first posting to this thread (where Basic Authentication is being used), I'm wondering whether the same problem still exists when I move from Basic Authentication to a PKI role-based authentication/authorization model. I ask because there seems to be a 'chicken-and-egg' dilemma.

Although I can manually create the .security index when only Basic Authentication is configured, I cannot create that index if I instead configure ES with PKI certs and keys. If ES is configured for PKI, then in order to manually create the .security index, one needs that same index to already exist, because the authorization mechanism queries the .security index to determine whether the user requesting the creation has the appropriate privileges.

I'm going to investigate further to see whether I can first deploy a new ES cluster initially configured for only Basic Authentication, then add a root user with admin privileges and, through that user, create the .security index. Once that is done, I'll shut down the ES cluster, modify the configuration to employ PKI for authentication/authorization, and finally restart the re-configured ES cluster with PKI.

If that works, then I'll try to add a new user with corresponding public cert and private key (with CN=username in the DN) to see if that new user is properly authorized when it attempts to perform various queries for which it has been granted privileges via roles.yml and role_mapping.yml.

But it seems like ES should actually be creating the .security index automatically, unless I'm misunderstanding something. A workaround like the one described above shouldn't be necessary.

Thoughts?

Can you try to set
action.auto_create_index: true or action.auto_create_index: .security instead of the quoted list of index patterns you had originally and see if that helps?

I couldn't find any issue from back in the day that affects this version. Pinging @jaymode in case he remembers something relevant.

Ioannis Kakavas wrote:

Can you try to set
action.auto_create_index: true or action.auto_create_index: .security instead of the quoted list of index patterns you had originally and see if that helps?

Yes, I tried both .security and true and each produces the same results as the quoted list of index patterns.

That setting does not automatically create the index, but in fact allows the index to be created automatically by an index operation into the index if it does not exist. So if you create a user or role using the REST api, then that action would create the index using the built in index auto creation feature of Elasticsearch.
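As a rough illustration of the point above, creating a user through the Shield user REST API is an index operation into .security, so it should be what triggers auto-creation. This is a minimal sketch, not a verified recipe for this cluster: the host and the root:myPasswd file-realm admin are the placeholders from earlier in the thread, while nativeUser, nativePswd and the helper function names are hypothetical.

```shell
# Sketch: create a native-realm user via the Shield REST API.
# ES_URL and ES_AUTH are placeholders; adjust them for the real node.
ES_URL="${ES_URL:-http://myHost:9200}"
ES_AUTH="${ES_AUTH:-root:myPasswd}"

# Hypothetical helper: build the JSON body for the user API.
# $1 = password, $2 = role name
native_user_body() {
  printf '{"password":"%s","roles":["%s"]}' "$1" "$2"
}

# Hypothetical helper: POST the new user to the native realm.
# $1 = username, $2 = password, $3 = role name
create_native_user() {
  curl -s -u "$ES_AUTH" -X POST "$ES_URL/_shield/user/$1" \
    -d "$(native_user_body "$2" "$3")"
}

# Usage against a running node, then check whether the index now exists:
#   create_native_user nativeUser nativePswd admin
#   curl -s -u "$ES_AUTH" "$ES_URL/_cat/indices/.security?v"
```

Users added with bin/shield/esusers go into the users and users_roles files instead, so they would not be expected to have this effect.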

Actually, in this version of Shield these messages are logged at DEBUG/TRACE levels since we expect to be able to operate without the .security index.

Why do you believe that the missing .security index is preventing authorization with PKI? What errors do you get with PKI auth? How are you mapping your PKI user to a role? Can you share logs from an attempted authentication with PKI?

Brian Murphy previously wrote:

I'm going to investigate further to see whether I can first deploy a new ES cluster initially configured for only Basic Authentication, then add a root user with admin privileges and through that user, create the .security index. Once that is done, I'll shutdown the ES cluster, modify the configuration to employ PKI for authentication/authorization, and finally restart the re-configured ES cluster with PKI.

If that works, then I'll try to add a new user with corresponding public cert and private key (with CN=username in the DN) to see if that new user is properly authorized when it attempts to perform various queries for which it has been granted privileges via roles.yml and role_mapping.yml.

As I said in previous posts to this thread, my primary goal is to be able to run a secure ES cluster using the PKI realm. The use of Basic Authentication presented above was merely an attempt to isolate the cause of the PKI authorization failure, which turned out to be related to the fact that the .security index was not being automatically created.

Although the issue with the non-existent .security index is still a concern, and hopefully will eventually be solved, in order to make progress toward the PKI goal, I'm now doing what's described in the quote above. That is,

  1. ES with Basic Authentication is deployed
  2. A 'root' user is created with admin privileges
  3. The .security index is created by the 'root' user
  4. ES is shut down
  5. Java keytool is used to generate a public truststore (elasticsearch.trust) and a private keystore (elasticsearch.keys), in which the user's DN is CN=myUser, OU=example.org.unit, O=example.org, L=example.city, ST=example.state, C=US
  6. OpenSSL is used to extract the public certificate file and the private key file from the keystore, in pem format (elasticsearch.pem and elasticsearch.pkey respectively).
  7. The elasticsearch.yml is modified to include:

action.auto_create_index: .security
shield.transport.ssl: true
shield.ssl.keystore.path: /config/shield/elasticsearch.keys
shield.ssl.keystore.password: myPswd
shield.ssl.keystore.key_password: myPswd
shield.ssl.truststore.path: /config/shield/elasticsearch.trust
shield.authc.realms:
  pki1:
    type: pki
    order: 0
    files:
      role_mapping: /config/shield/role_mapping.yml
  file1:
    type: file
    order: 1

  8. The role_mapping.yml file is modified to include:

admin: "CN=myUser, OU=example.org.unit, O=example.org, L=example.city, ST=example.state, C=US"

  9. The roles.yml file is the default roles.yml. That is, the admin role is granted all cluster rights, and all operational rights on all indices:

admin:
  cluster:
    - all
  indices:
    - names: '*'
      privileges:
        - all

Note: the appropriate indentation is present in each of the actual config files; the block quote widget of the reply tool tends to strip it when the snippets are pasted.

  10. After modifying the configs and generating the PKI security credentials described above, ES is restarted.
  11. The esusers utility is then used to add a new user with the same name (myUser) as the CommonName in the DN of the certs above, and that new user is granted the role of admin; that is,

bin/shield/esusers useradd myUser -r admin -p myPswd

  12. Using the certs associated with the new user named myUser, a query requiring admin privileges is executed, and fails:

curl -k -E /config/shield/elasticsearch.pem --key /config/shield/elasticsearch.pkey -X GET 'https://myHost:9200/_cat/indices/.security'

which produces the following error message on the command line:

{"error":{"root_cause":[{"type":"security_exception","reason":"action [cluster:monitor/state] is unauthorized for user [myUser]"}],"type":"security_exception","reason":"action [cluster:monitor/state] is unauthorized for user [myUser]"},"status":403}

and produces the following log output in the debug logs:

[2019-02-20 10:37:13,100][DEBUG][shield.authc.support ] [myHost] the roles [], are mapped from these [pki] groups [] for realm [pki/pki1]
[2019-02-20 10:37:13,109][DEBUG][shield.authc.support ] [myHost] the roles [], are mapped from the user [pki] for realm [CN=myUser, OU=example.org.unit, O=example.org, L=example.city, ST=example.state, C=US/pki]

The item in the log output above that caught my eye is the empty roles in the second log record. It shows the correct DN, but I would have expected the roles [] to be roles[[admin]] rather than empty. Prior to the .security index issue originally discussed, I thought the roles were empty because they couldn't be looked up due to the fact that the .security index wasn't being created. But now that the .security index was manually created and thus exists, the roles are still empty. So I'm not sure why myUser is not being mapped to the admin role.

Jay Modi wrote:

That setting (action.auto_create_index: .security) does not automatically create the index, but in fact allows the index to be created automatically by an index operation into the index if it does not exist. So if you create a user or role using the REST api, then that action would create the index using the built in index auto creation feature of Elasticsearch.

Okay, thanks for the explanation. I really appreciate it.

For what it's worth, this was not clear at all in the ES documentation. Maybe I missed it.

That said, I do seem to recall someone in an ES User's list posting (maybe you) saying something about the .security index being created when you use esusers to create a new user (or role). For what it's worth, I tried that, but the .security index was never created upon the creation of a new user. I tried this a number of times, with a number of new users, and always observed log output saying the .security index did not exist (like what was shown in the log output in my original post).

The only thing that brought the .security index into existence for me was when I manually created it myself.
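For anyone following along, the manual workaround can be sketched roughly as below. The exact command used is not shown in this thread, so treat this as an assumption: a plain index-creation request issued as the file-realm admin, preceded by the template check suggested earlier. The host, the root:myPasswd credentials and the helper names are placeholders for illustration.

```shell
# Sketch: check the security template, then create the .security index by hand.
# ES_URL and ES_AUTH are placeholders; adjust them for the real node.
ES_URL="${ES_URL:-http://myHost:9200}"
ES_AUTH="${ES_AUTH:-root:myPasswd}"

# Hypothetical helper: build the full URL for an index name or API path.
es_url() {
  printf '%s/%s' "$ES_URL" "$1"
}

# Hypothetical helper: confirm the template mentioned earlier in the thread exists.
check_security_template() {
  curl -s -u "$ES_AUTH" -X GET "$(es_url _template/security-index-template)"
}

# Hypothetical helper: create the index with an empty create-index request.
create_security_index() {
  curl -s -u "$ES_AUTH" -X PUT "$(es_url .security)"
}

# Usage against a running node:
#   check_security_template
#   create_security_index
```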

Thanks,
Brian

The esusers tool creates users in the file realm, which doesn't use the .security index, so it neither depends on that index nor creates it.

You can add users in the native realm as Jay suggested above, and this should trigger the creation of the .security index.

Ioannis Kakavas wrote:

The esusers tool creates users in the file realm, which doesn't use the .security index, so it neither depends on that index nor creates it.

You can add users in the native realm as Jay suggested above, and this should trigger the creation of the .security index.

Ah, maybe that's where I've misunderstood things. I didn't realize that when Jay said "create a user or role using the REST api" that that would result in a user that is 'not the same as' (different realm than) the user I was creating with esusers.

Maybe that's also the issue when I run with PKI?

That is, I created certs with CN=myUser and then used esusers to add a user with the same name having admin privileges. But any query executed using the myUser certs encounters an authorization failure (although it authenticates successfully). Should I instead be creating the myUser using the REST API?

The certs (and their CN username) are associated with the PKI realm, but the user created via the REST API will be in the native realm. Is that the correct model?

It's my understanding that the username specified in the CN is what gets authenticated when a query is executed using those certs. But I'm not clear on how the privilege gets mapped to the user that is authenticated via PKI. I thought I was clear on it -- using the role_mapping.yml file -- but does the realm also play a role in how the desired privileges are mapped to the user specified in the PKI CN?

Does an entry need to be added to elasticsearch.yml for the native realm? In a way similar to the entries for the pki and file realms?

Thanks for your patience,
Brian

I think there might be a small formatting issue that can fix this:

admin: 
  - "CN=myUser, OU=example.org.unit, O=example.org, L=example.city, ST=example.state, C=US"

The role_mapping.yml file uses an array for the DNs that get mapped to a role.

Brian Murphy wrote:

admin: "CN=myUser, OU=example.org.unit, O=example.org, L=example.city, ST=example.state, C=US"

Jay Modi wrote:

I think there might be a small formatting issue that can fix this:
admin: 
  - "CN=myUser, OU=example.org.unit, O=example.org, L=example.city, ST=example.state, C=US"

I wish this were the problem. But my role_mapping.yml file has the exact same formatting as what you have shown above.

I believe there was either a cut-and-paste error or the reply tool did some sort of auto-reformatting. Either way, the entry in the role_mapping.yml file that I'm using is formatted the same as what you suggest above rather than what was shown in my previous posting.

Is it possible that the issue is related to my creating the myUser user with the esusers utility rather than with REST API?

That is, after setting up the PKI certs with CN=myUser as above, I do the following:

bin/shield/esusers useradd myUser -r admin -p myPswd

bin/shield/esusers list
myUser : admin

But as Ioannis Kakavas explained in a prior posting, when the myUser is created using esusers as above, it creates the user in the file realm; whereas the REST API creates users in the native realm. Since my authentication is being performed in the PKI realm, should I be using the REST API to create myUser in the native realm?

Thanks,
Brian

You do not need to create the PKI user at all using the file or native realm. The PKI realm does not interact with the file or native realm; the logs show that the issue is a failure with mapping the user to a role.
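Since the mapping is keyed on the certificate's DN, one low-cost check is to print the subject of the certificate being presented and compare it by eye with the entry in role_mapping.yml. This is only a sketch under assumptions: the certificate path is the one used earlier in this thread, cert_subject is a made-up helper name, and the spacing and ordering that openssl prints can differ from how the DN is written in the mapping file. Whether such differences matter to the PKI realm is not something this thread establishes.

```shell
# Sketch: print the subject DN of a PEM certificate for comparison
# with the DN listed in role_mapping.yml.

# Hypothetical helper.
# $1 = path to the PEM certificate
cert_subject() {
  openssl x509 -in "$1" -noout -subject -nameopt RFC2253
}

# Usage:
#   cert_subject /config/shield/elasticsearch.pem
```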

Can you shut down the node, set the log level for shield to TRACE, move the existing log file to a new name, start Elasticsearch up, then attempt to authenticate with your PKI user, and share the full contents of this file? This will create a log file that contains a single startup and authentication attempt, which will hopefully help us understand what is going on.

Jay Modi wrote:

Can you shut down the node, set the log level for shield to TRACE, move the existing log file to a new name, start Elasticsearch up, then attempt to authenticate with your PKI user, and share the full contents of this file?

Yes, will do.

But just so you know, I've so far been setting only select shield loggers to TRACE, because the log output was so large that this reply tool would not allow that many characters to be cut-and-pasted into a reply (it imposes a 7000-character limit). So I'll try to include the whole log, but I may have to snip out some info.

Anyway, I'll reply in a little bit.

Thanks,
Brian

Maybe you can upload it to github as a gist or another paste service and put the link here?

Jay Modi wrote:

Can you shut down the node, set the log level for shield to TRACE, move the existing log file to a new name, start Elasticsearch up, then attempt to authenticate with your PKI user, and share the full contents of this file?
Maybe you can upload it to github as a gist or another paste service and put the link here?

At your suggestion, I added a gist (see below); as the full log file far exceeds the character limit.

Command line used to start ES:

bin/elasticsearch -Dnetwork.host=myHost.company.com --cluster.name my-es-cluster --node.name myHost

The contents of the log file can be viewed at the link:

Note that near the end, I interleaved the PKI query that failed (because the roles array for the user is empty), along with the failure message displayed in the command window and the log records corresponding to that failure.

The log record that jumps out at me is:

[2019-02-21 07:13:11,111][DEBUG][shield.authc.support ] [myHost] the roles [], are mapped from the user [pki] for realm [CN=myUser, OU=example.org.unit, O=example.org, L=example.city, ST=example.state, C=US/pki]

Based on the entry in the role_mapping.yml file, I would have expected that the roles array should contain the admin role and not be empty. So I can't figure out what's missing in my setup that would cause the admin role to not be assigned to the myUser.

Thanks,
Brian