Invalid field name: _allow_permissions,_deny_permissions

I am trying to set up Workplace Search.
I want to include the online documents used in our company in the search, so I am setting up a custom source by referring to the following URL.

However, when I run the following command, an error occurs.

$ curl -X POST http://XXX.XXX.XXX.XXX:3002/api/ws/v1/sources/[ID]/documents/bulk_create \
-H "Authorization: Bearer [TOKEN]" \
-H "Content-Type: application/json" \
-d '[
  {
    "_allow_permissions": ["admin", "user"],
    "_deny_permissions": [],
    "id" : 1234,
    "title" : "my Online Document",
    "body" : "this is body.",
    "url" : "http://my/online/document/url",
    "created_at": "2021-05-28T12:00:00+00:00",
    "type": "list"
  }
]'

{"results":[{"id":"1234","errors":["Invalid field name: _allow_permissions,_deny_permissions"]}]}

What should I change?

Hey @its-ogawa,

I'm surprised to see it complaining about those field names. What you're trying matches our documentation on permissions for custom sources pretty closely.

What version of Workplace Search are you running?

Ross

Ah, sorry, I should've spotted this earlier. Document level permissions in Workplace Search require a Platinum level license. I know the error message isn't very helpful, and we intend to improve that.

You can see a license breakdown here: Elastic Workplace Search | Elastic. Search for the term "Platinum" on that page to jump straight to the relevant part.

I see, so I need a higher-level license!
I may apply for a Platinum license in the future, but for now I would like to use a Basic license, since I am still in the evaluation stage.

In this case, should I simply exclude _allow_permissions and _deny_permissions?
Please let me know the correct way to do this for the basic license.

I tried to run it without the above properties.

curl -X POST http://XXX.XXX.XXX.XXX:3002/api/ws/v1/sources/[ID]/documents/bulk_create \
-H "Authorization: Bearer [TOKEN]" \
-H "Content-Type: application/json" \
-d '[
  {
    "id" : 1234,
    "title" : "my Online Document",
    "body" : "this is body.",
    "url" : "http://my/online/document/url",
    "created_at": "2021-05-28T12:00:00+00:00",
    "type": "list"
  }
]'
{"results":[{"id": "1234", "errors":[]}]}

Does an empty errors array mean that the request succeeded?
The document was added to the source content in Workplace Search!

However, when I search in Workplace Search for words that appear in the document at that URL, there are no hits. Can this approach be used to search for keywords in our online documentation?

Hey @its-ogawa,

Yes, you can simply omit the _allow_permissions and _deny_permissions fields. An empty errors array is fine.
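A minimal sketch of omitting those fields in Python, assuming the documents are already plain dicts. The helper name `strip_permission_fields` is hypothetical and not part of any Elastic client:

```python
# Hypothetical helper: drop the document-level permission fields, which
# this thread reports as rejected without a Platinum license.
PERMISSION_FIELDS = ("_allow_permissions", "_deny_permissions")


def strip_permission_fields(documents):
    """Return copies of the documents without the permission fields."""
    return [
        {key: value for key, value in doc.items() if key not in PERMISSION_FIELDS}
        for doc in documents
    ]


docs = [
    {
        "_allow_permissions": ["admin", "user"],
        "_deny_permissions": [],
        "id": 1234,
        "title": "my Online Document",
        "body": "this is body.",
        "url": "http://my/online/document/url",
    }
]
cleaned = strip_permission_fields(docs)
print(cleaned)
```

The original list is left untouched, so the permission fields can be sent again later if the license changes.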

In addition to indexing the URL of your online documentation, can you confirm you're including the documentation content in the body field as well? Indexing documents in this way will not automatically read the content at the URL provided, but rather rely on you to supply the searchable content in the body field.

Ross

What do you mean by including document content in the body field?

For example, if I write an online document in markdown format, does the entire content of the markdown file go into the body?
Also, do I need to register every one of those markdown files in Workplace Search?

Or is there a better way to do this?
For example, would it be easier to use the Git integration feature of Workplace Search?
With Git integration, we are hoping that the search will cover not only the file names but also the text.

It’s best to think of these objects as symbols, which represent the greater document to which they link. Fields should be descriptive enough that your users will be able to find what they need given loose querying attempts.

Does this passage in the documentation mean that the body should contain a "summary" of the linked document,
and that this "summary" needs to be hand-written by the user?

Is that what you mean?

When indexing documents into a Custom API source, you'll need to index every document you want to make searchable, and you'll need to include all searchable content in the body field for each indexed document. If your body content is in markdown format, it should index just fine without having to convert to plaintext.
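As a rough illustration of "one indexed document per file, full text in body", here is a sketch that turns a directory of markdown files into a list of document dicts. The function name `build_documents`, the hashed ids, and the idea that each file's URL mirrors its relative path are all assumptions made for the example, not something the thread or the Workplace Search documentation prescribes:

```python
import hashlib
import tempfile
from pathlib import Path


def build_documents(doc_root, base_url):
    """Turn every .md file under doc_root into one document dict."""
    root = Path(doc_root)
    documents = []
    for path in sorted(root.rglob("*.md")):
        relative = path.relative_to(root).as_posix()
        documents.append(
            {
                # Assumption: a hash of the relative path serves as a stable id.
                "id": hashlib.sha1(relative.encode("utf-8")).hexdigest(),
                "title": path.stem,
                # The full markdown text goes into body so it is searchable.
                "body": path.read_text(encoding="utf-8"),
                # Assumption: the published URL mirrors the file layout.
                "url": f"{base_url.rstrip('/')}/{relative}",
            }
        )
    return documents


# Demo with a throwaway directory holding one markdown file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "guide.md").write_text("# Guide\n\nHow to deploy.", encoding="utf-8")
    demo_docs = build_documents(tmp, "https://docs.example.com/")

print(demo_docs[0]["title"], demo_docs[0]["url"])
```

The resulting list has the same shape as the `documents` argument passed to `index_documents` elsewhere in this thread, so it could be handed to that call in place of hand-written entries.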

From another question, I suspect you're using GitLab, but if you're using GitHub you can make use of the first-party GitHub connector.

I have confirmed that the body description can be in markdown format.
Thank you very much.

However, registering all the documents looks very time-consuming.
It would be great if this registration step could be made a little easier.

One thing I noticed is that if the body contains a newline, the request fails with a syntax error.
I replaced the line feeds with spaces and the registration succeeded, but the text shown in search results is a little difficult to read.
What should I do to preserve the formatting?
Also, is there any other way to do this?

Or does the appearance of the body not matter much, since search results only show the text around the matched keywords?

Workplace Search shouldn't have an issue with indexing documents with newlines. Perhaps the environment you're using to index the documents is making it more difficult to include request bodies with newlines in them?
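One possible cause, not confirmed in this thread, is that a JSON string may not contain a raw line feed; it has to be written as the escape sequence `\n`. A JSON serializer handles that automatically. A small sketch using Python's standard `json` module, with made-up document content:

```python
import json

# A body with real line breaks, as it would come from a markdown file.
body = "# Title\n\nFirst paragraph.\nSecond line."
document = {"id": 1234, "title": "my Online Document", "body": body}

# json.dumps escapes each line feed, so the payload is a single
# line of valid JSON that can be sent as the request body.
payload = json.dumps([document])
print(payload)

# Decoding the payload restores the original line breaks.
restored = json.loads(payload)[0]["body"]
print(restored == body)
```

Building the request body this way keeps the line breaks in the indexed text instead of replacing them with spaces.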

In case it helps, there are a few client libraries for interacting with Enterprise Search's API: Elastic Enterprise Search API clients for Ruby and Python | Elastic Blog.

Great!

"Create a custom source in Elastic Workplace Search with Python".
I found this article very helpful.

The following manual describes several methods besides the one for indexing documents, and I tried them.

However, in my environment, some of them do not work.
Do you know why?
Am I misunderstanding the meaning of document ID?

The source code is shown below.

from elastic_enterprise_search import WorkplaceSearch

WORKPLACE_SEARCH_URL = [REDACTED]
HTTP_AUTH = [REDACTED]
CONTENT_SOURCE_ID = [REDACTED]

workplace_search = WorkplaceSearch(WORKPLACE_SEARCH_URL)
workplace_search.http_auth = HTTP_AUTH

Cases that work.

res = workplace_search.index_documents(
    http_auth=HTTP_AUTH,
    content_source_id=CONTENT_SOURCE_ID,
    documents=[
      {
        # "_allow_permissions": ["permission1"], # require platinum license  
        # "_deny_permissions": [],               # require platinum license
        "id" : 1234,
        "title" : "The Meaning of Time",
        "body" : "Not much. It is a made up thing.",
        "url" : "https://example.com/meaning/of/time",
        # "created_at": "2019-06-01T12:00:00+00:00",
        "type": "list"
      },
      {
        # "_allow_permissions": [],             # require platinum license
        # "_deny_permissions": ["permission2"], # require platinum license
        "id" : 1235,
        "title" : "The Meaning of Sleep",
        "body" : "Rest, recharge, and connect to the Ether.",
        "url" : "https://example.com/meaning/of/sleep",
        # "created_at": "2019-06-01T12:00:00+00:00",
        "type": "list"
      }
    ]
)

Cases where things go wrong.

res = workplace_search.get_document(
    # http_auth=HTTP_AUTH,
    content_source_id=CONTENT_SOURCE_ID,
    # document_id="1234",#NG
    document_id=1234,#NG
    # document_id=[1234],#NG
)
res = workplace_search.get_content_source(
    content_source_id=CONTENT_SOURCE_ID,
)
res = workplace_search.list_content_sources()

All of them fail with the following error message:

raise HTTP_EXCEPTIONS.get(status, APIError)(
elastic_transport.exceptions.NotFoundError: [404] {'error': 'Routing Error. The path you have requested is invalid.'}

Keep in mind that the auth token for content source and document management will be different from the auth token used for searching. This code works for me:

from elastic_enterprise_search import WorkplaceSearch

WORKPLACE_SEARCH_URL = 'http://localhost:3002'
HTTP_AUTH = '[REDACTED]' # "Access Token" from the content source details page

workplace_search = WorkplaceSearch(
  WORKPLACE_SEARCH_URL,
  http_auth = HTTP_AUTH
)

print(
  workplace_search.list_content_sources()
)

print(
  workplace_search.get_content_source(
    content_source_id='60ba8e0ea1c493ea314d2688'
  )
)

print(
  workplace_search.get_document(
    content_source_id='60ba8e0ea1c493ea314d2688',
    document_id='park_acadia'
  )
)

Sadly, it does not work in my environment.

For example:

print(
  workplace_search.list_content_sources()
)

When I run this, the output looks like this:

$ python ConnectWorkplaceSearch.py
GET http://XXX.XXX.XXX.XXX:3002/api/ws/v1/sources/60b05b4384c21216271163d0/documents/1234 [status:404 request:0.089s]
Traceback (most recent call last):
  File "ConnectWorkplaceSearch.py", line 120, in <module>
    res = workplace_search.get_document(
  File "C:\Users\ogawa\Anaconda3\envs\py38\lib\site-packages\elastic_enterprise_search\client\_workplace_search.py", line 331, in get_document
    return self.perform_request(
  File "C:\Users\ogawa\Anaconda3\envs\py38\lib\site-packages\elastic_enterprise_search\client\_base.py", line 187, in perform_request
    return self.transport.perform_request(
  File "C:\Users\ogawa\Anaconda3\envs\py38\lib\site-packages\elastic_transport\transport.py", line 311, in perform_request
    resp_status, resp_headers, data = connection.perform_request(
  File "C:\Users\ogawa\Anaconda3\envs\py38\lib\site-packages\elastic_transport\connection\http_urllib3.py", line 251, in perform_request
    self._raise_error(
  File "C:\Users\ogawa\Anaconda3\envs\py38\lib\site-packages\elastic_transport\connection\base.py", line 192, in _raise_error
    raise HTTP_EXCEPTIONS.get(status, APIError)(
elastic_transport.exceptions.NotFoundError: [404] {'error': 'Routing Error. The path you have requested is invalid.'}

I think the problem occurs before Python is even involved, since accessing the URL in the error message directly does not return any document either.
As you pointed out in another thread, I have not yet been able to configure the domain properly in the Enterprise Search settings (ent_search.external_url) or in the NGINX settings.
I know I should try again after that is resolved, but I would like to ask a few questions here as well.

  • Can WORKPLACE_SEARCH_URL be a public IP?

  • Is content_source_id the ID of the content source details page (CREDENTIALS)?

  • Is document_id the value of the "id" of the registered document? (1234 or 1235 in the example above or in the official documentation)

  • Is it possible to check the document_id above in the WorkplaceSearch browser?

Is there a way to list content and documents in the curl command?
I want this in order to find the content source ID and document IDs to use here.

Can WORKPLACE_SEARCH_URL be a public IP?

I don't know of any reason why it couldn't be. I suspect there's something going on unrelated to Enterprise Search where the requests can't actually reach the specified IP/domain.

Is content_source_id the ID of the content source details page (CREDENTIALS)?

Yep!

Is document_id the value of the "id" of the registered document? (1234 or 1235 in the example above or in the official documentation)

Yep!

Is it possible to check the document_id above in the WorkplaceSearch browser?

There's no easy place to do this in the admin UI.

Is there a way to list content and documents in the curl command?

Yep, you'll want to use the Search API: Search API Reference | Workplace Search documentation [8.11] | Elastic. Note that authenticating to the Search API is different from authenticating to the admin API.

I have read the documentation.

However, I am unable to get the content list.
What is wrong?
With the python script mentioned above, I am able to register the document, so it is not that there is no content.

# curl -X GET 'http://localhost:3002/api/ws/v1/sources'
{"error":"Routing Error. The path you have requested is invalid."}

Assuming you're including your auth info when you actually make the request, I don't see anything wrong with that. Here's a slightly simpler version of the same request that works for me:

curl -u 'enterprise_search:password' http://localhost:3002/api/ws/v1/sources

Hmm. The command you gave me did not work for me either,
even though Enterprise Search is definitely running and some content is already registered.

# curl -u 'enterprise_search:[REDACTED]' http://localhost:3002/api/ws/v1/sources
{"error":"Routing Error. The path you have requested is invalid."}

Why is that? Is there anything you can think of?

Oh, what version of Workplace Search are you using? I think that sources API endpoint was added in 7.13.