[GitHub Connector] Help on setting up the sync

I am trying to run the GitHub Connector (v8.11) on Elasticsearch v8.11.3 and sync data from a GitHub Enterprise Server, specifically one repo under an organization.

I am having trouble figuring out how to configure advanced settings for the connector in Kibana. I got the connector deployed, and now see the following fields for additional configuration:

GitHub data source, where I can toggle between Cloud and Server option.
For GitHub Server option, I also get GitHub URL input field, where I add the base URL of our GH server.
Then there is token, and this is where I am unsure - personal access token (as documentation suggests) is attached to a user account. Where am I supposed to specify the user account? I don't see an option for this? Does it go into the URL?

I tried our-enterprise-server/service-account_with-token for the GitHub URL, but I get an error:

ClientResponseError: 406, message='Not Acceptable', url=URL('our-enterprise-server/login?return_to=our-enterprise-server/service-account_with-token/api/graphql')

Can anyone help me out and explain what is the expected format here?

*I've removed the protocol (or scheme) part of the URL, because otherwise the forum does not let me post.

@anna-safonov you shouldn't need to do anything to specify an account. As you said, the token is associated with the account, so just by providing the token, GitHub's APIs will know who is behind those requests and respond accordingly.

The Github URL should just be the hostname you'd put in your browser to go to the home of your Github Enterprise Server.
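As a rough illustration of that format, here is a minimal sketch of how a client could turn the base URL and token into a GraphQL request. The function name `build_graphql_request` and the example hostname are hypothetical, and the `Bearer` header form is an assumption (the thread does not show which header the connector sends). The `/api/graphql` suffix matches the URL in the error message above.

```python
from urllib.parse import urlsplit


def build_graphql_request(base_url: str, token: str) -> tuple[str, dict[str, str]]:
    """Return the GraphQL endpoint and auth headers for a GitHub Enterprise Server.

    Hypothetical helper for illustration; not the connector's own code.
    """
    parts = urlsplit(base_url)
    # The base URL must carry a scheme and a hostname.
    if not parts.scheme or not parts.netloc:
        raise ValueError("base_url needs a scheme and hostname, e.g. https://ghes.example.com")
    # A user or organization name in the path is not part of the base URL.
    if parts.path.strip("/"):
        raise ValueError("base_url should not include a path such as a user or org name")
    endpoint = f"{parts.scheme}://{parts.netloc}/api/graphql"
    # The token alone identifies the account; no username is sent.
    headers = {"Authorization": f"Bearer {token}"}
    return endpoint, headers
```

With this sketch, a value like `https://ghes.example.com/service-account` is rejected, while `https://ghes.example.com` yields `https://ghes.example.com/api/graphql`.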

Does that help clarify?

Hm, that does make sense, but is not working for me.

I tried providing just the base URL and the personal access token, which I created from a service account on the Enterprise server, but I am getting a 401.

However, I now realize that I am running the connector client from the source code on a VM where I am signed in under my account. Should I run the connector from the service account on this VM? Is it possible that it is grabbing username from the current login and trying to match it to the token?

The Github user on your VM shouldn't have anything to do with how the Connector authenticates to your Github Enterprise Server.

Common "gotchas" with github tokens:

  • you picked an expiration date for the token that is already passed
  • you didn't pick the right scopes for your token (you need repo, user, and read:org)
  • you didn't click "Configure SSO" and authorize one of your SSO organizations (only applicable if your GHES is secured with SSO)
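The scope check in the list above can be sketched as follows. For classic personal access tokens, GitHub's REST API reports the granted scopes in the `X-OAuth-Scopes` response header as a comma-separated list. The helper name `missing_scopes` is hypothetical, and this simple comparison assumes exact scope names; it does not account for broader scopes that imply narrower ones.

```python
# Scopes named in the list above.
REQUIRED_SCOPES = {"repo", "user", "read:org"}


def missing_scopes(x_oauth_scopes: str) -> set[str]:
    """Return the required scopes absent from an X-OAuth-Scopes header value.

    Hypothetical helper; compares exact names only, so a broader scope
    that implies a narrower one is still reported as missing.
    """
    granted = {scope.strip() for scope in x_oauth_scopes.split(",") if scope.strip()}
    return REQUIRED_SCOPES - granted
```

The header value can be read from any authenticated REST response, for example the response headers printed by `curl -I` with the token supplied.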

Can you check those? If none of those is the issue, can you share any logs or more of the 401 error message?

Hey Sean,

Yes, to test token validity, I created a token under my own account (I have full admin access to the instance, and the org/repo I am trying to sync).
I do see it being used for authentication when I try to sync.
The scope is correct.
We don't have SSO enabled.

I am able to use the token with curl to auth to graphql.

Here is a debug log I see on the connector client side
It does not provide much information other than 401 (I've redacted some info):

[BYOC][15:55:17][DEBUG] > GET /.elastic-connectors/_doc/CONNECTOR_ID HTTP/1.1

Accept: application/vnd.elasticsearch+json; compatible-with=8
Authorization: Basic
Connection: keep-alive
User-Agent: elastic-connectors-python-8.11.5.0
X-Elastic-Client-Meta: es=8.8.0,py=3.10.0,t=8.4.0,ai=3.8.6
< HTTP/1.1 200 OK
< Content-Length: 4498
< Content-Type: application/vnd.elasticsearch+json;compatible-with=8
< X-Elastic-Product: Elasticsearch
< {"_index":".elastic-connectors-v1","_id":"CONNECTOR_ID","_version":75,"_seq_no":76,"primary_term":3,"found":true,"source":{"api_key_id":"REDACTED","configuration":{"repositories":{"depends_on":[x],"display":"textarea","tooltip":"This configurable field is ignored when Advanced Sync Rules are used.","default_value":null,"label":"List of repositories","sensitive":false,"type":"list","required":true,"options":[],"validations":[],"value":"ORG_NAME/REPO_NAMEl","order":4,"ui_restrictions":[]},"ssl_ca":{"depends_on":[{"field":"ssl_enabled","value":true}],"display":"text","tooltip":null,"default_value":null,"label":"SSL certificate","sensitive":false,"type":"str","required":true,"options":[],"validations":[],"value":"","order":6,"ui_restrictions":[]},"ssl_enabled":{"depends_on":[],"display":"toggle","tooltip":null,"default_value":null,"label":"Enable SSL","sensitive":false,"type":"bool","required":true,"options":[],"validations":[],"value":false,"order":5,"ui_restrictions":[]},"retry_count":{"display_value":3,"depends_on":[],"display":"numeric","tooltip":null,"default_value":null,"label":"Maximum retries per request","sensitive":false,"type":"int","required":false,"options":[],"validations":[],"value":3,"order":7,"ui_restrictions":["advanced"]},"host":{"depends_on":[{"field":"data_source","value":"github_server"}],"display":"text","tooltip":null,"default_value":null,"label":"GitHub URL","sensitive":false,"type":"str","required":true,"options":[],"validations":[],"value":"OUR_ENTERPRISE_URL","order":2,"ui_restrictions":[]},"use_text_extraction_service":{"depends_on":[],"display":"toggle","tooltip":"Requires a separate deployment of the Elastic Text Extraction Service. 
Requires that pipeline settings disable text extraction.","default_value":null,"label":"Use text extraction service","sensitive":false,"type":"bool","required":true,"options":[],"validations":[],"value":false,"order":8,"ui_restrictions":["advanced"]},"data_source":{"depends_on":[],"display":"dropdown","tooltip":null,"default_value":null,"label":"GitHub data source","sensitive":false,"type":"str","required":true,"options":[{"label":"GitHub Cloud","value":"github_cloud"},{"label":"GitHub Server","value":"github_server"}],"validations":[],"value":"github_server","order":1,"ui_restrictions":[]},"token":{"depends_on":[],"display":"text","tooltip":null,"default_value":null,"label":"GitHub Token","sensitive":true,"type":"str","required":true,"options":[],"validations":[],"value":"TOKEN_THAT_IS_VALID","order":3,"ui_restrictions":[]}},"custom_scheduling":{},"description":null,"error":"AuthenticationException: AuthenticationException(401, 'None')","features":{"incremental_sync":{"enabled":false},"document_level_security":{"enabled":false},"sync_rules":{"advanced":{"enabled":true},"basic":{"enabled":true}}},"filtering":[{"active":{"advanced_snippet":{"created_at":"2024-01-23T16:29:13.830Z","updated_at":"2024-01-23T16:29:13.830Z","value":{}},"rules":[{"created_at":"2024-01-23T16:29:13.830Z","field":"","id":"DEFAULT","order":0,"policy":"include","rule":"regex","updated_at":"2024-01-23T16:29:13.830Z","value":".*"}],"validation":{"errors":[],"state":"valid"}},"domain":"DEFAULT","draft":{"advanced_snippet":{"created_at":"2024-01-23T16:29:13.830Z","updated_at":"2024-01-23T16:29:13.830Z","value":{}},"rules":[{"created_at":"2024-01-23T16:29:13.830Z","field":"","id":"DEFAULT","order":0,"policy":"include","rule":"regex","updated_at":"2024-01-23T16:29:13.830Z","value":".*"}],"validation":{"errors":,"state":"valid"}}}],"index_name":"search-github-metrics","is_native":false,"language":null,"last_access_control_sync_error":null,"last_access_control_sync_scheduled_at":null,"last_access_contr
ol_sync_status":null,"last_incremental_sync_scheduled_at":null,"last_seen":"2024-01-23T20:51:44.418136+00:00","last_sync_error":"AuthenticationException: AuthenticationException(401, 'None')","last_sync_scheduled_at":null,"last_sync_status":"error","last_synced":"2024-01-23T20:44:28.898207+00:00","name":"github-metrics","pipeline":{"extract_binary_content":true,"name":"ent-search-generic-ingestion","reduce_whitespace":true,"run_ml_inference":false},"scheduling":{"access_control":{"enabled":false,"interval":"0 0 0 * * ?"},"full":{"enabled":false,"interval":"0 0 0 * * ?"},"incremental":{"enabled":false,"interval":"0 0 0 * * ?"}},"service_type":"github","status":"error","sync_now":false,"last_indexed_document_count":0,"last_deleted_document_count":0}}

Oh, and I forgot to add: the token is set to never expire.

@anna-safonov I was hoping to look at your logs from the connectors process, like:

[FMWK][11:10:15][INFO] [Connector id: -Yi4N40Bt_UZdJQdmpbn, index name: search-github, Sync job id: zYFzPI0BV5Sl9akZFY7o] Executing full sync
[FMWK][11:10:15][INFO] [Connector id: -Yi4N40Bt_UZdJQdmpbn, index name: search-github, Sync job id: zYFzPI0BV5Sl9akZFY7o] Filtering validation started
[FMWK][11:10:15][INFO] [Connector id: -Yi4N40Bt_UZdJQdmpbn, index name: search-github, Sync job id: zYFzPI0BV5Sl9akZFY7o] Collecting local document ids
[FMWK][11:10:15][INFO] [Connector id: -Yi4N40Bt_UZdJQdmpbn, index name: search-github, Sync job id: zYFzPI0BV5Sl9akZFY7o] Iterating on remote documents
[FMWK][11:10:15][INFO] [Connector id: -Yi4N40Bt_UZdJQdmpbn, index name: search-github, Sync job id: zYFzPI0BV5Sl9akZFY7o] Fetching repos
[FMWK][11:10:15][INFO] [Connector id: -Yi4N40Bt_UZdJQdmpbn, index name: search-github, Sync job id: zYFzPI0BV5Sl9akZFY7o] Fetching configured repos: '['test-org/perf-repo']'
[FMWK][11:10:15][INFO] [Connector id: -Yi4N40Bt_UZdJQdmpbn, index name: search-github, Sync job id: zYFzPI0BV5Sl9akZFY7o] Fetching repo: 'test-org/perf-repo'
[FMWK][11:10:16][INFO] [Connector id: -Yi4N40Bt_UZdJQdmpbn, index name: search-github, Sync job id: zYFzPI0BV5Sl9akZFY7o] Sync progress -- created: 0 | updated: 0 | deleted: 0
[FMWK][11:10:16][INFO] [Connector id: -Yi4N40Bt_UZdJQdmpbn, index name: search-github, Sync job id: zYFzPI0BV5Sl9akZFY7o] Fetching pull requests from 'test-org/perf-repo' with response_key '('repository', 'pullRequests')' and filter query: 'None'
[FMWK][11:10:16][INFO] [Connector id: -Yi4N40Bt_UZdJQdmpbn, index name: search-github, Sync job id: zYFzPI0BV5Sl9akZFY7o] Fetching issues from repo: test-org/perf-repo with response_key: '('repository', 'issues')' and filter_query: 'None'
[FMWK][11:10:19][INFO] [Connector id: -Yi4N40Bt_UZdJQdmpbn, index name: search-github, Sync job id: zYFzPI0BV5Sl9akZFY7o] Sync progress -- created: 100 | updated: 0 | deleted: 0

The above just came from me trying to reproduce your issue, which I was unable to do. I'll attach some screenshots to see if that can help us find where your setup differs from mine.



Yes, my token scope and expiration date are set up correctly.
My Kibana configuration looks a bit different, I think because you are running a connector with DLS already implemented, while I am running from branch 8.11 to ensure compatibility with our elasticsearch version (8.11.3).

So I only have the following fields present:

GitHub data source
github_server
GitHub URL
GITHUB_URL
GitHub Token


List of repositories
ORG_NAME/REPO_NAME
Enable SSL
false

Here are the logs from connector service:
[FMWK][21:49:32][INFO] Running connector service version 8.11.5.0
[FMWK][21:49:32][INFO] Loading config from /home/vmadmin/connectors/connectors/../config.yml
[FMWK][21:49:32][INFO] Running preflight checks
[FMWK][21:49:32][INFO] Waiting for NodeConfig(scheme='https', host='devops-es-ingest-2', port=9200, path_prefix='', headers={}, connections_per_node=10, request_timeout=10.0, http_compress=False, verify_certs=True, ca_certs=None, client_cert=None, client_key=None, ssl_assert_hostname=None, ssl_assert_fingerprint=None, ssl_version=None, ssl_context=None, ssl_show_warn=True, _extras={}) (so far: 0 secs)
[FMWK][21:49:32][INFO] Extraction service is not configured, skipping its preflight check.

[FMWK][21:49:32][INFO] Job Scheduling Service started, listening to events from https://DevOps-Es-ingest-2:9200
[FMWK][21:49:32][INFO] Job Execution Service started, listening to events from https://DevOps-Es-ingest-2:9200
[FMWK][21:50:34][INFO] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: sR2GPo0B9lTZZu6bUxcH] Executing full sync
[FMWK][21:50:34][INFO] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: sR2GPo0B9lTZZu6bUxcH] Filtering validation started
[FMWK][21:50:34][ERROR] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: sR2GPo0B9lTZZu6bUxcH] AuthenticationException(401, 'None')
[FMWK][21:50:35][INFO] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: sR2GPo0B9lTZZu6bUxcH] Sync ended with status error -- created: 0 | updated: 0 | deleted: 0 (took 1 seconds)

I also looked into logs produced by our GHES, and I see that authentication was successful and graphql POST requests from the connector have status 200, so it's a mystery to me why it shows up as 401 on the connector end. Any ideas what might be causing it?

Just in case it matters, we are running GHES 3.9.

Just some additional information, I've tried a couple more things to troubleshoot:

  • a token with super access (all scope items checked) produces the same error;
  • a token with missing scope (just repo scope included) also produces the same error;

GHES, the VM where the connector is deployed, and our Elasticsearch are all within our network, so there shouldn't be any firewall issues (and the connector is able to connect to Elasticsearch, after all).

Hey @Sean_Story,

I've omitted the stack trace from the connector service logs I posted above, so just including the logs with stack trace now:

[FMWK][09:53:56][INFO] Loading config from /home/vmadmin/connectors/connectors/../config.yml
/home/vmadmin/connectors/lib/python3.10/site-packages/elasticsearch/_async/client/__init__.py:395: SecurityWarning: Connecting to 'https://devops-es-ingest-2:9200' using TLS with verify_certs=False is insecure
  _transport = transport_class(
[FMWK][09:53:56][INFO] Running preflight checks
[FMWK][09:53:56][INFO] Waiting for NodeConfig(scheme='https', host='devops-es-ingest-2', port=9200, path_prefix='', headers={}, connections_per_node=10, request_timeout=10.0, http_compress=False, verify_certs=True, ca_certs=None, client_cert=None, client_key=None, ssl_assert_hostname=None, ssl_assert_fingerprint=None, ssl_version=None, ssl_context=None, ssl_show_warn=True, _extras={}) (so far: 0 secs)
[FMWK][09:53:56][INFO] Extraction service is not configured, skipping its preflight check.
[FMWK][09:53:56][INFO] Job Scheduling Service started, listening to events from https://DevOps-Es-ingest-2:9200
[FMWK][09:53:56][INFO] Job Execution Service started, listening to events from https://DevOps-Es-ingest-2:9200
[FMWK][09:54:29][INFO] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: Pt4dQY0BnQKOS5hlHe6B] Executing full sync
[FMWK][09:54:29][INFO] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: Pt4dQY0BnQKOS5hlHe6B] Filtering validation started
[FMWK][09:54:30][ERROR] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: Pt4dQY0BnQKOS5hlHe6B] AuthenticationException(401, 'None')
Traceback (most recent call last):
  File "/home/vmadmin/connectors/connectors/sync_job_runner.py", line 140, in execute
    await self._execute_content_sync_job(job_type, bulk_options)
  File "/home/vmadmin/connectors/connectors/sync_job_runner.py", line 200, in _execute_content_sync_job
    await self.elastic_server.prepare_content_index(
  File "/home/vmadmin/connectors/connectors/es/sink.py", line 669, in prepare_content_index
    exists = await self.client.indices.exists(
  File "/home/vmadmin/connectors/lib/python3.10/site-packages/elasticsearch/_async/client/indices.py", line 1180, in exists
    return await self.perform_request(  # type: ignore[return-value]
  File "/home/vmadmin/connectors/lib/python3.10/site-packages/elasticsearch/_async/client/_base.py", line 389, in perform_request
    return await self._client.perform_request(
  File "/home/vmadmin/connectors/lib/python3.10/site-packages/elasticsearch/_async/client/_base.py", line 320, in perform_request
    raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
elasticsearch.AuthenticationException: AuthenticationException(401, 'None')
[FMWK][09:54:31][INFO] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: Pt4dQY0BnQKOS5hlHe6B] Sync ended with status error -- created: 0 | updated: 0 | deleted: 0 (took 2 seconds)

I took a look at sync_job_runner.py, lines 140 and 200, and sink.py, line 669, and they look to be related to processing data in bulk on Elasticsearch. I'm not sure how that translates to an HTTP 401.
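One way to read a traceback like the one above is to look at its final line: the module prefix of the exception class hints at which client raised the error. Below is a minimal sketch of that reading; the helper name `exception_origin` is hypothetical and the parsing is deliberately naive.

```python
def exception_origin(traceback_text: str) -> str:
    """Return the top-level module named in the last line of a traceback.

    Hypothetical helper; assumes the last non-empty line has the usual
    'module.ExceptionClass: message' shape.
    """
    lines = [line for line in traceback_text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty traceback text")
    # Everything before the first colon is the qualified exception name.
    qualified_name = lines[-1].split(":", 1)[0].strip()
    # Exceptions without a module prefix are Python built-ins.
    if "." not in qualified_name:
        return "builtins"
    return qualified_name.split(".", 1)[0]
```

Applied to the traceback above, the last line starts with `elasticsearch.AuthenticationException`, which points at the Elasticsearch client rather than the GitHub client.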

@Sean_Story
Ok, I realize I'm having a conversation with myself here, but I just set the connector service logs to DEBUG and have additional useful info. The issue is definitely not with GitHub, it's something to do with the index:

[FMWK][12:47:42][DEBUG] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: BA-7QY0BNUrBorZpvOeA] Successfully connected to GitHub.
[FMWK][12:47:42][DEBUG] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: BA-7QY0BNUrBorZpvOeA] SyncOrchestrator connecting to https://DevOps-Es-ingest-2:9200
[FMWK][12:47:42][DEBUG] Host is NodeConfig(scheme='https', host='devops-es-ingest-2', port=9200, path_prefix='', headers={}, connections_per_node=10, request_timeout=10.0, http_compress=False, verify_certs=True, ca_certs=None, client_cert=None, client_key=None, ssl_assert_hostname=None, ssl_assert_fingerprint=None, ssl_version=None, ssl_context=None, ssl_show_warn=True, _extras={})
[FMWK][12:47:42][DEBUG] Connecting with an API Key (dDFjb...)
[FMWK][12:47:42][INFO] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: BA-7QY0BNUrBorZpvOeA] Executing full sync
[FMWK][12:47:42][INFO] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: BA-7QY0BNUrBorZpvOeA] Filtering validation started
[FMWK][12:47:42][DEBUG] {'created_at': '2024-01-23T16:29:13.830Z', 'field': '_', 'id': 'DEFAULT', 'order': 0, 'policy': 'include', 'rule': 'regex', 'updated_at': '2024-01-23T16:29:13.830Z', 'value': '.*'} validation result (Validator: BasicRuleAgainstSchemaValidator): valid
[FMWK][12:47:42][DEBUG] {'created_at': '2024-01-23T16:29:13.830Z', 'field': '_', 'id': 'DEFAULT', 'order': 0, 'policy': 'include', 'rule': 'regex', 'updated_at': '2024-01-23T16:29:13.830Z', 'value': '.*'} validation result (Validator: BasicRuleNoMatchAllRegexValidator): valid
[FMWK][12:47:42][DEBUG] Basic rules set: '['DEFAULT']' validation result (Validator: BasicRulesSetSemanticValidator): valid
[FMWK][12:47:42][DEBUG] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: BA-7QY0BNUrBorZpvOeA] Filtering validation result: FilteringValidationState.VALID
[FMWK][12:47:42][DEBUG] Preparing the content index
[FMWK][12:47:42][DEBUG] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: BA-7QY0BNUrBorZpvOeA] Checking index search-github-metrics
[FMWK][12:47:43][ERROR] [Connector id: K1cnN40B9lTZZu6bhhvn, index name: search-github-metrics, Sync job id: BA-7QY0BNUrBorZpvOeA] AuthenticationException(401, 'None')
Traceback (most recent call last):
  File "/home/vmadmin/connectors/connectors/sync_job_runner.py", line 140, in execute
    await self._execute_content_sync_job(job_type, bulk_options)
  File "/home/vmadmin/connectors/connectors/sync_job_runner.py", line 200, in _execute_content_sync_job
    await self.elastic_server.prepare_content_index(
  File "/home/vmadmin/connectors/connectors/es/sink.py", line 669, in prepare_content_index
    exists = await self.client.indices.exists(
  File "/home/vmadmin/connectors/lib/python3.10/site-packages/elasticsearch/_async/client/indices.py", line 1180, in exists
    return await self.perform_request(  # type: ignore[return-value]
  File "/home/vmadmin/connectors/lib/python3.10/site-packages/elasticsearch/_async/client/_base.py", line 389, in perform_request
    return await self._client.perform_request(
  File "/home/vmadmin/connectors/lib/python3.10/site-packages/elasticsearch/_async/client/_base.py", line 320, in perform_request
    raise HTTP_EXCEPTIONS.get(meta.status, ApiError)(
elasticsearch.AuthenticationException: AuthenticationException(401, 'None')

I set up the index by navigating to Search > Content > Elasticsearch indices > Create new index > Use connector, picked GitHub & GitHub Enterprise Server, named it (index name starting with search-), and was then taken to the Configuration page, where the API key is generated, etc.
Is there a step or a config I am missing?

Did you then set that API key in your connectors config.yml?
You're right, looking at that stack trace, the 401 is coming from Elasticsearch. It looks like the connector_id: K1cnN40B9lTZZu6bhhvn may be configured with an api_key that does not have access to the search-github-metrics index.

Can you go back to that configuration tab and generate a new API key and just replace whatever is configured now with the new API key, and see if that gets you farther?
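To test a newly generated key outside the connector, the same index check can be reproduced by hand. The sketch below only builds the request and does not send it. Elasticsearch accepts API keys as `Authorization: ApiKey <base64 of id:api_key>`, and an index-exists check is a `HEAD` request on the index name. The function names and the host passed in are hypothetical.

```python
import base64


def api_key_header(key_id: str, key_secret: str) -> dict[str, str]:
    """Build the Authorization header Elasticsearch expects for an API key."""
    raw = f"{key_id}:{key_secret}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"ApiKey {encoded}"}


def index_exists_request(es_host: str, index_name: str, key_id: str, key_secret: str):
    """Describe the HEAD request that checks whether an index exists.

    Hypothetical helper: returns (method, url, headers) without sending anything.
    """
    url = f"{es_host.rstrip('/')}/{index_name}"
    return "HEAD", url, api_key_header(key_id, key_secret)
```

Sending the described request with any HTTP client and getting a 401 back would suggest the key itself is being rejected, independent of the connector.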

Hey @Sean_Story,

Yeah, the issue was with how I set up the config.
I put the api_key for the connector section, but I used username/password in the Elasticsearch section, and this was causing the issue.
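The mix-up described here can be caught with a small check over the parsed config. This is a sketch under stated assumptions: the layout (an `elasticsearch` section with `username`, `password`, and `api_key` keys, plus a `connectors` list whose entries carry `connector_id` and `api_key`) is assumed for illustration and may not match every connectors version, and the function name is hypothetical. It flags only the combination described in this thread.

```python
def find_credential_conflicts(config: dict) -> list[str]:
    """List places where API-key and username/password credentials are mixed.

    Hypothetical helper; the key names are assumptions about config.yml layout.
    """
    problems = []
    es = config.get("elasticsearch", {})
    has_basic = bool(es.get("username") or es.get("password"))
    has_es_key = bool(es.get("api_key"))
    if has_basic and has_es_key:
        problems.append("elasticsearch: both username/password and api_key are set")
    for entry in config.get("connectors", []):
        # The case from this thread: connector-level api_key, basic auth for Elasticsearch.
        if entry.get("api_key") and has_basic:
            problems.append(
                f"connector {entry.get('connector_id')}: api_key set while "
                "elasticsearch uses username/password"
            )
    return problems
```

The dictionary can come from any YAML loader; an empty list means the check found nothing to flag.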

I am able to connect and sync data now, thank you for your help :slight_smile:
