PostgreSQL connector skipping query

Hi, my connector is correctly connected to Elasticsearch, but in the log there is this line:

Skipping query select [...] as primary key column name is not present in query.

I don't understand if it's related to Elastic or PostgreSQL.

Hi @mrtl ,

What version of connectors are you using? The behavior here changed in [PostgreSQL] Add new field id_columns in sync rules to generate id and adding support for views by parthpuri-elastic · Pull Request #2681 · elastic/connectors · GitHub, which looks like it was released in 8.15.0.

This behavior is from our Postgres Connector. Basically, the advanced sync rule logic previously required that you include primary keys in your custom queries, otherwise we didn't know how to identify the resulting records. In the newer versions, we let you specify how the results of your queries should be uniquely identified.
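For example (a hypothetical sketch, with made-up table and column names), an advanced sync rule in 8.15+ can declare which columns uniquely identify each row:

```json
[
  {
    "tables": ["users"],
    "query": "select user_id, name from users",
    "id_columns": ["user_id"]
  }
]
```

The `id_columns` values must match columns that actually appear in the query's output.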

Hi,

The log message indicates that the query being executed is missing a primary key column, which is essential for uniquely identifying records. This often occurs when the query does not include a column that Elasticsearch uses to identify individual documents.

Solution:

  1. Ensure your query includes the primary key field, which in Elasticsearch is typically the _id field or another unique identifier specified in your schema.
  2. Verify that your Elasticsearch index mapping includes a unique identifier for the documents and that your query is correctly referencing this field.

thanks.

My query includes several primary keys. Does it mean I need to add an _id alias directly in the select clause?

I'm not sure how to define a field as a unique identifier in the mapping.

I was on version 8.14.
I upgraded to 8.15 and tried to include 'id_columns', but the query is still skipped:

Skipping query select "internalId" from sheets for tables sheets as primary key column or unique ID column name is not present in query.

My sync rules (kept as simple as possible for testing purpose) :

[
  {
    "tables": [
      "sheets"
    ],
    "query": "select \"internalId\" from sheets",
    "id_columns": [
      "internalId"
    ]
  }
]

@mrtl thanks, this extra info is helpful.

It looks like the issue is now coming from quoting select \"internalId\", as opposed to just select internalId. Do you need the quotes for your column name? If so, can you try doing:

    "id_columns": [
      "\"internalId\""
    ]

? The logic here is not particularly intelligent, and is just checking to make sure that the ID columns you identify are in the outputs, using string matching. And "internalId" != internalId.
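To illustrate the kind of exact string matching involved (a hypothetical sketch, not the connector's actual code):

```python
def id_columns_present(selected_columns, id_columns):
    """Hypothetical illustration of the validation: every declared
    ID column must match a selected column exactly, as a string."""
    return all(col in selected_columns for col in id_columns)

# '"internalId"' (with quotes) is a different string than 'internalId':
print(id_columns_present(['"internalId"'], ['internalId']))    # False
print(id_columns_present(['"internalId"'], ['"internalId"']))  # True
```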

Yes, I need those quotes, otherwise Postgres will grumble.

SQL Error [42703]: ERROR: column "internalId" does not exist
Hint: Perhaps you meant to reference the column "sheets.internalId".
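(For context: the quotes are needed because PostgreSQL folds unquoted identifiers to lowercase. A minimal illustration, with the table definition assumed:)

```sql
CREATE TABLE sheets ("internalId" integer);

SELECT internalId FROM sheets;    -- fails: unquoted name folds to internalid
SELECT "internalId" FROM sheets;  -- works: quoted name preserves the case
```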

Even with your solution, or with an unquoted alias, it is still not working.

To avoid the need for quotes, I added a field "internal_id" into which I copied the values from "internalId", and then it finally worked.
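One way to maintain such a shadow column automatically (a sketch assuming an integer column and PostgreSQL 12+ generated columns):

```sql
ALTER TABLE sheets
  ADD COLUMN internal_id integer GENERATED ALWAYS AS ("internalId") STORED;
```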

But the behavior is not what I expected. Instead of adding the field "internal_id", it added the field "public_sheets_internal_id", along with some fields I did not request, like "schema", "database", "_timestamp" and "table".

Is there a way to import fields as defined in the mapping ?

Oh bummer. That surprises me. Same error, or something different?

Yay! Glad you found a workaround.

We do the table-prefixing of the output field names because otherwise we'd be forcing you to have the same types between all tables that have the same field name. Like if you have an id field in a User table that's a string, but an id field in a Product table that's an auto-increment integer, and the Product table syncs first, Elasticsearch would expect all id field values going forward to also be numeric, and then would crash when you got to your User table. To work around this, we prefix fields with the table name to keep them distinct. You can use Ingest Pipelines to rename/move these fields if you'd prefer.
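For instance, a pipeline with a `rename` processor (field names taken from your example; the pipeline name is up to you) could restore the shorter name:

```json
{
  "processors": [
    {
      "rename": {
        "field": "public_sheets_internal_id",
        "target_field": "internal_id"
      }
    }
  ]
}
```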

Yep, all our connectors are going to add as much metadata as possible, with the goal of facilitating search down the line. If you don't want these fields, you can drop them in your ingest pipeline.
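For example, a `remove` processor in that same pipeline could drop the metadata fields you mentioned:

```json
{
  "processors": [
    {
      "remove": {
        "field": ["schema", "database", "table"],
        "ignore_missing": true
      }
    }
  ]
}
```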