PII security with Elastic.Clients.Elasticsearch 9.2.1

Good day

I’m not sure if I’m posting this in the right place, but here goes.

I’m trying to implement an ES index with an object that has mainly numeric fields, and some text fields. I want to be able to search by any field, and only return the numeric ones as the others are personally identifying data that I neither need in my results, nor want to expose to anyone else.

Trying to find documentation on such for C# has been a bit of a wild goose chase, often ending up with me at documentation that may have worked in older clients, but doesn’t seem to be supported any more.

What is the correct way to set up an index such that some fields can be searched on but will never be returned? Or, if that’s not possible, how do I set up masking per field with Elastic.Clients.Elasticsearch 9?

Welcome @Davyd_McColl

I’m not sure if I’m posting this in the right place, but here goes.

That's definitely the right place to ask. :slight_smile:

I want to be able to search by any field,

So you need to index the full document, as JSON, with all of its fields.

and only return the numeric ones as the others are personally identifying data that I neither need in my results, nor want to expose to anyone else.

There are multiple ways for solving this.

At search time

One way, I think, is to control that at search time with source filtering, but maybe you want something else.
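At the REST level, source filtering is just a `_source` element in the search body. Here is a minimal sketch that only builds that body with System.Text.Json, so it does not depend on any particular client version. The `SearchBodies` class, the `match` query and the `firstName` field are assumptions for illustration; the included field names are the ones from this thread. Note that this only limits what one request returns. It does not stop another caller from asking for the other fields.

```csharp
using System.Text.Json;

// Hypothetical helper: builds a search body that queries a PII field
// but asks Elasticsearch to return only the non-PII fields.
public static class SearchBodies
{
    public static string NumericOnly(string firstName)
    {
        var body = new
        {
            // Source filtering: only these fields come back in each hit.
            _source = new { includes = new[] { "id", "orderId", "callCenterId" } },
            // The PII field is still searchable even though it is not returned.
            query = new { match = new { firstName } }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

The resulting string can be sent as the body of a `POST /the-index/_search` request with whatever transport you prefer.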

Exclude from source (mapping)

See the "_source field" page in the Elasticsearch reference.

Using mapping (stored fields)

Another solution could be to disable the _source field and set store: true only on the fields that may be viewed, leaving the "non-viewable" fields at store: false (which is the default). That way only the stored fields can be fetched.
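As a rough sketch of what that mapping could look like at the REST level, the snippet below builds a create-index body with System.Text.Json. The `IndexBodies` class and the `long` / `text` field types are assumptions; the field names come from this thread. Keep in mind that disabling `_source` has side effects (update, reindex and highlighting rely on it), so this is worth testing before committing to it.

```csharp
using System.Text.Json;

// Hypothetical helper: builds a create-index body where _source is disabled
// and only the non-PII fields are stored for retrieval.
public static class IndexBodies
{
    public static string StoredNumericFieldsOnly()
    {
        var body = new
        {
            mappings = new
            {
                _source = new { enabled = false },
                properties = new
                {
                    // Field types are assumed; adjust to the real document.
                    id = new { type = "long", store = true },
                    orderId = new { type = "long", store = true },
                    callCenterId = new { type = "long", store = true },
                    // Indexed (searchable) but not stored: store defaults to false.
                    firstName = new { type = "text" }
                }
            }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

With a mapping like this, a search would request the viewable values through `stored_fields` rather than `_source`.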

There are probably other ways...

Hope this helps.

Hi

Thanks for that - it does help.

When I attempted to use .Store(false) on a mapping, like so:

client.Indices.Create<Document>(index =>
  index.Index("the-index")
    .Mappings(m =>
       m.Properties(p =>
         p.Text(o => o.FirstName, k => k.Store(false))
...

Then I still see all the fields when browsing data via Elastron, and they all come back via a query. So I’m not sure if I’m just doing this wrong, or if there’s an issue here. In tests, this index is recreated from scratch for a test, so whatever my setup for the index is at the time the test is run, that’s it - so there’s not an old index hanging about with all fields enabled, for example.

However, when I use:

client.Indices.Create<CallCenterCustomerIndexItem>(index =>
  index.Index("the-index")
    .Mappings(m =>
       m.Source(s =>
         s.Includes("id", "orderId", "callCenterId"...)

Then I find that the fields are restricted as I would expect (name fields come back null). The only problem I have here is that I’d like to use nameof for the properties - but that gives back PascalCase, where the document is being stored with camelCase properties. Ideally, I’d like to do something more robust than simply lower-casing the first letter of the PascalCased property names to get to camelCased ones - I’d rather specify the field name somewhere, and I’m quite sure it’s possible, but again, I’m hitting a wall trying to find documentation on the subject with v9+ of the library. When I used nameof originally, all fields came back with default values, and that’s what gave me the idea, after looking at the documents, to camelCase properties for Includes, but I’d like to be deterministic here, especially to cover the case where someone renames a field on the POCO without realising that would affect mapping.

What should I be doing to set names for fields?
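One possible direction, sketched under assumptions rather than confirmed for 9.2.1: if the client's default source serializer is System.Text.Json based and honours `[JsonPropertyName]` (worth verifying for your version), the field names can be pinned on the POCO and resolved from an expression instead of `nameof`. The `FieldNames` helper below is hypothetical and not part of the client; the POCO shape is guessed from the field names mentioned above.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical POCO: the attributes pin the JSON field names, so renaming
// a C# property does not silently change the field name in the index.
public class CallCenterCustomerIndexItem
{
    [JsonPropertyName("id")]
    public long Id { get; set; }

    [JsonPropertyName("orderId")]
    public long OrderId { get; set; }

    [JsonPropertyName("callCenterId")]
    public long CallCenterId { get; set; }

    [JsonPropertyName("firstName")]
    public string FirstName { get; set; } = "";
}

// Hypothetical helper: resolves the JSON field name for a property.
public static class FieldNames
{
    public static string For<T>(Expression<Func<T, object>> selector)
    {
        // Value-type properties are boxed to object, so unwrap the Convert node.
        var body = selector.Body is UnaryExpression u ? u.Operand : selector.Body;
        if (body is not MemberExpression { Member: PropertyInfo prop })
            throw new ArgumentException("Expected a simple property access.", nameof(selector));

        // Prefer the explicit attribute; otherwise fall back to camelCase.
        var attr = prop.GetCustomAttribute<JsonPropertyNameAttribute>();
        return attr?.Name ?? JsonNamingPolicy.CamelCase.ConvertName(prop.Name);
    }
}
```

The strings passed to `Includes(...)` could then come from calls such as `FieldNames.For<CallCenterCustomerIndexItem>(x => x.OrderId)`, so a renamed property either keeps its pinned name or fails to compile at the call site.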