Create a new column with the hash of an existing column

I have an Elasticsearch index to which I want to add a new column whose value is the hash (SHA-256 or MD5) of an existing column. Doing this manually isn't feasible: the index has more than 600,000,000 documents, and it would take more than 2-3 months (extrapolating from a simple calculation done for 200,000 documents).

Any help is deeply appreciated.

I think it should be clarified that Elasticsearch has no concept of columns, but rather fields. That's not intended to be pedantic but instead shared with the goal of enabling easier communication about features.

I think I understand what you're asking but please correct me if I'm wrong. Are you asking how you might add a new field whose value will be based on the value of another field? Do you need to do this on an ongoing basis for new data incoming, or only to apply this change to pre-existing data?

To apply this to pre-existing data, I think the easiest approach might be to reindex through an ingest pipeline that includes a script processor. The script processor produces the new field with a script like ctx.my_field_sha = ctx.my_field.sha256() (according to this PR, that .sha256() method should be available).
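As a rough sketch of that approach, the Python below only builds the two JSON request bodies involved: one for the ingest pipeline and one for the reindex call. The field names (my_field, my_field_sha), the index names, and the pipeline id are placeholders, and the null check in the script is my own addition. I have not verified which Elasticsearch versions expose .sha256() in ingest scripts, so treat that part as an assumption to test on a small index first.

```python
import json


def build_hash_pipeline(source_field: str, target_field: str) -> dict:
    """Body for PUT _ingest/pipeline/<pipeline-id> (field names are placeholders)."""
    # Painless script: skip documents where the source field is missing.
    # Assumes the source field holds a string and that .sha256() is
    # available in the ingest context of your Elasticsearch version.
    script = (
        f"if (ctx.{source_field} != null) {{ "
        f"ctx.{target_field} = ctx.{source_field}.sha256(); }}"
    )
    return {
        "description": f"Add {target_field} as the SHA-256 of {source_field}",
        "processors": [{"script": {"lang": "painless", "source": script}}],
    }


def build_reindex_body(source_index: str, dest_index: str, pipeline_id: str) -> dict:
    """Body for POST _reindex that routes every document through the pipeline."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline_id},
    }


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    print(json.dumps(build_hash_pipeline("my_field", "my_field_sha"), indent=2))
    print(json.dumps(build_reindex_body("my_index", "my_index_hashed", "add-sha256"), indent=2))
```

The first body would be sent with PUT _ingest/pipeline/add-sha256 and the second with POST _reindex. With hundreds of millions of documents, it is probably worth running the reindex with wait_for_completion=false so it executes as a background task.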

If you need the hash value on an ongoing basis for new documents, I would strongly consider computing the hashed field in the client code. You could also use the same ingest pipeline mentioned above, but note that ingest pipelines have limitations, such as incompatibility with the update API.
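For the client-side option, a minimal sketch in Python could look like the following. The function name and field names are hypothetical. It hex-encodes the SHA-256 of the UTF-8 bytes of the value; if you also hash existing data through a pipeline, check on a sample document that both sides produce the same string.

```python
import hashlib


def with_hash(doc: dict, source_field: str = "my_field",
              target_field: str = "my_field_sha") -> dict:
    """Return a copy of doc with a SHA-256 hex digest of source_field added."""
    value = doc.get(source_field)
    if value is None:
        # Nothing to hash; return an unchanged copy.
        return dict(doc)
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return {**doc, target_field: digest}


if __name__ == "__main__":
    # The enriched document is what the client would then send to Elasticsearch.
    print(with_hash({"my_field": "abc"}))
```

Because the hash is computed before the document is sent, this works the same way for index, bulk, and update requests.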
