Data Normalization

We have a dataset with 110 million documents, 20 million of which have a reading of "0". Totals are running 10% to 30% lower than expected, so we suspect that some hourly readings have been lost and, unfortunately, recorded as "0" instead of null.

One idea is to find every "0" reading and check whether the readings immediately before and after it are both non-zero. If so, replace the "0" with the average of the two. I'm OK with either running an in-place replacement or writing the output to another index.
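
To make the idea concrete, here is a minimal Python sketch of the neighbour-averaging step on its own, separate from any index reads or writes. It assumes the readings for a single source have already been fetched and sorted by timestamp, with one value per hour and no missing documents. The function name `fill_isolated_zeros` is made up for illustration. Note that a genuine "0" sitting between two non-zero readings would also be overwritten, and runs of two or more consecutive zeros are left alone.

```python
from typing import List


def fill_isolated_zeros(readings: List[float]) -> List[float]:
    """Return a copy of `readings` in which every 0 that sits between two
    non-zero neighbours is replaced by the average of those neighbours.

    Assumes `readings` is ordered by time for a single source (hypothetical
    input shape; adapt to however the documents are actually fetched).
    """
    filled = list(readings)
    # The first and last readings have only one neighbour, so skip them.
    for i in range(1, len(readings) - 1):
        before, current, after = readings[i - 1], readings[i], readings[i + 1]
        # Compare against the original values so that one fill never
        # influences the decision for an adjacent reading.
        if current == 0 and before != 0 and after != 0:
            filled[i] = (before + after) / 2
    return filled


if __name__ == "__main__":
    sample = [4.0, 0, 6.0, 0, 0, 2.0]
    print(fill_isolated_zeros(sample))
```

Returning a new list rather than mutating the input keeps both options open: the result can be written back in place or sent to a separate index.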

Has anyone done something similar? If so, any tips or suggestions on how to accomplish this would be appreciated.
