Searching on a transformation/interpretation of a field

I have an index of 30m items; I indexed them using cURL reading a CSV from the command line, as suggested by an article I found on the ES blog. So did not use logstash. The index process took a few days. I do not expect the dataset to subsequently change.

I have since noticed one field has numbers but padded with 0 e.g. "0001", "0012", "0123", "1234". The field is a keyword.

This is causing problems using search with "simple_query_string" to try and match user typed queries where their typed values may appear across a multitude of fields.

What technique would allow me to either transform the data in place, or to have searches consider the field through some sort of filter that removed the leading 0 characters?

Welcome.

May be change the data type to make it an integer?
That won't change the source json and the way it's stored.

Hi David

I looked up some ways to change mapping types but it seemed all would require me to delete my index and re-import everything.

I have just finished a process whereby I ran:

POST /my_index/_update_by_query?wait_for_completion=false&conflicts=proceed
{
    "query": {
        "regexp":{
            "number": "[0-9]{4}"
        }
    },
    "script": {
        "lang": "painless",
        "source": "ctx._source.number_int = Integer.parseInt(ctx._source.number)"
    }
}

Which worked to add a new field to every record. I first tested by using a restrictive query block to check one ID at a time. When I tried to run with no query I got script errors; I couldn't work out why as it suggested some fields were blank, but searching for or eliminating these did not work. In the end the regex worked to prevent errors, and the task took about an hour to do 30m records, with relatively little impact to the system performance (though the system is not under production load yet).

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.