Analyzer to deal with padded 0s?

Hello,

I have a field whose values may be padded with 0s at the beginning.

For example, the values could be:

000012
00X213
452

Is there an analyzer I could use that would index and search that field as if the padded 0s don't exist?

For instance, if I index 000012, could I search for it as either '000012' or '12'?

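A pattern analyzer with a regex that matches the padded zeros will do what you want, as long as the same analyzer is applied at both index time and search time. The pattern analyzer splits the text on the regex, so the leading zeros are treated as a separator and dropped from the token. Note that it also lowercases tokens by default.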

{
  "settings": {
    "analysis": {
      "analyzer": {
        "padded_zero_removal": {
          "type": "pattern",
          "pattern": "^0+(?!$)"
        }
      }
    }
  }
}

You can use the Analyze API to test it. For example, send the term '00000012':

curl -XGET 'localhost:9200/{yourIndex}/_analyze?analyzer=padded_zero_removal&pretty=true' -d '00000012'

The response:

{
  "tokens" : [ {
    "token" : "12",
    "start_offset" : 6,
    "end_offset" : 8,
    "type" : "word",
    "position" : 1
  } ]
}
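
If it helps to see why this works, here is a rough Python sketch of what the analyzer does with the sample values from the question. It is only an approximation: Python's re module stands in for the Java regex engine that Elasticsearch uses, and analyze is a hypothetical helper name, not part of any Elasticsearch client. It assumes the pattern analyzer splits on the regex, drops empty pieces, and lowercases the result.

```python
import re
from typing import List

# Same regex as the "pattern" setting above. It is used as a separator:
# one or more leading zeros, but never the final character of the value.
PADDED_ZEROS = re.compile(r"^0+(?!$)")


def analyze(value: str) -> List[str]:
    """Approximate the padded_zero_removal analyzer for a single value."""
    pieces = PADDED_ZEROS.split(value)
    # Drop the empty piece left in front of the separator and lowercase
    # the rest, mirroring the assumed default behaviour of the analyzer.
    return [piece.lower() for piece in pieces if piece]


if __name__ == "__main__":
    for value in ["000012", "00X213", "452", "0000"]:
        print(value, "->", analyze(value))
```

Because '000012' and '12' both come out as the single token 12, a query for either form should match the indexed value. The (?!$) lookahead is what keeps an all-zero value such as '0000' from being reduced to nothing; it ends up as the token 0.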

Thank you!
