Painless: Loop over all fields in a document to extract phone numbers

I'm indexing my documents with a third-party piece of software, so I'm more or less stuck with the mapping it creates.

Now I want to make some small adaptations so I can search for any phone numbers inside the documents. I thought an ingest pipeline that searches each field for phone numbers and puts them in an array would be an ideal solution for now. Each document would then get a separate field holding an array of all the phone numbers found in that document.

The Painless script works if I hardcode each field, but the documents vary quite a bit and there might be a LOT of different fields. So I thought I could just loop over every field and extract all phone numbers with a regex. But I am not able to loop through all the fields:

PUT _ingest/pipeline/merge_phonenumbers
{
  "description": "Combine multiple phonenumbers fields in one array for aggregation",
  "processors": [
    {
      "script": {
        "source": """
          void loopAllFields(def x){
            for (int i = 0; i < x._fields.length; i++) {
              def phonenumber = /([0-9]+)/.matcher(x[i].data);
              if(phonenumber != null)
              {
                x.combined_phonenumbers.add(phonenumber);
              }
            }
          }
          
          if (ctx.combined_phonenumbers == null) {
              ctx.combined_phonenumbers = new ArrayList();
          } 
          
          loopAllFields(ctx);
        """
      }
    }
  ]
}

Any thoughts on whether it's possible to loop over all fields? I should mention that some fields might be nested objects, which is why I would like to make the function recursive.

EDIT:
I've made some progress:

PUT _ingest/pipeline/merge_phonenumbers
{
  "description": "Combine multiple phonenumbers fields in one array for aggregation",
  "processors": [
    {
      "script": {
        "source": """
          void loopAllFields(def x){
            if(x instanceof Map){
              for (entry in x.entrySet()) {
                if (entry.getKey() == "_source") { 
                  continue;
                }
                if(entry instanceof Map || 
                   entry instanceof ArrayList ||
                   entry instanceof HashMap)
                {
                  loopAllFields(entry);
                  continue;
                }
                def phonenumber = /[0-9]+/.matcher(entry.getValue());
                x.combined_phonenumbers.add(phonenumber);
              }
            }
          }
          
          if (ctx.combined_phonenumbers == null) {
              ctx.combined_phonenumbers = new ArrayList();
          } 
          
          loopAllFields(ctx);
        """
      }
    }
  ]
}

This function does loop, but it throws an exception:

ScriptException[runtime error]; nested: ClassCastException[Cannot cast java.util.ArrayList to java.lang.CharSequence];
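
The exception is thrown because matcher() only accepts a CharSequence, and at least one field value here is an ArrayList. Below is a minimal sketch in plain Java rather than Painless (Painless regexes are backed by the same java.util.regex classes). The helper name digitRuns and the sample data are made up for illustration. It converts the value to a string first, and it also shows that the matched text has to be pulled out with find() and group(), since a Matcher is not the match itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherInput {
    // Hypothetical helper: returns every run of digits in the string form of value.
    static List<String> digitRuns(Object value) {
        List<String> runs = new ArrayList<>();
        // Pattern.matcher accepts only a CharSequence, so convert the value first.
        Matcher m = Pattern.compile("[0-9]+").matcher(String.valueOf(value));
        // A Matcher yields nothing until find() is called.
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Made-up sample: a list value like the one that triggered the exception.
        List<Object> tags = new ArrayList<>(List.of("tel 0612345678", 42));
        System.out.println(digitRuns(tags)); // prints [0612345678, 42]
    }
}
```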

Got a working solution. I forgot that I had to convert the value to a string before extracting the numbers:

PUT _ingest/pipeline/merge_fields
{
  "description": "Look multiple fields for numbers and put them in one array for aggregation",
  "processors": [
    {
      "script": {
        "source": """
          // Recursively collect every run of 10 to 20 digits from maps, lists and leaf values.
          def loopAllFields(def x, def list){
            if (x instanceof Map) {
              for (entry in x.entrySet()) {
                if (entry.getKey() == "_source") {
                  continue;
                }
                // Recurse into the value; the entry itself is never a Map or List.
                loopAllFields(entry.getValue(), list);
              }
            } else if (x instanceof List) {
              for (item in x) {
                loopAllFields(item, list);
              }
            } else if (x != null) {
              // Leaf value: convert to a string before matching.
              def m = /[0-9]{10,20}/.matcher(x.toString());
              while (m.find()) {
                list.add(m.group(0));
              }
            }
            return list;
          }

          def l = new ArrayList();
          l = loopAllFields(ctx, l);
          if (ctx.combined == null) {
              ctx.combined = l;
          } 
        """
      }
    }
  ]
}
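
For anyone who wants to try the recursive walk outside Elasticsearch, here is a minimal sketch in plain Java rather than Painless (Painless exposes the same java.util and java.util.regex classes). The class name PhoneNumberCollector, the collect method and the sample document are made up for illustration; the pattern is the one from the pipeline above. It recurses into nested maps and lists and only runs the regex on leaf values:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneNumberCollector {
    // Same pattern as the pipeline: runs of 10 to 20 digits.
    private static final Pattern DIGITS = Pattern.compile("[0-9]{10,20}");

    // Walks maps and lists recursively and matches only on leaf values.
    static void collect(Object node, List<String> out) {
        if (node instanceof Map) {
            for (Object value : ((Map<?, ?>) node).values()) {
                collect(value, out);
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                collect(item, out);
            }
        } else if (node != null) {
            Matcher m = DIGITS.matcher(node.toString());
            while (m.find()) {
                out.add(m.group());
            }
        }
    }

    public static void main(String[] args) {
        // Made-up document with a nested object and a list value.
        Map<String, Object> contact = new LinkedHashMap<>();
        contact.put("mobile", "call 0612345678 after 5");
        contact.put("fax", List.of("0201234567", "n/a"));
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "invoice 42");
        doc.put("contact", contact);

        List<String> found = new ArrayList<>();
        collect(doc, found);
        System.out.println(found); // prints [0612345678, 0201234567]
    }
}
```

Short numbers such as the 42 in the title are skipped because the pattern requires at least 10 digits.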