Default language for language detection

Thanks for pointing out that deficiency.

I agree it's an issue and have opened "[ML] What to do about lang_ident for empty strings and numbers?" (elastic/elasticsearch issue #81933 on GitHub) to discuss what to do.

In the short term, you could add an extra set processor after the lang_ident inference processor that changes the predicted language field to en if the source field is an empty string. This set processor would use an if condition so that it only overrides the prediction for empty strings.
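As a rough sketch of that workaround, the Python snippet below builds an ingest pipeline body with the two processors and prints it as JSON. Several names here are assumptions rather than anything from your setup: the source field is assumed to be called text (the input name the built-in lang_ident_model_1 model expects), the target field ml.lang_ident is a hypothetical choice, and the prediction is assumed to land in the default predicted_value results field under it. Adjust them to match your own pipeline.

```python
import json

# Assumptions (not from the original post): adjust to your own documents.
SOURCE_FIELD = "text"           # field holding the text to classify
TARGET_FIELD = "ml.lang_ident"  # hypothetical target field for the inference results
DEFAULT_LANGUAGE = "en"         # language to fall back to for empty strings

pipeline = {
    "description": "Language identification with a default for empty strings",
    "processors": [
        {
            # Built-in language identification model.
            "inference": {
                "model_id": "lang_ident_model_1",
                "target_field": TARGET_FIELD,
                "inference_config": {"classification": {}},
            }
        },
        {
            # Runs after inference; the Painless condition restricts the
            # override to documents whose source field is an empty string.
            "set": {
                "if": f"ctx.{SOURCE_FIELD} == ''",
                "field": f"{TARGET_FIELD}.predicted_value",
                "value": DEFAULT_LANGUAGE,
            }
        },
    ],
}

print(json.dumps(pipeline, indent=2))
```

The printed JSON can be used as the body of a PUT _ingest/pipeline/<your-pipeline-name> request, or tried out first with the _simulate endpoint against a document whose text field is empty.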