Thanks for pointing out that deficiency.
I agree it's an issue and have opened elastic/elasticsearch issue #81933, "[ML] What to do about lang_ident for empty strings and numbers?", to discuss what to do.
In the short term you could add an extra `set` processor after the `lang_ident` `inference` processor that changes the predicted language field to `en` if the source field is an empty string. This `set` processor would use an `if` condition so that it only overrides the prediction for empty strings.
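
As a rough sketch of that workaround, the pipeline body might look something like the following. The field names here are assumptions rather than anything from your setup: I'm assuming the text to classify is in a field called `text`, that the built-in `lang_ident_model_1` model is used, and that the inference results are written under a hypothetical `ml.lang_ident` target field, so adjust them to match your pipeline.

```json
{
  "description": "Sketch: language identification with an override for empty strings (field names are assumptions)",
  "processors": [
    {
      "inference": {
        "model_id": "lang_ident_model_1",
        "target_field": "ml.lang_ident",
        "inference_config": {
          "classification": {}
        },
        "field_map": {}
      }
    },
    {
      "set": {
        "if": "ctx.text == ''",
        "field": "ml.lang_ident.predicted_value",
        "value": "en"
      }
    }
  ]
}
```

This would be sent as the body of a `PUT _ingest/pipeline/<your-pipeline-name>` request. The `if` condition is a Painless expression, so the `set` processor should only run when `text` is exactly the empty string and leave every other prediction untouched. If your source field can also be whitespace-only, the condition would need to be extended to cover that.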