How can I softly exclude ambiguous words?

Scripts can achieve this, but their performance is too poor at my scale: I have hundreds of billions of records, and the storage is at the petabyte level.