Can I parse text in a PDF document before sending it to Elasticsearch using FSCrawler?

Now I understand the question.
So it's not really about FSCrawler, but more a general question about how to extract a phone number from text, right?

What I mean is that FSCrawler is responsible for extracting the text from the PDF.
Once that's done, you can do whatever you want with the extracted text.

Here I'd probably use an ingest pipeline to apply a regex to your text. You can later tell FSCrawler to use that pipeline (see "Elasticsearch settings" in the FSCrawler 2.10-SNAPSHOT documentation).

Maybe you can try the Grok processor (see "Grok processor" in the Elasticsearch Guide [8.11]).

If you have further questions, please provide an example of what you have tried so far, without using FSCrawler. As I said, doing that is not FSCrawler's responsibility. Something like this (but for another use case):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "parse multiple patterns",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{FAVORITE_DOG:pet}", "%{FAVORITE_CAT:pet}"],
          "pattern_definitions": {
            "FAVORITE_DOG": "beagle",
            "FAVORITE_CAT": "burmese"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "I love burmese cats!"
      }
    }
  ]
}
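
For the phone number case itself, one way to start is to prototype the regex locally and then reuse it as a custom grok pattern in the same kind of simulate request. The following is only a minimal sketch under several assumptions: the regex is a guess at what a phone number looks like (adjust it to your formats), `PHONE_NUMBER` and the target field `phone` are hypothetical names, and `content` is assumed to be the field holding the extracted text. Python's `re` module and the regex engine behind grok are not identical, so treat the local check as a first pass and confirm the result with the simulate API.

```python
import json
import re

# Hypothetical pattern: an optional "+", a digit, then at least six more
# digits or separators (space, parentheses, dot, dash), ending with a digit.
PHONE_REGEX = r"\+?[0-9][0-9 ().-]{6,}[0-9]"


def extract_phone(text):
    """Return the first phone-like substring of text, or None if there is none."""
    match = re.search(PHONE_REGEX, text)
    return match.group(0) if match else None


def build_simulate_body(text):
    """Build a request body for POST _ingest/pipeline/_simulate.

    The grok processor reuses PHONE_REGEX as a custom pattern definition.
    The field name "content" is an assumption about where the text lives.
    """
    return {
        "pipeline": {
            "description": "extract a phone number",
            "processors": [
                {
                    "grok": {
                        "field": "content",
                        "patterns": ["%{PHONE_NUMBER:phone}"],
                        "pattern_definitions": {"PHONE_NUMBER": PHONE_REGEX},
                    }
                }
            ],
        },
        "docs": [{"_source": {"content": text}}],
    }


if __name__ == "__main__":
    sample = "Call me at +33 6 12 34 56 78 tomorrow."
    # Quick local check of the regex.
    print(extract_phone(sample))
    # JSON body to paste after "POST _ingest/pipeline/_simulate".
    print(json.dumps(build_simulate_body(sample), indent=2))
```

Once the simulated document shows the field you expect, you can store the pipeline under a name of your choice and reference that name in the FSCrawler Elasticsearch settings.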