Now I understand the question.
So it's not really about FSCrawler; it's more a general question about how to extract a phone number from text, right?
I mean that FSCrawler is responsible for extracting the text from a PDF.
Once that's done, you can do whatever you want with the extracted text.
Here I'd probably try an ingest pipeline (which you can define later in FSCrawler, see "Elasticsearch settings" in the FSCrawler 2.10-SNAPSHOT documentation) to apply a regex to your text.
Maybe you can try the Grok processor: see "Grok processor" in the Elasticsearch Guide [8.11].
If you have further questions, please provide an example of what you have tried so far, without using FSCrawler. As I said, doing that is not FSCrawler's responsibility. Something like this (but for another use case):
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "parse multiple patterns",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{FAVORITE_DOG:pet}", "%{FAVORITE_CAT:pet}"],
          "pattern_definitions": {
            "FAVORITE_DOG": "beagle",
            "FAVORITE_CAT": "burmese"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "I love burmese cats!"
      }
    }
  ]
}
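
To give an idea of how the same approach could look for phone numbers, here is a rough sketch, not something I have tested against your documents. The `PHONE` pattern name, the `phone_number` target field and the sample text are all made up for the illustration, and the regex is only a guess at what your numbers look like (an optional `+`, then digits separated by spaces, dots, dashes or parentheses), so you will most probably need to adapt it. I'm assuming the text is in the `content` field, which is where FSCrawler puts the extracted text.

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "sketch: extract a phone number from extracted text",
    "processors": [
      {
        "grok": {
          "field": "content",
          "patterns": ["%{PHONE:phone_number}"],
          "pattern_definitions": {
            "PHONE": "\\+?[0-9][0-9 ().-]{6,}[0-9]"
          },
          "ignore_failure": true
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "content": "You can call me at +33 1 23 45 67 89 tomorrow."
      }
    }
  ]
}
```

The `ignore_failure` setting is there because the grok processor fails the document when no pattern matches, and you probably still want to index PDFs that contain no phone number at all. Once the simulation gives you what you expect, you can store the pipeline and reference it from the FSCrawler Elasticsearch settings.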