Separate clauses while indexing in Elasticsearch with FSCrawler

Hi All,
I indexed a PDF file in Elasticsearch with FSCrawler (Elasticsearch version 7.1).
In Kibana everything is OK and I can see the index, but the problem is that the whole PDF file is indexed as one content field, so when I search, the entire PDF content is returned as a single hit. I am looking for a way to index the PDF file sentence by sentence. Can anybody help me?

You need to manage that by yourself, I'm afraid. What is a sentence, after all? You could write an ingest pipeline that does that on the text extracted by the crawler. A Script Processor might be a good candidate...
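As a minimal sketch of the pipeline idea, here is a pipeline body that uses the built-in `split` processor instead of a Script Processor (a swapped-in technique, not the one suggested above). It assumes FSCrawler stores the extracted text in the `content` field; the pipeline purpose and the `sentences` target field are hypothetical. A pipeline transforms one document at a time, so this produces an array of sentences inside the same document, not separate documents.

```python
import json

# Hypothetical body for: PUT _ingest/pipeline/split-sentences
# The "split" processor splits a string field using a (Java) regex separator.
# Here we split on whitespace that follows ".", "!" or "?"; the lookbehind
# keeps the punctuation attached to the sentence it ends.
pipeline = {
    "description": "Split extracted PDF text into sentences (sketch)",
    "processors": [
        {
            "split": {
                "field": "content",              # field filled by FSCrawler
                "separator": r"(?<=[.!?])\s+",
                "target_field": "sentences",     # hypothetical field name
                "ignore_missing": True,
            }
        }
    ],
}

print(json.dumps(pipeline, indent=2))
```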

Hi dadoonet, I mean word by word or clause by clause, something like this:

content : your guide to healthy eating use the food pyramid to plan meals and snacks healthy food. eating a wide variety of nourishing foods provides the energy and nutrients you need every day to stay healthy. plan what you eat using these tips. take time to plan your meals in advance.

I want to add each clause to the index separately, like this:

content : your guide to healthy eating use the food pyramid to plan meals and snacks healthy food.

content: eating a wide variety of nourishing foods provides the energy and nutrients you need every day to stay healthy. plan what you eat using these tips.

content: take time to plan your meals in advance.
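A minimal sketch of the splitting rule on its own, assuming a clause ends with ".", "!" or "?" followed by whitespace. The function name `split_clauses` is hypothetical. With this pure punctuation rule the sample text yields four pieces rather than three, because "plan what you eat using these tips." becomes its own clause; that is exactly the kind of rule you have to decide on yourself.

```python
import re

# Hypothetical rule: a clause ends with ".", "!" or "?" followed by whitespace.
# The lookbehind keeps the punctuation attached to the clause it ends.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_clauses(text: str) -> list[str]:
    """Split extracted text into clauses, dropping empty pieces."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


sample = (
    "your guide to healthy eating use the food pyramid to plan meals and "
    "snacks healthy food. eating a wide variety of nourishing foods provides "
    "the energy and nutrients you need every day to stay healthy. plan what "
    "you eat using these tips. take time to plan your meals in advance."
)

for clause in split_clauses(sample):
    print(clause)
```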

I understood that. I was just saying that you need to implement your own rules.

What about question marks? What about exclamation points? And other signs?

What is the use case then?

Yes, exactly, that's what I want to do: separate by [. ! ?] to understand the clause type. My plan is to build a dictionary of clauses, and my only resources are PDFs.
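Because an ingest pipeline turns one input document into one output document, one way to get one document per clause is to split the text on the client side (for example, the `content` field FSCrawler already indexed) and send each clause through the Bulk API. This is a minimal sketch under that assumption: the index name `clauses` and the `clause_type`, `source_file` and `position` fields are hypothetical. Only the NDJSON shape (an action line followed by a source line, with a trailing newline) follows the Bulk API format.

```python
import json
import re

# Same hypothetical rule as before: split after ".", "!" or "?" + whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def bulk_body(text: str, source_file: str, index: str = "clauses") -> str:
    """Build an NDJSON body for POST _bulk with one document per clause.

    The index name and the clause_type/source_file/position fields are
    hypothetical; adapt them to your own mapping.
    """
    lines = []
    clauses = [c.strip() for c in _BOUNDARY.split(text) if c.strip()]
    for position, clause in enumerate(clauses):
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({
            "content": clause,
            # Terminal punctuation as a rough "clause type" (assumption).
            "clause_type": clause[-1] if clause[-1] in ".!?" else None,
            "source_file": source_file,
            "position": position,
        }))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


print(bulk_body("Is it healthy? Yes! Plan your meals.", "guide.pdf"), end="")
```

The resulting string can be saved to a file and sent to the `_bulk` endpoint with the `Content-Type: application/x-ndjson` header.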

As I said, you need to do that transformation yourself.


Many thanks for your response!
