Converting Excel/PDF to JSON format

Hi,
Is there a way to convert arbitrary documents (Excel, PDF) into JSON in a way that preserves some semantic meaning?

We have some legacy data that we need to organise into feature group name, feature name, and value (like name-value pairs), but the data doesn't follow any specific format.
What is the best approach for this mapping?
So far, I have used FSCrawler to ingest all the PDF/Excel docs into Elasticsearch, but I'm not sure how to define this mapping.
All the content we want is available in the "content" field, with lots of \n\n etc. and no specific structure (since none is defined).

The data should fit a structure like this. There are more than 15 features, each with 5 to 10 options.
[
  {
    "feature": {
      "name": "feature-1",
      "options": [
        { "name": "test", "value": "A" }
      ]
    }
  }
]

I'd appreciate your help.

Not really, AFAIK.
ingest-attachment and FSCrawler both use Apache Tika behind the scenes, which "just" extracts content (flat text) and metadata.
If you need something specific, you will probably need to implement it yourself.
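
To give an idea of what "implementing it yourself" could look like, here is a minimal Python sketch that post-processes the flat "content" text into the feature/options shape from the question. The input layout is purely an assumption for illustration (blank-line-separated blocks, first line is the feature name, following lines are "name: value"); real legacy documents will likely need their own rules per document type. The function name content_to_features is hypothetical and not part of FSCrawler or Elasticsearch.

```python
import json
import re


def content_to_features(content: str) -> list:
    """Turn flat extracted text into a list of feature/options dicts.

    Assumed (hypothetical) layout: blocks separated by blank lines, the
    first line of a block is the feature name, and each following line
    is an option written as "name: value".
    """
    features = []
    # Split on runs of blank lines (the "\n\n" sequences seen in "content").
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        options = []
        for line in lines[1:]:
            name, sep, value = line.partition(":")
            if sep:  # ignore lines that do not look like "name: value"
                options.append({"name": name.strip(), "value": value.strip()})
        features.append({"feature": {"name": lines[0], "options": options}})
    return features


# Made-up sample text standing in for the extracted "content" field.
sample = "feature-1\ntest: A\nmode: B\n\n\nfeature-2\nsize: 10\n"
print(json.dumps(content_to_features(sample), indent=2))
```

The resulting JSON could then be indexed as a separate document, with the original flat "content" kept alongside it for full-text search.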
