Ingesting documents (.pbix) into Elasticsearch

So I would like to create an application that takes pbix files and inserts them into Elastic. We could then build a REST application that queries Elastic for full-text search over those pbix files.

However, I am having trouble putting pbix files into Elastic. Does anyone have an idea how to do it?

To be more precise: does anyone know how to extract the data from .pbix files into some sort of document that can be stored in Elastic?
I would like to be able to do full-text search on the content of the pbix files.

I have only seen guides on how to do the opposite (i.e. use Elastic data in Power BI).

What are pbix files?

https://fileinfo.com/extension/pbix says:

A PBIX file is a document created by Power BI Desktop, a Microsoft application used to create reports and visualizations. It contains queries, data models, visualizations, settings, and reports added by the user.

I'm not sure what @Yuval_David would actually want to index from this content. I mean, it does not contain data per se.

But it does contain text, so I would like to index all the text that appears in the pbix file.

Ok, then you will need to figure out a way to extract that from the file. It's not something that is native to the Elastic Stack.

I should add that I have no idea whether FSCrawler supports it. Maybe give it a try?

I am not sure it is supported by FSCrawler; I even tried to look. I thought that maybe someone who has dealt with this kind of file knows how to extract the data and ingest it into Elastic.

FSCrawler supports whatever formats are in that list:

If I understand correctly, pbix files are XML files, so you should be able to read them with any parser (Logstash, if you wish) and transform the content into a JSON document that is then sent to Elasticsearch.
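A minimal sketch of that parse-then-transform idea, assuming the input really is XML (later replies in this thread point out that a .pbix is actually a compressed archive, so this only illustrates the flattening step). It uses Python's standard-library ElementTree instead of Logstash; the helper name `xml_to_document` is hypothetical, and the single `content` field is just one possible document shape.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_document(xml_text: str) -> dict:
    """Flatten every text node of an XML string into one "content" field.

    Hypothetical helper for illustration; it assumes well-formed XML input.
    """
    root = ET.fromstring(xml_text)
    # itertext() walks the tree in document order and yields element text and tails.
    chunks = (chunk.strip() for chunk in root.itertext())
    return {"content": " ".join(chunk for chunk in chunks if chunk)}


if __name__ == "__main__":
    sample = "<report><page><title>Category Breakdown</title><value>$52K</value></page></report>"
    # The resulting dict serializes directly to the JSON body sent to Elasticsearch.
    print(json.dumps(xml_to_document(sample)))  # prints {"content": "Category Breakdown $52K"}
```

Whitespace-only text between tags is dropped, so the output is one space-separated string of the visible text.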

Not sure it will make sense, though. But maybe you can share a typical pbix document somewhere so we can look at it? And tell us exactly what you want to index from it.

I can, but it won't help you much: it's a compressed file.
You can download one here: Sales & Returns sample report
(taken from here: https://docs.microsoft.com/en-us/power-bi/create-reports/sample-datasets).
Since it's compressed, I can't quite figure out where to take the data from. You can change the extension to .zip and extract the file, but I still can't figure much out.
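Since renaming the file to .zip works, Python's `zipfile` module can open a .pbix directly, without renaming it. A minimal sketch follows. The entry name `Report/Layout` and its UTF-16 little-endian encoding are assumptions based on how Power BI Desktop files are commonly described; they are not confirmed in this thread or documented by Microsoft, so list the entries of your own file first. The compressed data model stored inside the archive is not handled here. Both function names are hypothetical.

```python
import io
import zipfile
from typing import List


def list_pbix_entries(pbix_bytes: bytes) -> List[str]:
    """Return the names of the entries stored in a .pbix file (a ZIP archive)."""
    with zipfile.ZipFile(io.BytesIO(pbix_bytes)) as archive:
        return archive.namelist()


def read_text_entry(pbix_bytes: bytes, entry: str = "Report/Layout",
                    encoding: str = "utf-16-le") -> str:
    """Decode one archive entry as text.

    Assumption: the report definition lives under "Report/Layout" as UTF-16 LE
    text. Adjust `entry` and `encoding` after inspecting your own file.
    """
    with zipfile.ZipFile(io.BytesIO(pbix_bytes)) as archive:
        raw = archive.read(entry)  # raises KeyError if the entry does not exist
    # Drop a leading byte-order mark, if present, so json.loads() accepts the text.
    return raw.decode(encoding).lstrip("\ufeff")
```

Reading the downloaded sample with `open(path, "rb").read()` and passing the bytes to `list_pbix_entries` shows which entries are actually present. Taking bytes rather than a path also suits a REST application that receives uploaded files.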

So what kind of content do you want to index from this file?

As I said, when you open this kind of file in Power BI, you get a report with text inside it; that's what I want to index.

Can you share a screen capture of the file you shared, opened in PowerBI which shows the text you would like to see indexed in elasticsearch?

As you can see, there are several tabs, etc., and a lot of text/numerical values to index (I don't care if it's all stored as one big text field).
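If one big text field is acceptable, a crude but generic approach is to walk the report definition JSON and keep every string value. The sketch below assumes you already have that JSON as a string (for example, from an entry inside the extracted archive). It also parses string values that themselves contain JSON objects or arrays, on the assumption (not verified here) that report definitions may nest JSON inside strings. Expect noise such as visual type and property names. Note too that the numbers displayed in visuals are most likely computed from the data model when the report renders, so they will probably not appear in the report definition at all. The function names and the sample keys are hypothetical.

```python
import json
from typing import Any, Iterator


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every non-empty string in a parsed JSON value, depth first.

    Strings that parse as a JSON object or array are walked instead of yielded.
    """
    if isinstance(node, dict):
        for value in node.values():  # keys are property names, so only values are kept
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    elif isinstance(node, str):
        text = node.strip()
        if text[:1] in ("{", "["):
            try:
                nested = json.loads(text)
            except ValueError:  # json.JSONDecodeError subclasses ValueError
                nested = None
            if isinstance(nested, (dict, list)):
                yield from iter_strings(nested)
                return
        if text:
            yield text


def layout_to_content(layout_json: str) -> dict:
    """Flatten a report definition into one big "content" text field."""
    return {"content": " ".join(iter_strings(json.loads(layout_json)))}


if __name__ == "__main__":
    # Made-up structure for illustration; real report definitions will differ.
    sample = json.dumps({
        "sections": [{
            "displayName": "Sales",
            "visualContainers": [{"config": json.dumps({"title": "Category Breakdown"}), "x": 10}],
        }]
    })
    print(layout_to_content(sample))  # prints {'content': 'Sales Category Breakdown'}
```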

So you want to index something like:

{
   "content": "Category Breakdown Power BI $52K Word $36K OneNote $31K PowerPoint $30K ... Store Breakdown Fama $40K Contoso $39K .... "
}

That's it?
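Storing and querying a document of that shape needs nothing beyond the Elasticsearch REST API: `PUT /<index>/_doc/<id>` to index it, and a `match` query on `content` sent to `/<index>/_search`, which is what the REST application from the original question would call. A minimal sketch using only Python's standard library follows. The cluster URL `http://localhost:9200` (with security disabled), the index name `pbix`, the document id and the helper names are all assumptions for illustration. Nothing is sent over the network in this snippet.

```python
import json
import urllib.request


def build_index_request(es_url: str, index: str, doc_id: str,
                        content: str) -> urllib.request.Request:
    """Prepare PUT /<index>/_doc/<id> with a single "content" text field."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/{index}/_doc/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_search_request(es_url: str, index: str,
                         text: str) -> urllib.request.Request:
    """Prepare a full-text match query on the "content" field."""
    body = json.dumps({"query": {"match": {"content": text}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumed local cluster and hypothetical index/id.
    index_req = build_index_request("http://localhost:9200", "pbix", "sales-and-returns",
                                    "Category Breakdown Power BI $52K Word $36K")
    search_req = build_search_request("http://localhost:9200", "pbix", "OneNote")
    for req in (index_req, search_req):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # To actually send one: urllib.request.urlopen(req) (raises HTTPError on 4xx/5xx).
```

Using a stable id per file (for example, the file name) means that re-indexing a changed report overwrites the old document instead of creating a duplicate.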

I gave it a try with Apache Tika (via FSCrawler), and here is what was extracted:

https://gist.githubusercontent.com/dadoonet/57e0e2d2eebc379249f3ea75c6e991da/raw/4050aad09d68fbbfeb0b3ab71d1074cbcb9377d9/extracted.txt

Not sure it helps to index that type of content.

So I don't think there is a solution out of the box here. You probably need to build something by yourself.
