Elasticsearch record aggregation in Power BI

wendt · December 12, 2017, 3:25pm

Hi there!

I'm hoping someone here will have encountered a similar problem. I'm trying to query the Elasticsearch API to pull data into Power BI for visualization. What I quickly realized is that with larger datasets (>100k records), the Elasticsearch API return limit of 10k records became a barrier. The obvious solution was to implement a scrolling query. Unfortunately, Power BI did not understand how to handle this, and it would overwrite the records pulled. To clarify, if I pulled records in 5k chunks, it would see all 100k records, but only store the last 5k seen.

Someone on the Power BI forums solved this with a recursive script that, to the best of my understanding, only writes the records once a counter limit has been reached, or once all matching records have been returned. Recursive script here.

I had to modify this slightly, because my scroll_id was too large to be passed as a header or in the URL, but rather needed to be passed in the body. That brings us to the script shared below. It works, with a limit. The first time Elasticsearch is queried and the scroll session is established, I get a unique set of records. All subsequent queries (10, in this case), return a second set of 10k records.

Does anyone have suggestions as to why this is happening? I'm inclined to think this is on the side of how I'm interacting with the Elasticsearch API, but I'm open to other ideas.

Please let me know if you have any clarifying questions!

## example values for parameters passed to the function on initial call
url = http://localhost:9200/example_index/_search?scroll=1m
scrollURL = http://localhost:9200/_search/scroll
urlBody = {"size":10,"query":{"match_all":{}}}
scrollBody = {"scroll":"10m","scroll_id":"
scrollID = [null]
counter = [null]
scrollEND = "}

Script to recursively query elasticsearch through use of a scrolling query

let
	## Defining function
    RecursiveElasticFetchJSON = (url, scrollURL, urlBody, scrollBody, scrollID, counter, scrollEND) =>

    let
        Counter = if (counter = null) then 0 else counter,

        ## If function is called w/o a scrollID, then make the initial query, otherwise, query with a scrolling query
        Results = if (scrollID = null) then
            Json.Document(Web.Contents(url,[Content = Text.ToBinary(urlBody)]))
        else
            Json.Document(Web.Contents(scrollURL,[Content = Text.ToBinary(scrollBody&scrollID&scrollEND)])),

        ## Parsing results for later processing
        ParsedResults = Table.FromList(Results[hits][hits], Splitter.SplitByNothing(), null, null, ExtraValues.Error),

        ## Aggregates results and calls the function again, or returns the results
        Return = if (Counter < 10) then
            ParsedResults & RecursiveElasticFetchJSON(url, scrollURL, urlBody, scrollBody, Results[_scroll_id], Counter+1, scrollEND)
        else
            ParsedResults
    in
        Return
in
    RecursiveElasticFetchJSON

Ron_3 · December 13, 2017, 6:10pm

Ya I have this problem too.. HELP!!!!

wendt · December 14, 2017, 2:11pm

Replying to close this out for anyone else who might encounter this in the future, Power BI caches the results from repeat calls to the same API. To work around this problem, you can add a unique value to the header each time you call out. Using the counter value already specified in the script is a good way to achieve this.

Credit to KenvM over on the Power BI forums for providing the solution.

Where to add the header value:
Json.Document(Web.Contents(scrollURL,[Content ==Text.ToBinary(scrollBody&scrollID&scrollEND), **Headers=[MyHeader=Text.From(Counter)]**])),

system · January 11, 2018, 2:11pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Returning more than 10000 records in Power BI Elasticsearch	2	1597	October 3, 2017
Extract all data from a composite aggregation in Power BI/Power Query using after_key Elasticsearch	1	208	May 29, 2023
How to get all data from elasticsearch to Power bi Elasticsearch	2	67	July 31, 2024
Elastic search fails to index the records once it reaches 10000 Elasticsearch	7	3130	October 15, 2019
How to implement pagination with large dataset Elasticsearch language-clients , async-search	5	2145	April 21, 2022

Elasticsearch record aggregation in Power BI

Related topics