Get all results at once in Elasticsearch SQL syntax


(Ben Du) #1

I have a simple SQL query in Elasticsearch which I know returns less than 100 rows of results. How can I get all these results at once (i.e., without using scroll)? I tried the limit n clause but it works when n is less than or equal to 10 but doesn't work when n is great than 10.


(Florian Kelbert) #2

According to the documentation, you should be able to use LIMIT ALL. Does this work?


(Ben Du) #3

I tried. It didn't work. It still returned 10 rows of result. Perhaps I should file a ticket on this?


(Nik Everett) #4

If you fo it in the SQL console you'll get everything. I'd you do it in jdbc you'll get everything. If you do it over the http API you need to use the cursor that it returns. It is documented on the SQL API page. I'm on mobile so no link, sorry. Jdbc and the SQL cli all use the cursor mechanism over the http API.


(Andrei Stefan) #5

@legendu how do you run the query and, also, please share the query command (the full JSON if it's the REST API, or java code if it's JDBC). Thanks.


(Ben Du) #6

@nik9000
Thank you very much for the explanation! I used the http API. Now everything makes sense to me.

But how can I use the scroll API to get all results from a SQL query using http API? The scroll API returns scroll_ids while the SQL query returns a cursor ID. I tried to feed the cursor ID returned by a SQL query into a scroll API but it failed to work.


(Ben Du) #7

@Andrei_Stefan Please see my Python code for querying ES using requests below.

import requests
import json

url = 'http://10.204.61.127:9200/_xpack/sql'
headers = {
   'Content-Type': 'application/json',
}
query = {
   'query': '''
      select
          date_start,
          sum(spend) as spend
       from
           some_index
       where
           campaign_id = 790
           or
           campaign_id = 490
       group by
           date_start
   '''
}
response = requests.post(url, headers=headers, data=json.dumps(query))

(Andrei Stefan) #8

@legendu the cursor ID you get from the SQL API needs to be feed back in the same SQL API to get the next batch of results. Snippet from our documentation:

You can continue to the next page by sending back the cursor field. In case of text format the cursor is returned as Cursor http header.

POST /_xpack/sql?format=json { "cursor": "sDXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWYUpOYklQMHhRUEtld3RsNnFtYU1hQQ==:BAFmBGRhdGUBZgVsaWtlcwFzB21lc3NhZ2UBZgR1c2Vy9f///w8=" }


(Ben Du) #9

@Andrei_Stefan Thank you, Andrei! I had a try but it still gave me at most 10 rows of results. The second call on the cursor returns no result at all. I'm sure that there are more than 10 rows of results in the index. Any idea what went wrong?

And BTW, do you know the equivalent native syntax of the SQL query above? I'm so to ask for this but I'm very new to ES (and that's the reason I tried the SQL API). I'd like to verify that the equivalent native query returns the right answer.


(Andrei Stefan) #10

Take the query above and send it to translate API: https://www.elastic.co/guide/en/elasticsearch/reference/master/sql-translate.html. It will give the native ES query that you can use to test.


(Ben Du) #11

I tried the native query returned by the SQL translate API on the SQL query with limit all. However, it still returns only 10 rows of results. How can I make it return all results (less than 100 rows in total)?

import requests
import json


url = 'http://10.204.61.127:9200/some_index/some_doc/_search'
headers = {
    'Content-Type': 'application/json',
}
query = {
    "size": 0,
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "campaign_id.keyword": {
                            "value": 790,
                            "boost": 1.0
                        }
                    }
                },
                {
                    "term": {
                        "campaign_id.keyword": {
                            "value": 490,
                            "boost": 1.0
                        }
                    }
                }
            ],
            "adjust_pure_negative": True,
            "boost": 1.0
        }
    },
    "_source": False,
    "stored_fields": "_none_",
    "aggregations": {
        "groupby": {
            "composite": {
                "size": 1000,
                "sources": [
                    {
                        "2735": {
                            "terms": {
                                "field": "date_start",
                                "missing_bucket": False,
                                "order": "asc"
                            }
                        }
                    }
                ]
            },
            "aggregations": {
                "2768": {
                    "sum": {
                        "field": "spend"
                    }
                }
            }
        }
    }
}
response = requests.post(url, headers=headers, data=json.dumps(query)).json()

(Andrei Stefan) #12

If you used that ES DSL and you got back 10 records... maybe that is what you have in the index that matches that query. Please, note that the query uses "size": 10 which means the results will contain the aggregations results only, which is expected. Also, the composite aggregation there has "size": 1000 which means the result will contain first 1000 buckets.

So, if you say there are more than 10 rows you need to get back, this query (and its aggregation) should get you back all of it. To me, this means you have 10 buckets to be returned. Or maybe you mixed up number of documents with number of buckets? (you are using a group by in there, that's why you have "size": 0 and a composite aggregation in the query)


(system) #13

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.