My cluster has roughly 300,000 documents in it. Right now, I'm only able to query 10,000 at a time. Is there a way to query through all documents? Or is this bad practice?
Many of my search queries are returning 0 results because the documents do not exist within the first 10,000 results. What is my best option here?
Also, is there a way to return the total number of documents in the cluster? I think it's about 300,000, but I don't know for sure.
There are a few questions there; let's sort them out.
First, for the total number of documents:
GET /_cluster/stats
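If you are calling this from a script rather than Kibana Dev Tools, the cluster-wide figure is reported under `indices.docs.count` in the `_cluster/stats` response. A minimal Python sketch, assuming you already have the response body parsed into a dict; the function name and the trimmed sample response are made up for illustration:

```python
import json

def cluster_doc_count(stats: dict) -> int:
    """Return the document count from a parsed GET /_cluster/stats response."""
    # The figure sits under indices -> docs -> count in the response body.
    return stats["indices"]["docs"]["count"]

# Hypothetical, heavily trimmed response body (real responses contain much more).
sample = json.loads('{"indices": {"count": 3, "docs": {"count": 300000, "deleted": 12}}}')
print(cluster_doc_count(sample))  # 300000
```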
For statistics on each index:
GET _cat/indices?v
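With `?v` the first line of the output is a header, so the per-index `docs.count` column can be picked out by name. A rough Python sketch, assuming you have the plain-text output as a string; the function name and the sample output below are hypothetical:

```python
from typing import Dict

def per_index_doc_counts(cat_output: str) -> Dict[str, int]:
    """Map index name -> docs.count from the text output of GET _cat/indices?v."""
    lines = [line for line in cat_output.splitlines() if line.strip()]
    # The ?v flag adds a header row naming each column.
    header = lines[0].split()
    name_col = header.index("index")
    count_col = header.index("docs.count")
    counts = {}
    for line in lines[1:]:
        # Assumes every column is populated (open indices only).
        fields = line.split()
        counts[fields[name_col]] = int(fields[count_col])
    return counts

# Made-up sample output for two indices.
sample = """\
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logs-a aaaaaaaaaaaaaaaaaaaaaa   1   1     180000            0       50mb           25mb
green  open   logs-b bbbbbbbbbbbbbbbbbbbbbb   1   1     120000            3       34mb           17mb
"""
counts = per_index_doc_counts(sample)
print(counts)                # {'logs-a': 180000, 'logs-b': 120000}
print(sum(counts.values()))  # 300000
```

Looking columns up by header name keeps this working if your version prints extra columns. A closed index can leave blank fields, which would break the whitespace split, so for scripts `GET _cat/indices?format=json` is the sturdier choice.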
You can search across billions or even trillions of documents in Elasticsearch, but result sets may be limited, so I am a bit unclear what you are asking. You can certainly search across 300K documents.
You can also "page" through large result sets if needed; see here.
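With plain `from`/`size` paging, `from + size` cannot exceed the index setting `index.max_result_window` (10,000 by default), which is one way to run into a 10,000 ceiling. To walk through everything, Elasticsearch offers `search_after` (ideally together with a point in time) and the older scroll API. Below is a minimal sketch of the `search_after` loop, assuming `search` is whatever function you use to send a body to `POST /<index>/_search` and get the parsed JSON back. `scan_all` and the in-memory `fake_search` are hypothetical helpers so the example runs without a cluster; sort on a field with unique values (or add a tiebreaker) so no hits are skipped or repeated.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]

def scan_all(search: SearchFn, query: Dict[str, Any], sort: List[Dict[str, Any]],
             page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, one page at a time, using search_after."""
    search_after: Optional[List[Any]] = None
    while True:
        body: Dict[str, Any] = {"size": page_size, "query": query, "sort": sort}
        if search_after is not None:
            # Resume right after the last hit of the previous page.
            body["search_after"] = search_after
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit of a sorted search carries its sort values under "sort".
        search_after = hits[-1]["sort"]

# Hypothetical stand-in for the cluster: ignores the query and assumes
# an ascending sort on a unique "id" field.
DOCS = [{"id": i, "Process": "foo" if i % 2 else "bar"} for i in range(25_000)]

def fake_search(body: Dict[str, Any]) -> Dict[str, Any]:
    after = body.get("search_after", [-1])[0]
    page = [d for d in DOCS if d["id"] > after][: body["size"]]
    return {"hits": {"hits": [{"_source": d, "sort": [d["id"]]} for d in page]}}

total = sum(1 for _ in scan_all(fake_search, {"match_all": {}}, [{"id": "asc"}]))
print(total)  # 25000
```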
Aggregations sometimes have limits on the number of documents used in calculating them, but I don't think that is what you are asking.
So let's take a look: can you show us what your search query looks like? Are you using the Query DSL or Discover? Then perhaps we can help.
I am just doing a basic match query that returns docs whose Process field matches. However, it only ever returns 10,000 results, never more, and many searches come back empty.
I don't need more than 10,000 results. If I search for an item and it is not returned, does that mean it does not exist in the index? Does each query search through the entire index?
That is correct, assuming your query is correctly formed and you are searching across an index or index pattern that covers the data you want to search.
Yes, assuming you have not applied a filter, such as a time filter or a range filter, that limits the search scope.
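On the "only ever returns 10,000" point: in Elasticsearch 7.0 and later, `hits.total` in a search response is an object with `value` and `relation`, and by default it stops counting at 10,000 and reports `relation: "gte"`. So 10,000 there means "at least 10,000 matches", not "only 10,000 documents searched". Setting `track_total_hits` to `true` makes it count exactly, and a `relation` of `"eq"` with a `value` of 0 is a genuine zero for that query and index. A small Python sketch of building the body and reading the total; the helper names, the query text `"example"`, and the sample responses are placeholders:

```python
from typing import Any, Dict

def match_body(field: str, text: str, exact_total: bool = True) -> Dict[str, Any]:
    """Build a search body for a basic match query on one field."""
    return {
        "query": {"match": {field: text}},
        # Without this, hits.total stops counting at 10,000 and reports "gte".
        "track_total_hits": exact_total,
    }

def describe_total(response: Dict[str, Any]) -> str:
    """Turn hits.total from a parsed search response into a readable string."""
    total = response["hits"]["total"]
    if total["relation"] == "eq":
        return f"exactly {total['value']} matches"
    return f"at least {total['value']} matches (count capped)"

print(match_body("Process", "example"))
# {'query': {'match': {'Process': 'example'}}, 'track_total_hits': True}

# Made-up responses for illustration.
print(describe_total({"hits": {"total": {"value": 10000, "relation": "gte"}, "hits": []}}))
# at least 10000 matches (count capped)
print(describe_total({"hits": {"total": {"value": 0, "relation": "eq"}, "hits": []}}))
# exactly 0 matches
```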