What's the best production setup for handling 1 billion records?

sohilelasticsearch · February 29, 2016, 1:00pm

Hi,

I want to load 1 billion documents into Elasticsearch.
I am using ES 1.7.1.
The format of each document is:
{
"pid":1234,
"sid":123,
"dname":"abc.com"
}

Is it fine to load all the data into a single ES node with all default settings?
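A load of this size is usually sent in batches through the bulk API rather than as one index request per document. Below is a minimal sketch that only builds the newline-delimited `_bulk` request bodies for documents in the format above. It does not talk to a cluster. The index name `domains`, the type `doc`, the `Doc` class and the batch size are all hypothetical. In ES 1.x the bulk action line still carries a `_type`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BulkBodySketch {

    // Hypothetical in-memory form of one record: {"pid":..,"sid":..,"dname":..}
    static final class Doc {
        final long pid;
        final long sid;
        final String dname;

        Doc(long pid, long sid, String dname) {
            this.pid = pid;
            this.sid = sid;
            this.dname = dname;
        }
    }

    // Minimal JSON escaping for string values (backslash and double quote only).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Splits docs into _bulk bodies of at most batchSize documents each.
    // Every document contributes an action line and a source line, and each
    // line ends with '\n' (the bulk API expects a trailing newline).
    static List<String> buildBodies(String index, String type, List<Doc> docs, int batchSize) {
        String action = "{\"index\":{\"_index\":\"" + index + "\",\"_type\":\"" + type + "\"}}\n";
        List<String> bodies = new ArrayList<>();
        StringBuilder body = new StringBuilder();
        int inBody = 0;
        for (Doc d : docs) {
            body.append(action)
                .append("{\"pid\":").append(d.pid)
                .append(",\"sid\":").append(d.sid)
                .append(",\"dname\":\"").append(escape(d.dname)).append("\"}\n");
            if (++inBody == batchSize) {
                bodies.add(body.toString());
                body.setLength(0);
                inBody = 0;
            }
        }
        if (inBody > 0) {
            bodies.add(body.toString()); // remainder that did not fill a whole batch
        }
        return bodies;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc(1234, 123, "abc.com"),
            new Doc(1235, 124, "abc.com.au"),
            new Doc(1236, 125, "xyz.org"));
        List<String> bodies = buildBodies("domains", "doc", docs, 2);
        System.out.println(bodies.size() + " bulk bodies");
        System.out.print(bodies.get(1));
    }
}
```

Each body would then be POSTed to `/_bulk`. The right batch size is something to measure on your own hardware rather than a fixed number.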

warkolm · February 29, 2016, 10:31pm

It depends on a few things; you really should test whether it fits on a node of whatever size you have.

sohilelasticsearch · March 1, 2016, 5:45am

I am running ES with ES_HEAP_SIZE=16g and 500 GB of disk space.
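One way to act on the "see if it fits" advice with these specs is to load a sample, read its store size, and extrapolate. Below is a minimal sketch that assumes size grows linearly with document count. This is a rough approximation, because real index size also depends on mappings and segment merging. The sample figures and the 0.8 headroom factor are made up. The sketch checks disk only, not the 16g heap.

```java
// Back-of-envelope sizing: extrapolate the on-disk size of a sample index
// to the full document count and compare it with a disk budget.
final class CapacityEstimate {

    // Linear extrapolation: average bytes per doc in the sample times the target count.
    static double projectedBytes(long sampleDocs, long sampleBytes, long targetDocs) {
        return (double) sampleBytes / sampleDocs * targetDocs;
    }

    // usableFraction (hypothetical) leaves headroom, e.g. for segment merges,
    // which temporarily need extra space.
    static boolean fits(double projectedBytes, long diskBytes, double usableFraction) {
        return projectedBytes <= diskBytes * usableFraction;
    }

    public static void main(String[] args) {
        long gb = 1_000_000_000L;
        // Made-up sample figures for illustration; only the 500 GB disk is from this thread.
        double projected = projectedBytes(1_000_000L, 100_000_000L, 1_000_000_000L);
        System.out.println(projected / gb + " GB projected");
        System.out.println("fits: " + fits(projected, 500 * gb, 0.8));
    }
}
```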

warkolm · March 1, 2016, 5:53am

Have you tried loading the data into a node with those specs?
What was the indexing and query response like?

sohilelasticsearch · March 1, 2016, 9:42am

I have made one change to the settings: I have set index.number_of_shards: 1 and index.number_of_replicas: 0.

Queries are taking ~15 sec, and CPU utilization is around 400%.
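The two settings mentioned above are applied when the index is created. Below is a minimal sketch of an ES 1.x create-index body, such as for PUT /domains, built as a Java string. The index and type names and the `not_analyzed` mapping for dname are assumptions, not the poster's actual setup. Replicas can be changed later, but the shard count cannot be changed on an existing index. A single shard therefore limits how far this index can later be spread across nodes without reindexing.

```java
// Hypothetical create-index body (ES 1.x mapping syntax).
final class CreateIndexBodySketch {

    static String body(String type, int shards, int replicas) {
        return "{"
            + "\"settings\":{\"number_of_shards\":" + shards + ",\"number_of_replicas\":" + replicas + "},"
            + "\"mappings\":{\"" + type + "\":{\"properties\":{"
            + "\"pid\":{\"type\":\"long\"},"
            + "\"sid\":{\"type\":\"long\"},"
            // not_analyzed: the whole domain is indexed as a single exact term
            + "\"dname\":{\"type\":\"string\",\"index\":\"not_analyzed\"}"
            + "}}}}";
    }

    public static void main(String[] args) {
        System.out.println(body("doc", 1, 0));
    }
}
```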

warkolm · March 1, 2016, 9:43am

What sort of query is it?

sohilelasticsearch · March 1, 2016, 9:58am

I am using spring-data-elasticsearch 1.3.0.RELEASE.

Most of the queries look like this:
@Query("{\"bool\" : {\"should\" : [{\"query_string\" : { \"query\" : \"?0\", \"fields\":[\"dname\"]}} , {\"query_string\" : { \"query\" : \"?1\", \"fields\":[\"dname\"]}} ]}}")

For example:

Return all documents where dname="abc.com" or dname="abc.com*".
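If dname holds whole domain names, one term-level alternative to the two query_string clauses is a term query plus a prefix query. This swaps in term/prefix queries in place of the original query_string approach. Below is a minimal sketch that only builds the JSON. It assumes dname is mapped as a `not_analyzed` string, so each domain is indexed as one exact term. The class and method names are hypothetical.

```java
// Hypothetical query bodies for "dname = X or dname = X*", assuming dname is
// mapped as a not_analyzed string (one exact term per domain).
final class DnameQuerySketch {

    // Minimal JSON escaping (backslash and double quote only).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Two explicit clauses, mirroring the original bool/should shape.
    static String exactOrPrefix(String domain) {
        String d = escape(domain);
        return "{\"bool\":{\"should\":["
            + "{\"term\":{\"dname\":\"" + d + "\"}},"
            + "{\"prefix\":{\"dname\":\"" + d + "\"}}"
            + "]}}";
    }

    // A prefix also matches the value itself ("abc.com" starts with "abc.com"),
    // so a single prefix query covers both cases.
    static String prefixOnly(String domain) {
        return "{\"prefix\":{\"dname\":\"" + escape(domain) + "\"}}";
    }

    public static void main(String[] args) {
        System.out.println(exactOrPrefix("abc.com"));
        System.out.println(prefixOnly("abc.com"));
    }
}
```

A prefix query on "abc.com" matches "abc.com" itself and, like abc.com*, also "abc.community". The single prefix query should therefore return the same documents as the two-clause version.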

Also, a logical AND on the pid and sid combination:

{"bool":{"must":[{"query_string":{"query":"123","fields":["pid"],"default_operator":"and"}},{"query_string":{"query":"456","fields":["sid"],"default_operator":"and"}}]}}

For example, return all documents where pid=X AND sid=Y.
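For an exact pid/sid lookup, one possible alternative is term filters instead of query_string, which parses query syntax. Below is a minimal sketch of the JSON in ES 1.x syntax. The `filtered` query exists in 1.x; later versions replaced it with a `filter` clause inside `bool`. Filters match exact values and skip relevance scoring. The class and method names are hypothetical.

```java
// Hypothetical ES 1.x "filtered" query: exact term filters on pid and sid,
// both required, with no relevance scoring.
final class PidSidFilterSketch {

    static String pidAndSid(long pid, long sid) {
        return "{\"filtered\":{\"filter\":{\"bool\":{\"must\":["
            + "{\"term\":{\"pid\":" + pid + "}},"
            + "{\"term\":{\"sid\":" + sid + "}}"
            + "]}}}}";
    }

    public static void main(String[] args) {
        System.out.println(pidAndSid(123, 456));
    }
}
```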

sohilelasticsearch · June 14, 2016, 5:00am

Can anyone help me with this?

warkolm · June 14, 2016, 6:10am

A leading wildcard query like that will always be slow; it's essentially the ES version of a table scan.
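The table-scan comparison can be illustrated with a toy sorted term dictionary. This is a simplified model, not Lucene's actual terms dictionary. It only shows why a trailing wildcard can jump to a starting point in sorted order while a leading wildcard has to look at every term. The term list is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class TermDictSketch {

    static final class Result {
        final List<String> matches = new ArrayList<>();
        int examined; // number of dictionary terms looked at
    }

    // Trailing wildcard, e.g. "abc.com*": jump to the first term >= prefix in
    // sorted order and stop at the first term that no longer has the prefix.
    static Result prefix(TreeSet<String> terms, String prefix) {
        Result r = new Result();
        for (String t : terms.tailSet(prefix, true)) {
            r.examined++;
            if (!t.startsWith(prefix)) break;
            r.matches.add(t);
        }
        return r;
    }

    // Leading wildcard, e.g. "*.com": sorted order gives no starting point,
    // so every term in the dictionary has to be checked.
    static Result suffix(TreeSet<String> terms, String suffix) {
        Result r = new Result();
        for (String t : terms) {
            r.examined++;
            if (t.endsWith(suffix)) r.matches.add(t);
        }
        return r;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
            "abc.com", "abc.com.au", "abd.net", "xyz.com", "zzz.org"));
        Result p = prefix(terms, "abc.com");
        Result s = suffix(terms, ".com");
        System.out.println(p.matches + " examined " + p.examined);
        System.out.println(s.matches + " examined " + s.examined);
    }
}
```

With five terms the difference is small (3 terms examined versus 5). With a billion documents, the number of terms a leading wildcard must examine grows with the whole dictionary, not with the number of matches.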

sohilelasticsearch · June 14, 2016, 7:47am

Hi @warkolm,

My query does not start with a leading wildcard; it's a trailing wildcard query.

Thanks,
Sohil

warkolm · June 14, 2016, 7:50am

The ? is leading, though. As per the Query String Query page of the Elasticsearch Guide [2.3], it will only match a single character there, but it still needs to scan a lot of docs to return just those.
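For reference, query_string wildcards follow the rule cited above: `?` stands for exactly one character and `*` for zero or more. The sketch below mimics that rule on single terms with a regex. It is a hypothetical helper for illustration, not how Elasticsearch evaluates wildcards internally.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring query_string wildcard semantics on one term:
// '?' matches exactly one character, '*' matches zero or more characters.
final class WildcardSketch {

    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                re.append('.');
            } else if (c == '*') {
                re.append(".*");
            } else {
                re.append(Pattern.quote(String.valueOf(c))); // literal, e.g. the '.' in "abc.com"
            }
        }
        return Pattern.compile(re.toString());
    }

    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("?bc.com", "abc.com"));  // one char before "bc.com"
        System.out.println(matches("?bc.com", "bc.com"));   // '?' cannot match zero chars
        System.out.println(matches("abc.com*", "abc.com.au"));
    }
}
```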

sohilelasticsearch · June 14, 2016, 7:57am

I am using spring-data-elasticsearch.

Below is the code snippet:

@Query("{\"bool\" : {\"should\" : [{\"query_string\" : { \"query\" : \"?0\", \"fields\":[\"dname\"]}} , {\"query_string\" : { \"query\" : \"?1\", \"fields\":[\"dname\"]}} ]}}")
Page findByDomainOrDomainStartsWith(String domain, String domain1, Pageable pageable);

"?0" -> domain
"?1" -> domain1
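Because ?0 and ?1 are Spring Data placeholders, the `?` itself never reaches Elasticsearch. The sketch below is a simplified imitation of positional binding, not Spring Data's actual implementation. It shows the JSON that would result for domain = "abc.com" and domain1 = "abc.com*". Like the hypothetical `bind` helper itself, it inserts values without any JSON escaping.

```java
// Simplified imitation (not Spring Data's actual code) of filling positional
// placeholders in an @Query string from method arguments.
final class PlaceholderSketch {

    static String bind(String template, Object... args) {
        String out = template;
        // Replace higher indexes first so "?1" is not clobbered inside "?10".
        for (int i = args.length - 1; i >= 0; i--) {
            out = out.replace("?" + i, String.valueOf(args[i])); // no JSON escaping
        }
        return out;
    }

    public static void main(String[] args) {
        String template = "{\"bool\" : {\"should\" : [{\"query_string\" : { \"query\" : \"?0\", \"fields\":[\"dname\"]}} , "
            + "{\"query_string\" : { \"query\" : \"?1\", \"fields\":[\"dname\"]}} ]}}";
        System.out.println(bind(template, "abc.com", "abc.com*"));
    }
}
```

After binding, the query_string texts are just `abc.com` and `abc.com*`, which makes the second clause a trailing wildcard.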

warkolm · June 14, 2016, 8:09am

Ah ok, my misunderstanding then. Sorry!
