Hi,
I want to load 1 billion documents into Elasticsearch.
I am using ES 1.7.1.
The format of each document is:
{
"pid":1234,
"sid":123,
"dname":"abc.com"
}
Is it fine to load all the data into a single ES node with all default settings?
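Whatever the node size, a load of this scale normally goes through the _bulk API in batches rather than one request per document. Here is a minimal sketch of how a batch of these documents could be serialized into the newline-delimited _bulk body. The index name "domains" and type "doc" are hypothetical placeholders. The sketch only builds the body and does not send it.

```java
import java.util.Arrays;
import java.util.List;

public class BulkBodySketch {

    // Hypothetical holder mirroring the document format from the question.
    static final class DomainDoc {
        final long pid;
        final long sid;
        final String dname;

        DomainDoc(long pid, long sid, String dname) {
            this.pid = pid;
            this.sid = sid;
            this.dname = dname;
        }
    }

    // Minimal JSON escaping for backslashes and double quotes only.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a _bulk request body: an action line plus a source line per
    // document, every line terminated by "\n" (the final one included).
    // ES 1.x documents carry a mapping type; "domains"/"doc" are placeholders.
    static String buildBulkBody(String index, String type, List<DomainDoc> docs) {
        StringBuilder sb = new StringBuilder();
        for (DomainDoc d : docs) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"").append(type).append("\"}}\n");
            sb.append("{\"pid\":").append(d.pid)
              .append(",\"sid\":").append(d.sid)
              .append(",\"dname\":\"").append(escapeJson(d.dname)).append("\"}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<DomainDoc> batch = Arrays.asList(new DomainDoc(1234, 123, "abc.com"));
        System.out.print(buildBulkBody("domains", "doc", batch));
    }
}
```

A workable batch size depends on the hardware, so it is best found by testing on the node in question.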
It depends on a few things; you really should test whether it fits on a node of whatever size you have.
I am running ES with ES_HEAP_SIZE=16g and 500 GB of disk space.
Have you tried loading the data into a node with those specs?
What was the indexing and query response like?
I have made one change to the settings: I set index.number_of_shards: 1 and index.number_of_replicas: 0.
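Those two settings take effect when the index is created. Here is a minimal sketch that builds the create-index request body (sent as PUT /<index>, with the index name left open) carrying exactly those two values.

```java
public class IndexSettingsSketch {

    // Builds the JSON body for a create-index request with the two settings
    // from the post: one primary shard and no replicas.
    static String createIndexBody(int shards, int replicas) {
        return "{\"settings\":{\"index\":{"
                + "\"number_of_shards\":" + shards + ","
                + "\"number_of_replicas\":" + replicas
                + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(createIndexBody(1, 0));
    }
}
```

number_of_shards is fixed once the index exists. number_of_replicas can still be changed later through the update index settings API.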
Queries are taking about 15 seconds, and CPU utilization is around 400%.
What sort of query is it?
I am using spring-data-elasticsearch 1.3.0.RELEASE.
Most of my queries look like this:
@Query("{\"bool\" : {\"should\" : [{\"query_string\" : {\"query\" : \"?0\", \"fields\" : [\"dname\"]}}, {\"query_string\" : {\"query\" : \"?1\", \"fields\" : [\"dname\"]}}]}}")
For example, return all documents where dname = "abc.com" or dname = "abc.com*".
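One option worth testing is to express the same exact-or-prefix condition with term and prefix clauses instead of query_string, which avoids query-string parsing. This is a sketch, not a confirmed fix. It assumes dname is mapped as a not_analyzed string (in ES 1.x, "type": "string", "index": "not_analyzed"). That mapping is an assumption, not something stated in the thread. The sketch only builds the search request body.

```java
public class DomainQuerySketch {

    // Wraps a value as a JSON string, escaping backslashes and double quotes.
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Exact value OR any value starting with it, without query_string parsing.
    // Assumes (hypothetically) that "dname" is a not_analyzed string, so term
    // and prefix compare against the whole stored value.
    static String exactOrPrefix(String domain) {
        String v = jsonString(domain);
        return "{\"query\":{\"bool\":{\"should\":["
                + "{\"term\":{\"dname\":" + v + "}},"
                + "{\"prefix\":{\"dname\":" + v + "}}"
                + "]}}}";
    }

    public static void main(String[] args) {
        System.out.println(exactOrPrefix("abc.com"));
    }
}
```

A prefix of "abc.com" also matches "abc.com" itself, so the prefix clause alone already covers both conditions. The term clause is kept only to mirror the original two-clause query.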
I also run a logical AND on the pid and sid combination:
{"bool":{"must":[{"query_string":{"query":"123","fields":["pid"],"default_operator":"and"}},{"query_string":{"query":"456","fields":["sid"],"default_operator":"and"}}]}}
For example, return all documents where pid = X AND sid = Y.
Can anyone help me with this?
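The condition pid = X AND sid = Y is an exact match that does not need relevance scoring, so it can also be written with term filters. Here is a minimal sketch in ES 1.x syntax. In later versions, the filtered query was replaced by the bool query's filter clause. The sketch builds only the request body.

```java
public class PidSidQuerySketch {

    // ES 1.x "filtered" query: both term filters must match. Filters skip
    // relevance scoring, and ES 1.x can cache filter results.
    static String pidAndSid(long pid, long sid) {
        return "{\"query\":{\"filtered\":{\"filter\":{\"bool\":{\"must\":["
                + "{\"term\":{\"pid\":" + pid + "}},"
                + "{\"term\":{\"sid\":" + sid + "}}"
                + "]}}}}}";
    }

    public static void main(String[] args) {
        System.out.println(pidAndSid(123, 456));
    }
}
```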
A leading wildcard query like that will always be slow; it's essentially the ES version of a table scan.
Hi @warkolm,
My query does not start with a leading wildcard; it's a trailing wildcard query.
Thanks,
Sohil
The ? is leading, though. As per the Query String Query page of the Elasticsearch Guide [2.3], it'll only match a single character there, but it still needs to scan a lot of docs to return just those.
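Here is the general reasoning. Lucene keeps each field's terms in sorted order, so a trailing wildcard can seek straight to its prefix. A pattern with a leading wildcard (? for one character, * for any run) has no fixed starting point and must test every term. The toy sketch below uses a Java TreeSet as a hypothetical stand-in for the term dictionary and counts how many terms each case examines. It is greatly simplified; Lucene's real wildcard implementation is automaton-based.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class WildcardScanSketch {

    // Toy sorted term dictionary; a stand-in for Lucene's real term index.
    static final NavigableSet<String> TERMS = new TreeSet<>(Arrays.asList(
            "abc.com", "abc.com.au", "abc.net", "abd.com", "xabc.com", "zzz.org"));

    // Matching terms plus how many terms had to be examined.
    static final class Result {
        final List<String> matches = new ArrayList<>();
        int visited = 0;
    }

    // Trailing wildcard (e.g. "abc.com*"): seek to the first term >= prefix
    // and stop at the first term that no longer starts with the prefix.
    static Result trailingWildcard(String prefix) {
        Result r = new Result();
        for (String t : TERMS.tailSet(prefix, true)) {
            r.visited++;
            if (!t.startsWith(prefix)) break;
            r.matches.add(t);
        }
        return r;
    }

    // Leading wildcard (e.g. "?bc.com"): no fixed prefix to seek to,
    // so every term in the dictionary is checked against the pattern.
    static Result leadingWildcard(String pattern) {
        Pattern p = Pattern.compile(toRegex(pattern));
        Result r = new Result();
        for (String t : TERMS) {
            r.visited++;
            if (p.matcher(t).matches()) r.matches.add(t);
        }
        return r;
    }

    // Query-string wildcard syntax: '?' = exactly one char, '*' = zero or more.
    static String toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') sb.append('.');
            else if (c == '*') sb.append(".*");
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Result trailing = trailingWildcard("abc.com");
        Result leading = leadingWildcard("?bc.com");
        System.out.println("abc.com* -> " + trailing.matches + ", visited " + trailing.visited);
        System.out.println("?bc.com  -> " + leading.matches + ", visited " + leading.visited);
    }
}
```

With these six terms, abc.com* stops after 3 terms while ?bc.com checks all 6. On a large index, the leading-wildcard case grows with the full term dictionary.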
I am using spring-data-elasticsearch.
Below is the code snippet:
@Query("{\"bool\" : {\"should\" : [{\"query_string\" : {\"query\" : \"?0\", \"fields\" : [\"dname\"]}}, {\"query_string\" : {\"query\" : \"?1\", \"fields\" : [\"dname\"]}}]}}")
Page findByDomainOrDomainStartsWith(String domain, String domain1, Pageable pageable);
"?0" -> domain
"?1" -> domain1
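To make the placeholder behavior concrete, the sketch below performs positional ?N substitution on the annotation's query string. It is a hypothetical helper that approximates how positional parameters are filled in; it is not Spring Data's actual code. It assumes domain1 is passed as "abc.com*", following the earlier example. After binding, the ?0/?1 markers are gone and the only wildcard left is the trailing *.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\?(\\d+)");

    // Replaces each ?N in the template with the N-th argument. Hypothetical
    // helper; substituted text is not rescanned for further placeholders.
    static String bind(String template, Object... args) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(args[index])));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "{\"bool\" : {\"should\" : ["
                + "{\"query_string\" : {\"query\" : \"?0\", \"fields\" : [\"dname\"]}}, "
                + "{\"query_string\" : {\"query\" : \"?1\", \"fields\" : [\"dname\"]}}]}}";
        System.out.println(bind(template, "abc.com", "abc.com*"));
    }
}
```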
Ah ok, my misunderstanding then. Sorry!
© 2020. All Rights Reserved - Elasticsearch