One billion records imported from MySQL into Elasticsearch: how does ES perform?

Zhantong_Mou · March 2, 2015, 7:54am

I have one billion records in MySQL. The fields are things like names, ID card
numbers, and phone numbers, so the values are almost all unique.
Can Elasticsearch, which is based on an inverted index, guarantee fast
queries on this data?
Can adjusting the number of shards improve query speed?
If ES can replace MySQL, how does ES ensure performance? I think that for
structured data it cannot perform better than MySQL, because it is based on an
inverted index.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e61eb6fb-3f5a-40fc-974d-283d76030821%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jprante · March 2, 2015, 9:03am

What are the queries, and what is their speed?

A growing number of shards is not related to speed; it is related to
scalability. This means that even with a large document count, the search
response time can be kept low by creating indices that span multiple nodes.
If you increase the number of replica shards, you can serve more searches in
parallel.

ES cannot replace MySQL, because ES is not a relational database system.

ES is faster than MySQL's direct index because ES queries can operate
in memory. If you do not want an inverted index, choose doc values.

Jörg
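
As a rough illustration of the shard/replica distinction above, here is a minimal Python sketch that builds the settings body one might send when creating an index. `number_of_shards` and `number_of_replicas` are the Elasticsearch index settings; the helper names and the values 10 and 1 are assumptions made up for this example, not a sizing recommendation.

```python
import json


def index_settings(primary_shards: int, replicas: int) -> dict:
    """Build an index-creation settings body (hypothetical helper)."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("need at least one primary shard and zero or more replicas")
    return {
        "settings": {
            # Primaries split the documents, so the index can span several nodes.
            "number_of_shards": primary_shards,
            # Each replica is a full extra copy of every primary that can serve searches.
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(body: dict) -> int:
    """Count every shard copy (primaries plus replicas) the cluster must host."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


body = index_settings(primary_shards=10, replicas=1)
print(json.dumps(body, indent=2))
print(total_shard_copies(body))  # 20
```

More primaries spread one billion documents over more nodes; more replicas add copies that can answer searches in parallel.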

Zhantong_Mou · March 2, 2015, 12:28pm

Thank you for your answer, but I have another question.

The one billion records come from a single MySQL table. MySQL uses a B-tree
(or another index structure) to keep query response times low.
If the values of a field are almost all unique, a normal inverted index
cannot improve query speed. ES is based on Lucene, and the inverted index
maps terms to documents, so with one billion distinct terms it cannot improve
performance.
Lucene adds a term index on top of the normal inverted index, and the terms
are split across many shards, but I think ES still cannot improve the speed in
this case.

What is your opinion?

Thanks
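
To make the unique-terms concern concrete, here is a toy Python sketch of an inverted index whose term dictionary is kept sorted, so an exact lookup is a binary search, much like descending a B-tree. This is an illustration only: `TinyInvertedIndex` and the sample phone numbers are invented, and Lucene's real term dictionary is considerably more elaborate.

```python
import math
from bisect import bisect_left
from typing import Dict, List


class TinyInvertedIndex:
    """Toy inverted index: a sorted term list plus one postings list per term."""

    def __init__(self, docs: Dict[int, str]) -> None:
        postings: Dict[str, List[int]] = {}
        for doc_id, value in docs.items():
            postings.setdefault(value, []).append(doc_id)
        # Sorting the terms once makes binary search possible at query time.
        self.terms: List[str] = sorted(postings)
        self.postings: List[List[int]] = [postings[t] for t in self.terms]

    def lookup(self, term: str) -> List[int]:
        """Return the doc ids that contain the exact term, or an empty list."""
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return self.postings[i]
        return []


index = TinyInvertedIndex({1: "13800000001", 2: "13800000002", 3: "13800000003"})
print(index.lookup("13800000002"))  # [2]
print(index.lookup("13900000000"))  # []

# Upper bound on comparisons for a binary search over one billion sorted terms.
print(math.ceil(math.log2(10**9)))  # 30
```

With unique values every postings list has a single entry, so the cost is dominated by finding the term, and in this toy model that takes about 30 comparisons for one billion terms.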

jprante · March 2, 2015, 2:16pm

I do not have an answer because your question is speculative. Without
facts about your MySQL query type and speed and your scenario, it is
not possible to discuss alternatives.

The fact is that MySQL is very limited: its search is slow, and it is not a
search engine. Lucene has plenty of advantages over an RDBMS in search, not
only inverted indexing. As I said, if you do not want inverted indexing, you
can choose doc values:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/doc-values.html

Jörg

Zhantong_Mou · March 3, 2015, 3:31am

My scenario:
one billion records. I query for a single record by one field, and that field
is unique.

How does ES ensure performance here? What is the algorithm? I am very confused.
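
For reference, the scenario described here (fetch one record by a unique field) maps onto an exact-match `term` query in the Elasticsearch query DSL. The Python sketch below only builds the request body; the field name `id_card` and the sample value are hypothetical, and it assumes the field is indexed as a single unanalyzed token so that the stored term equals the value.

```python
import json


def exact_match_query(field: str, value: str) -> dict:
    """Build a search body that matches documents whose field equals value exactly."""
    return {
        "query": {"term": {field: value}},
        # The field is unique, so at most one hit is expected.
        "size": 1,
    }


body = exact_match_query("id_card", "110101199001011234")
print(json.dumps(body))
```

The body would be sent to the index's `_search` endpoint; how fast it returns depends on the cluster, which is what the replies below discuss.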

Dennis · March 3, 2015, 4:07am

It may seem like a daunting task at first, but it really turns out not to be. Just install Elasticsearch on about the same number of machines you would install MySQL on, and test them both. When you test, do not test just once with a brand-new setup; test after it has been run 10 times, because caching will take place. You can get faster results from a database under certain circumstances, but you cannot get the variety of searching that Elasticsearch offers.
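
The warm-up advice can be sketched as a small timing harness. This is only an outline: `fake_lookup` is a stand-in for whatever real MySQL or Elasticsearch call you want to compare, and the run counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable, List


def timed_runs(query: Callable[[], object], warmup: int = 10, measured: int = 20) -> List[float]:
    """Run the query `warmup` times untimed, then return timings of `measured` runs."""
    for _ in range(warmup):
        query()  # lets caches fill before anything is measured
    timings: List[float] = []
    for _ in range(measured):
        start = time.perf_counter()
        query()
        timings.append(time.perf_counter() - start)
    return timings


def fake_lookup() -> int:
    """Placeholder workload; replace with the real single-record lookup."""
    return sum(range(1000))


timings = timed_runs(fake_lookup)
print(f"median: {statistics.median(timings) * 1000:.3f} ms over {len(timings)} runs")
```

Running the same harness against both systems, with the same queries and data, gives numbers that can actually be compared.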

jprante · March 3, 2015, 8:30am

Querying records one by one is an uncommon task for an RDBMS that manages
relationships between tables. It is more a task for a key/value store.

Nevertheless, MySQL is slow in a key/value store scenario. There are faster
products, e.g. memcached, for which MySQL offers an integration.

ES is faster than MySQL if you provide enough RAM, load the docs into
direct memory (I/O buffer) and the field data cache, and optimize the
configuration (plus balance the workload between nodes), so that ES can work
very close to an in-memory key/value store.

The complexity is O(1) * len(dict) for all term lookups. That is, a lookup
does not depend on key count or on key length, but Lucene needs to scan the
term dictionary, unless you implement an FST for terms (see LUCENE-3069,
"Lucene should have an entirely memory resident term dictionary", in the ASF
JIRA, and the Changing Bits post "Lucene now has an in-memory terms
dictionary, thanks to Google Summer of Code"). With an FST, the complexity
would be O(1), roughly speaking. In practice, there are more factors to
consider (e.g. whether the index is write-once or open for ongoing
modifications, which require frequent and expensive cache rebuilding).

Next: doc values. By uninverting the field cache (see the Trifork Blog) you
will need vastly more memory, and therefore you must search not only in
memory but also on disk. If you want key lookup only, this is fast enough,
because no field cache (re)building is required. This is best for
frequently changing data.

Jörg
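
"Uninverting" may be easier to see in a toy example. The Python sketch below turns a term-to-documents mapping into a document-to-value column, which is the direction that field data and doc values provide. It assumes a single-valued field and dense doc ids starting at 0; the function name and sample data are invented, and real Lucene structures are far more compact.

```python
from typing import Dict, List, Optional


def uninvert(inverted: Dict[str, List[int]], doc_count: int) -> List[Optional[str]]:
    """Turn term -> doc ids into a column where column[doc_id] is that doc's value."""
    column: List[Optional[str]] = [None] * doc_count
    for term, doc_ids in inverted.items():
        for doc_id in doc_ids:
            column[doc_id] = term  # single-valued field: one term per document
    return column


inverted = {"alice": [0], "bob": [2], "carol": [1]}
column = uninvert(inverted, doc_count=3)
print(column)  # ['alice', 'carol', 'bob']
```

Building this column at search time costs memory and must be redone when the index changes; doc values write the same per-document layout to disk at index time instead.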
