Questions about performance

Hello dear community :hugs:
Sorry if the questions below are naive. I'm just trying to learn the miracle of Elasticsearch :slight_smile:
I want to store ~10 billion records with data history of different cryptocurrencies.
I was advised to store them all in this form:

PUT /coin_charts/_doc/bitcoin-1642716000000
{
    "currency": "bitcoin",
    "priceUsd": 36000,
    "time": 1642716000000
}

There are questions around this:

  1. Is it normal to store this many records with such a structure? Later on, I will very often need to search on the "time" and "currency" fields to get the value of a certain cryptocurrency at a given time. For that, I issue this query:
POST /coin_charts/_search
{
    "size": 1,
    "query": {
        "bool": {
            "must": [
                {"term": {"currency": "bitcoin"}},
                {"range": {"time": {"gt": 1357110000000}}}
            ]
        }
    }
}
  2. I need to get information about different currencies for 100 completely different dates. I thought it would make sense to get all of this in one request. Is that possible? If so, how can I modify my query from question #1? Right now I have to perform a separate request for each date.

  3. I noticed that adding new records is very slow, about 100-150 ms each. Is that normal? It seems to me that indexing was many times faster in version 5. Is it possible to speed up the addition of new records? Maybe there is some multi-threaded append processing or something similar?

  4. What are the optimal server specifications for this, and are there additional settings that need to be applied so that requests are processed very quickly?
    The main thing is speed :slight_smile:

  1. Yes, this is normal
  2. Normally you'd just pick a date range query and then use that. Look at using filters rather than queries - Query and filter context | Elasticsearch Guide [7.16] | Elastic
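To illustrate the filter suggestion: here is a sketch of the query from the original post rewritten so that both clauses run in filter context, which skips scoring and lets Elasticsearch cache the clauses:

POST /coin_charts/_search
{
    "size": 1,
    "query": {
        "bool": {
            "filter": [
                {"term": {"currency": "bitcoin"}},
                {"range": {"time": {"gt": 1357110000000}}}
            ]
        }
    }
}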
  3. Are you using the _bulk API? If not, what are you doing?
  4. How fast?
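For the "100 different dates" question, one option worth trying is the _msearch API, which sends many searches in a single HTTP request. A minimal sketch, using the index and field names from the original post and made-up timestamps (each body is one line of NDJSON, preceded by a header line):

POST /coin_charts/_msearch
{}
{"size": 1, "query": {"bool": {"filter": [{"term": {"currency": "bitcoin"}}, {"range": {"time": {"gt": 1357110000000}}}]}}}
{}
{"size": 1, "query": {"bool": {"filter": [{"term": {"currency": "bitcoin"}}, {"range": {"time": {"gt": 1420156800000}}}]}}}

The response contains one result object per search, in the same order as the request.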

What you have is time-series data, and the recommended way to work with it in Elasticsearch is through aggregations. These allow you to find, for example, the minimum and maximum price per currency per time interval (e.g. per day). I would recommend reading this blog as well as this section in the docs.
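As a sketch of what such an aggregation might look like (assuming the "time" field is mapped as a date rather than a plain long, which is required for a date_histogram):

POST /coin_charts/_search
{
    "size": 0,
    "query": {"term": {"currency": "bitcoin"}},
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "time", "calendar_interval": "day"},
            "aggs": {
                "min_price": {"min": {"field": "priceUsd"}},
                "max_price": {"max": {"field": "priceUsd"}}
            }
        }
    }
}

With "size": 0 no individual documents are returned, only the per-day buckets with their min and max prices.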

For ingest performance, look at this section in the docs, and let Elasticsearch assign document IDs if you can, as this improves ingest performance.
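Letting Elasticsearch assign the ID just means indexing with POST and no ID in the URL, instead of the PUT /coin_charts/_doc/bitcoin-1642716000000 form from the original post:

POST /coin_charts/_doc
{
    "currency": "bitcoin",
    "priceUsd": 36000,
    "time": 1642716000000
}

This skips the check for an existing document with the same ID, which is what makes auto-generated IDs faster to ingest.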


I would also add that for this type of use case it may also be worthwhile considering a dedicated time-series database like e.g. InfluxDB or Prometheus as they offer different trade-offs and performance characteristics.


I don't know how I haven't heard about _bulk before, this is a miracle for me :sweat_smile:
Thanks a lot for the tip.
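For reference, a minimal _bulk request with auto-generated IDs might look like this (the ethereum values are made up for illustration; each action line is followed by its document on the next line, and the request must end with a newline):

POST /coin_charts/_bulk
{"index": {}}
{"currency": "bitcoin", "priceUsd": 36000, "time": 1642716000000}
{"index": {}}
{"currency": "ethereum", "priceUsd": 2400, "time": 1642716000000}

A single bulk request of a few hundred to a few thousand documents typically amortizes the per-request overhead that makes one-document-at-a-time indexing slow.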

I would like each request to be processed within 30 ms, if possible. Do I need to change any settings on a powerful server, or are such settings applied automatically?

Thanks for the advice, I'll definitely see how it works

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.