Questions about performance

Hello dear community :hugs:
Sorry for the possibly incorrect questions below. I'm just trying to learn the miracle of Elasticsearch :slight_smile:
I want to store ~10 billion records with data history of different cryptocurrencies.
I was advised to store them all in this form:

PUT /coin_charts/_doc/bitcoin-1642716000000
{
  "currency": "bitcoin",
  "priceUsd": 36000,
  "time": 1642716000000
}

There are questions around this:

  1. Is it normal to store so many records with such a structure? Later on, I will very often need to search on the "time" and "currency" fields to get the value of a given cryptocurrency at a given time. For that, I issue this query:
POST /coin_charts/_search
{
    "size": 1,
    "query": {
        "bool": {
            "must": [
                {"term": {"currency": "bitcoin"}},
                {"range": {"time": {"gt": 1357110000000}}}
            ]
        }
    }
}
  2. I need to get information about different currencies for 100 completely different dates. I thought it would make sense to get this in one request. Is that possible? If so, how can I modify my request (from question #1)? At the moment I have to perform a separate request for each date.

  3. I noticed that adding new records is very slow: about 100-150 ms. Is that normal? It seems to me that in version 5 adding records was many times faster. Is it possible to speed this up? Maybe there is some multi-threaded append processing or something similar?

  4. What are the optimal server characteristics for this? Do any additional settings need to be applied so that requests are processed very quickly?
    The main thing is speed :slight_smile:

  1. Yes, this is normal
  2. Normally you'd just pick a date range query and then use that. Look at using filters rather than queries: see "Query and filter context" in the Elasticsearch Guide [7.16].
  3. Are you using the _bulk API? If not, what are you doing?
  4. How fast?
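
For the "100 different dates in one request" question, one option is the Multi search API (`_msearch`), which takes several searches in a single NDJSON request. The sketch below only builds the request body; the helper name `build_msearch_body` is hypothetical, and the ascending sort on `time` is an assumption (it makes `size: 1` return the first record after each timestamp, which the original query does not guarantee). Both clauses are placed in filter context, as suggested in answer 2.

```python
import json


def build_msearch_body(lookups):
    """Build an NDJSON body for POST /coin_charts/_msearch.

    `lookups` is a list of (currency, epoch_millis) pairs. Each pair becomes
    one search for the first record of that currency after the timestamp.
    """
    lines = []
    for currency, after_ms in lookups:
        # Header line: left empty because the index is named in the URL.
        lines.append(json.dumps({}))
        # Body line: term and range sit in filter context, so no scoring runs.
        lines.append(json.dumps({
            "size": 1,
            "sort": [{"time": "asc"}],  # assumption: earliest match is wanted
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"currency": currency}},
                        {"range": {"time": {"gt": after_ms}}},
                    ]
                }
            },
        }))
    # The msearch body has to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_msearch_body([
        ("bitcoin", 1357110000000),
        ("ethereum", 1642716000000),
    ]))
```

The resulting string would be sent with `Content-Type: application/x-ndjson`; the response contains one entry per search, in the same order as the lookups.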

What you have is time-series data, and the recommended way to work with this in Elasticsearch is through aggregations. This will allow you to find, for example, minimum and maximum prices per currency per time interval, such as a day. I would recommend reading this blog as well as this section in the docs.
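
As a rough illustration of the aggregation approach, the sketch below builds a search request that buckets records per currency and per day, then takes the minimum and maximum `priceUsd` in each bucket. It assumes `time` is mapped as a `date` field and `currency` as a `keyword` field, which the thread does not confirm; the function name and aggregation names are hypothetical.

```python
import json


def build_daily_min_max_request(start_ms, end_ms):
    """Build a request body for POST /coin_charts/_search.

    Returns daily min/max prices per currency between two epoch-millisecond
    timestamps (start inclusive, end exclusive).
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"time": {"gte": start_ms, "lt": end_ms}}}
                ]
            }
        },
        "aggs": {
            "per_currency": {
                # assumption: "currency" is a keyword field
                "terms": {"field": "currency"},
                "aggs": {
                    "per_day": {
                        # assumption: "time" is a date field
                        "date_histogram": {
                            "field": "time",
                            "calendar_interval": "day",
                        },
                        "aggs": {
                            "min_price": {"min": {"field": "priceUsd"}},
                            "max_price": {"max": {"field": "priceUsd"}},
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    request = build_daily_min_max_request(1642636800000, 1642723200000)
    print(json.dumps(request, indent=2))
```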

For ingest performance, look at this section in the docs, and let Elasticsearch assign document IDs if you can, as this improves ingest performance.
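
A minimal sketch of what a `_bulk` request body with auto-generated IDs might look like, built in Python. The helper name `build_bulk_body` is hypothetical; the records reuse the field names from the question. Each document takes two NDJSON lines: an action line and the document itself. Leaving `_id` out of the action line lets Elasticsearch generate the ID.

```python
import json


def build_bulk_body(records):
    """Build an NDJSON body for POST /coin_charts/_bulk."""
    lines = []
    for record in records:
        # Action line without "_id": Elasticsearch assigns the document ID.
        lines.append(json.dumps({"index": {}}))
        # Source line: the document itself.
        lines.append(json.dumps(record))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body([
        {"currency": "bitcoin", "priceUsd": 36000, "time": 1642716000000},
        {"currency": "bitcoin", "priceUsd": 36100, "time": 1642716060000},
    ]))
```

The body would be sent with `Content-Type: application/x-ndjson`. Batching many records per request replaces one HTTP round trip per document with one per batch.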


I would also add that for this type of use case it may be worthwhile to consider a dedicated time-series database such as InfluxDB or Prometheus, as they offer different trade-offs and performance characteristics.


I don't know how I haven't heard about _bulk before, this is a miracle for me :sweat_smile:
Thanks a lot for the tip.

I would like each request to be processed in under 30 ms, if possible. Does a powerful server need any settings changes, or are suitable settings applied automatically?

Thanks for the advice, I'll definitely see how it works
