General queries

I have the following general questions:

  1. What is the impact on the heap memory allocated to ES during the following operations?
    a) Indexing
    b) Querying

For example, if I index 1M records or 100M records, what will the impact on heap memory be?

If I want to index 500 million records, each consisting of 15 fields, for around 350 GB of data in total, how much heap memory will be required?

  2. The "_nodes/stats" API returns a value for "store": { "size_in_bytes": } that is different from the actual size of the ES data folder on disk. Why is it different?

  3. I have observed that, after indexing, the size of the data on disk keeps shrinking. Is some kind of compression being applied to the indexed data on disk?

  4. How many replicas are generally recommended in Elasticsearch?

  1. Heap usage depends heavily on your mappings, queries, and data, so this is really something you need to test with your own data.
  2. That may be down to things like doc values, but I'm not really sure.
  3. Yes, that is merging of segments, which removes deleted documents.
  4. At least one; beyond that it is entirely up to you.
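
To make the "test with your data" advice and the store-size comparison a bit more concrete, here is a minimal Python sketch that pulls the store size and JVM heap figures out of a "_nodes/stats" response, so they can be sampled before, during, and after an indexing run. The function names and the default URL "http://localhost:9200" are assumptions for illustration, not anything from the original thread; the field names ("indices.store.size_in_bytes", "jvm.mem.heap_used_in_bytes", "jvm.mem.heap_max_in_bytes") are the ones the node stats API reports, but check them against your ES version.

```python
import json
import urllib.request


def summarize_node_stats(stats):
    """Extract per-node store size and heap usage from a parsed
    _nodes/stats response (a dict). Missing fields come back as None."""
    summary = {}
    for node_id, node in stats.get("nodes", {}).items():
        store = node.get("indices", {}).get("store", {})
        heap = node.get("jvm", {}).get("mem", {})
        # Fall back to the node id if the node has no name field.
        summary[node.get("name", node_id)] = {
            "store_bytes": store.get("size_in_bytes"),
            "heap_used_bytes": heap.get("heap_used_in_bytes"),
            "heap_max_bytes": heap.get("heap_max_in_bytes"),
        }
    return summary


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch only the indices and jvm sections of the node stats.
    base_url is a hypothetical default; point it at your own cluster."""
    url = base_url + "/_nodes/stats/indices,jvm"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Calling summarize_node_stats(fetch_node_stats()) at intervals while indexing a representative sample of the data gives a rough picture of how heap use and on-disk size change. The store size reported there can then be compared with the output of a disk usage tool run on the data folder.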