Difference between regular indices and logsdb (data streams)

Hi,
I recently decided to move my k8s cluster logs from regular indices to logsdb data streams.
For my indices I currently use:

  • dynamic mapping of all string fields to keyword (except the message field, which I use for text search)
  • ignoring the event.original field (as it is the same as the message field)
  • default compression codec
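
For reference, a minimal sketch of how a setup like the one listed above could be expressed as an index template body, written as a Python dict. The pattern name `k8s-logs-*`, the template key `strings_as_keyword`, and the helper `build_index_template` are hypothetical names used only for illustration; how event.original is dropped is not shown, since that depends on the ingest setup.

```python
import json


def build_index_template(name_pattern, index_mode=None, codec=None, data_stream=False):
    """Build an index template body as a plain dict (illustrative helper, not an official API)."""
    index_settings = {}
    if index_mode is not None:
        index_settings["mode"] = index_mode   # e.g. "logsdb"
    if codec is not None:
        index_settings["codec"] = codec       # e.g. "best_compression"

    body = {
        "index_patterns": [name_pattern],
        "template": {
            "settings": {"index": index_settings},
            "mappings": {
                # Map every dynamically detected string field to keyword ...
                "dynamic_templates": [
                    {
                        "strings_as_keyword": {
                            "match_mapping_type": "string",
                            "mapping": {"type": "keyword"},
                        }
                    }
                ],
                # ... except message, which stays a text field for full-text search.
                "properties": {"message": {"type": "text"}},
            },
        },
    }
    if data_stream:
        body["data_stream"] = {}
    return body


# Regular index with the default codec (no explicit index settings).
regular = build_index_template("k8s-logs-*")
# Logsdb data stream with the same mappings.
logsdb = build_index_template("k8s-logs-*", index_mode="logsdb", data_stream=True)

print(json.dumps(logsdb, indent=2))
```

The resulting dict can be sent as the body of a `PUT _index_template/<name>` request; passing `codec="best_compression"` without `index_mode` gives the fourth variant from the tests.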

The official docs say that logsdb saves space thanks to its new storage mechanisms.
So I ran some tests with my log-generating app. The logs are identical in every test.

My test results for 1 million logs:

  • Regular index with dynamic mapping applied (my current setup): 2.52 GB
  • Logsdb data stream with default settings and mappings: 2.83 GB
  • Logsdb data stream with my dynamic mappings applied: 1.92 GB
  • Regular index with dynamic mapping and best_compression (as in the logsdb settings) applied: 1.91 GB
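
To make the comparison easier to read, here is a small sketch that expresses each result as a percentage change against the current setup (the 2.52 GB regular index). The sizes are the ones listed above; the labels and the helper `relative_change` are just illustrative names.

```python
# Index sizes from the four tests, in GB.
sizes_gb = {
    "regular, dynamic mapping": 2.52,
    "logsdb, defaults": 2.83,
    "logsdb, dynamic mapping": 1.92,
    "regular, dynamic mapping + best_compression": 1.91,
}

BASELINE = "regular, dynamic mapping"


def relative_change(size, base):
    """Percentage change of size relative to base (negative means smaller)."""
    return (size - base) / base * 100


changes = {
    label: relative_change(size, sizes_gb[BASELINE])
    for label, size in sizes_gb.items()
}

for label, pct in changes.items():
    print(f"{label}: {sizes_gb[label]:.2f} GB ({pct:+.1f}%)")
```

Against the baseline, that works out to roughly +12.3% for default logsdb, -23.8% for logsdb with the dynamic mappings, and -24.2% for the regular index with best_compression.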

Now it seems to me that I don't get any advantage from using logsdb (which actually gives less flexibility, in naming for example).
So my question is: is the only difference between regular indices and logsdb the default mapping of all fields to keyword plus best_compression?

Hello and welcome,

Do you have a license or are using the trial license?

The logsdb mode also relies on synthetic source. If you do not have a license and are not using the trial license, it will use the normal _source, so the space savings will not be as high.
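
A quick way to check is the license API (`GET /_license`), which reports the license type and status. The sketch below only interprets such a response; the sample dict is hand-written for illustration, not real cluster output, and treating every active non-basic type as "licensed" is a simplifying assumption, so check the subscription documentation for the exact tier that synthetic source needs.

```python
def license_summary(license_response):
    """Extract (type, status) from a GET /_license response body."""
    lic = license_response.get("license", {})
    return lic.get("type"), lic.get("status")


def has_trial_or_paid_license(license_response):
    """Assumption: any active license other than 'basic' counts as trial or paid."""
    lic_type, status = license_summary(license_response)
    return status == "active" and lic_type not in (None, "basic")


# Hand-written sample shaped like a license API response (illustrative only).
sample = {"license": {"status": "active", "type": "basic"}}

print(license_summary(sample))
print(has_trial_or_paid_license(sample))
```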

Also, I think your dataset may be too small to do this comparison and see any savings.

You need to get into tens or hundreds of GB to see differences.

In my experience, I saw a 30% to 50% reduction after I started using logsdb with synthetic source.

Thanks for the note about synthetic_source. As far as I can tell, it does not support dynamic mapping or text search. Is there a way to use text search?

I use the open-source Elastic Cloud on Kubernetes v8.17.0 (no license).

Hmm, I'm not sure where you got the idea that text is not supported.