Limitations of the Elasticsearch & Kibana Combination

Hi everyone,

I'm here as a 5-day-old user of Elasticsearch + Kibana and have given it a lot of time over this period. One thing that strikes me as a real limitation is that, as a general rule of thumb, Elasticsearch is promoted as a primarily text-focused database, if you will.

One thing that I find interesting is that for a text field your options in terms of reporting are:

  1. Display the text field as-is.
  2. Count the number of values in a field, the number of DB entries that have a particular field value, count unique field values, etc. - the point being numeric outputs derived from text.
  3. Display the LastValue.

Most of the offerings revolve around maths-related outcomes (abs, min, max, etc.). There are no text-related functions for concatenation, extraction, or outputting something else when a particular text value is found. In short, while ES + Kibana are set up for text “storage”, this combination is not set up for text “processing”.

Even the FORMULA option for display is limited to maths-related counts or LastValue.

Anyway, comments please…

Geoffrey Brown, Te Puke, New Zealand.

I opened a GitHub issue some time ago with suggestions: Beyond word clouds · Issue #95912 · elastic/kibana · GitHub

Although Elasticsearch has some useful text features, they didn't make it into Kibana, largely, I expect, because Kibana is more focused on logs (its port 5601 is “logs” spelled backwards, after all). I expect text-focused apps tend to be more bespoke developments.


Hi @geoffrey, welcome to the community!!!

That's a great topic/question, and I generally agree with your assessment, and @Mark_Harwood1 is a legend in this area :slight_smile:

Elasticsearch is primarily a data store and search engine, rather than a processing platform. However, you can perform some processing during ingestion (using ingest pipelines), through transforms (which are usually focused on aggregation and may not be very helpful), or on the query side using the DSL. That said, it can still be challenging to display or report the results effectively.
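For example, a minimal ingest pipeline sketch (the pipeline name and field here are placeholders, not from your setup) that lowercases a text field and collapses repeated whitespace at index time could look like this in Dev Tools:

PUT _ingest/pipeline/text-cleanup-example
{
  "description": "Hypothetical example: normalize a text field at ingest time",
  "processors": [
    { "lowercase": { "field": "message" } },
    { "gsub": { "field": "message", "pattern": "\\s+", "replacement": " " } }
  ]
}

That only helps at write time, though; for read-time processing you are back to the query side.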

BUT I do need to add: have you looked at ES|QL? There is a significant amount of text processing you can do, and it can drive visualizations/tables as well.

There are extraction functions, extensive string manipulation, text search/match, and you can even call out to an LLM for completion/summary, etc.

These functions are row by row, i.e., each document can be manipulated and then displayed if you like.

Those results can be fed into a table, or aggregated into a pie chart, etc.

So perhaps take a look at that; it is still not a full text-processing engine, but there is quite a lot you can do!
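As a small taste, here is a sketch (the index and field names are invented for illustration) of the kind of row-by-row string work feeding an aggregation that I mean:

FROM my-index
| EVAL domain = MV_LAST(SPLIT(user.email, "@")),      // per-row string extraction
       label  = CONCAT(TO_UPPER(service.name), " / ", domain)
| STATS hits = COUNT(*) BY label                      // then aggregate on the derived text
| SORT hits DESC
| LIMIT 10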

OK, thanks for that Stephen. Will check out ES|QL.

Hi @Mark_Harwood1 ,
I see you have been getting into the nitty-gritty of word clouds, and I like the linkages proposed to provide context to output. Shortly after reading this I was searching on YouTube, and minutes later I was watching one of your videos (The story of elasticsearch told using elasticsearch). It was only when I thought “this is an interesting video” that I looked down to see your name as the submitter of the video.

LOGS BACKWARDS
I have utilized most of my brain cells and can now see how 5601, read digit by digit in reverse (1-0-6-5), spells “logs”. Is this truly where port 5601 came from?
I myself would have used 5901, as then the case is consistent across the letters.

DB INTERFACE
I have to say that venturing down this rabbit hole has also exposed me to REST for communications with the Elasticsearch engine (using DEVTOOLS). It amuses me, having come from an MS SQL background (in part), how an HTTP comms system was used to develop an interface for humans (REST) to extract data from a datastore. The brackets in a GET query, for example, drive me bonkers. I love the way SQL almost came naturally as an extension of English. Having to do queries this way seems painful, especially as the syntax of SQL is well established. Yuk!
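To illustrate what I mean, even a trivial search (index and field names invented for the example) needs a nest of braces:

GET my-index/_search
{
  "query": {
    "match": { "message": "flight delayed" }
  }
}

which in SQL would be, roughly, SELECT * FROM my_index WHERE message LIKE '%flight delayed%'.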

TERMINOLOGY
Last of all, I also find the new terminology (indices, types, documents, etc.) interesting, as I can see the equivalence to a normal DB. At the end of the day, I presume these terms come from the use of JSON for the underlying data storage?

ELASTIC LIMITATIONS & BENEFITS
Things I have realized were not in the glossy brochure: essentially you are dealing with one record type (index?) at a time, with no ability to perform the likes of joins across indexes. This means there is a real simplicity to its operation, and its main claim to fame seems to be the “inverted index” that allows quick text searches through the data. If you are not using this text-search functionality much, the limitations of the “text processing” become a limiting factor to its usefulness in simple data-processing situations (what the one record type per index is suited for).

I have yet to check out the ES|QL that @stephenb mentioned in a later post, so will see where this takes me.

Geoffrey Brown, Te Puke, NZ

Hi @geoffrey

I hope you enjoy the journey with some new tech.

Elasticsearch is not an RDBMS; it is a laptop-to-petabyte-scale search and analytics engine. There are many differences, big and small, some will bring joy, others pain... but I hope you enjoy finding the differences.

While Elasticsearch has its limitations, it can also be remarkably...freeing ;).

Let me share a personal story about this:

About 10 years ago, in my previous role at a data and analytics company, I managed a sizable data team and data infrastructure. We relied on traditional databases like Oracle and MSSQL, along with one of my favorites, PostgreSQL... I loved PostgreSQL; I ran a huge farm for geospatial work.

We encountered a new use case at the time (“Google-like experience, realtime type-ahead, must match, at scale, ms response time, etc.”). My best Oracle experts indicated it would cost about $250,000 in licenses and hardware to implement.

Then, a couple of days later, one of my newer developers approached me. He demonstrated his implementation of the use case. I was amazed by how well it worked and how fast it performed. A bit confused, I asked him where he had it running.... he said ... right here on my laptop!

He told me about Elasticsearch, and that’s when I started exploring it.

A couple of months later, we successfully implemented the use case with Elasticsearch, and it only cost about $1,700 a month for hardware since Elasticsearch itself was free and open-source.

I ended up at Elastic a bit over a year later.... still haven't looked back.

Enjoy the journey .... we are here to answer questions...


Hi @stephenb,

After checking this out in DEVTOOLS: yes, this offers a few things I am after. I tried using this syntax in the FORMULA section of Kibana with no joy. I take it this is only really available if you already have an app in which you want to perform these ES|QL queries and format the data yourself? In terms of what is built into the ES + Kibana combination, this is not part of that, eh?

Hi @geoffrey

If you would like some help, there are basics we should establish.

What version are you on? Hopefully 9.1.x or so.

How did you deploy?

Do you have sample data? Have you indexed any data? Your own, or the sample data sets that are included?

What does your sample data look like?

What do you want to accomplish with your data?

Can you give an example of a document and what you would like to do?

FORMULA, if I understand what you are referring to, is in the Lens visualization, which is not where to use ES|QL, and is not where I would try to do text manipulation. Start with the Discover ES|QL editor and go from there...

ESQL

When I am speaking of ES|QL I mean from Kibana - Discover; that is where I would start.

This is logs, but it will work with any indexed data, time series or not...

Discover ESQL Editor

I would look through this in detail
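If you just want to see something working right away, a minimal starter query (assuming you have some logs-* indices; swap in your own index pattern otherwise) is:

FROM logs-*
| SORT @timestamp DESC
| LIMIT 10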

There are some detailed tutorials here (not really about extraction/text manipulation), but you can get to that next...

Then you can move on to text manipulation using any of these

Or These

GROK and DISSECT are two of the primary text-extraction processing commands.

Each of the functions has a simple ROW example, so a simple text extraction using GROK could look like this:

ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 some.email@foo.com 42"
| GROK a """%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num}"""
| KEEP date, ip, email, num
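And since DISSECT is the other primary extraction command, a sketch of the same pull-apart with DISSECT (positional delimiters instead of grok patterns) would be:

ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 some.email@foo.com 42"
| DISSECT a "%{date} %{ip} %{email} %{num}"
| KEEP date, ip, email, num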

Or something more sophisticated, like below, which uses GROK to process complex log lines that all come in as a single field, extracts them into all the separate fields I need to see, and then aggregates or filters on them...

And then aggregate on the fields I extracted

You could do more searching / matching / concatenating / substring work, whatever you like.

So take a look, read a bit, follow the examples, and bring us some actual use cases, and let's see if we can have a little fun... I assure you it is quite powerful and growing...


I did not know that, and it turns out nor did a substantial fraction of the current ES team. Such things get lost in the mists of time I guess. Here are your vicariously-collected reacji :grin:


Ha! That was original author Rashid Khan’s little joke.


TIL!!! THAT IS SUPER AWESOME!!!


Hi @stephenb,

So here are the details of my setup (from a web connection to port 9200):

    "number" : "8.19.3",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "1fde05a4d63448377eceb8fd3d51ce16ca3f02a9",
    "build_date" : "2025-08-26T02:35:34.366492370Z",
    "build_snapshot" : false,
    "lucene_version" : "9.12.2",

Should be pretty much the latest stable version. Deployed on Linux Mint running in VMware Workstation.

I'm wanting to learn, predominantly. Every now and again I pick a subject area and travel down the rabbit hole; I like the flavour of the red pills. At this stage I am playing with the sample aircraft data and have also loaded some DMARC and TLS email-related reports into the system using the parsedmarc utility. I have created some reports that the main DMARC companies don't seem to provide in any way on their websites.

Will type up a post later today with other details, as I'm told that I can only have one embedded picture per post.

Cool, you seem to be up and running...

8.19.3 is great ... 9.1.3 even better :slight_smile:

When you get some documents in (which it sounds like you have), share a few and tell us what you want to do, and perhaps we can help. The LLMs are pretty good at writing ES|QL :slight_smile:

I think DMARC is key-value pairs... there are automatic KV splitters in ingest pipelines, but there is not one yet in ES|QL; you can use DISSECT to parse up the record, though.

Hiya @geoffrey, I had a little fun tonight.

Note: not a DMARC expert; I let the :robot: (LLM) help me.

_bulk load some test DMARC records below. You might want to update the timestamps.

Kibana -> Dev Tools

DELETE _data_stream/logs-dmarc-default

POST _bulk
{"create": {"_index": "logs-dmarc-default"}}
{"@timestamp": "2025-09-20T04:28:33.971711+00:00", "data_stream": {"type": "logs", "dataset": "dmarc", "namespace": "default"}, "message": "v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com"}
{"create": {"_index": "logs-dmarc-default"}}
{"@timestamp": "2025-09-20T04:28:34.971711+00:00", "data_stream": {"type": "logs", "dataset": "dmarc", "namespace": "default"}, "message": "v=DMARC1; p=quarantine; adkim=r; aspf=r; rua=mailto:dmarc@example.org; ruf=mailto:forensics@example.org; fo=1"}
{"create": {"_index": "logs-dmarc-default"}}
{"@timestamp": "2025-09-20T04:28:35.971711+00:00", "data_stream": {"type": "logs", "dataset": "dmarc", "namespace": "default"}, "message": "v=DMARC1; p=none; rua=mailto:dmarc-monitor@example.net"}
{"create": {"_index": "logs-dmarc-default"}}
{"@timestamp": "2025-09-20T04:28:36.971711+00:00", "data_stream": {"type": "logs", "dataset": "dmarc", "namespace": "default"}, "message": "v=DMARC1; p=reject; adkim=s; aspf=s; rua=mailto:reports@mail.example.com"}
{"create": {"_index": "logs-dmarc-default"}}
{"@timestamp": "2025-09-20T04:28:37.971711+00:00", "data_stream": {"type": "logs", "dataset": "dmarc", "namespace": "default"}, "message": "v=DMARC1; p=quarantine; pct=50; rua=mailto:dmarc@company.com"}
{"create": {"_index": "logs-dmarc-default"}}
{"@timestamp": "2025-09-20T04:28:38.971711+00:00", "data_stream": {"type": "logs", "dataset": "dmarc", "namespace": "default"}, "message": "v=DMARC1; p=reject; rua=mailto:dmarc1@bank.example,mailto:dmarc2@vendor.com"}
{"create": {"_index": "logs-dmarc-default"}}
{"@timestamp": "2025-09-20T04:28:39.971711+00:00", "data_stream": {"type": "logs", "dataset": "dmarc", "namespace": "default"}, "message": "v=DMARC1; p=reject; ruf=mailto:forensics@secure.example; fo=0"}
{"create": {"_index": "logs-dmarc-default"}}
{"@timestamp": "2025-09-20T04:28:40.971711+00:00", "data_stream": {"type": "logs", "dataset": "dmarc", "namespace": "default"}, "message": "v=DMARC1; p=none; rua=mailto:dmarc@startup.io; ruf=mailto:forensics@startup.io; fo=1"}
{"create": {"_index": "logs-dmarc-default"}}
{"@timestamp": "2025-09-20T04:28:41.971711+00:00", "data_stream": {"type": "logs", "dataset": "dmarc", "namespace": "default"}, "message": "v=DMARC1; p=quarantine; pct=25; adkim=r; rua=mailto:dmarc@shop.example"}
{"create": {"_index": "logs-dmarc-default"}}
{"@timestamp": "2025-09-20T04:28:42.971711+00:00", "data_stream": {"type": "logs", "dataset": "dmarc", "namespace": "default"}, "message": "v=DMARC1; p=reject; adkim=s; aspf=s; rua=mailto:dmarc@global.example,mailto:dmarc@securityvendor.com; ruf=mailto:ruf@global.example; fo=1"}

Note, there is no KV processor in ES|QL yet, but here is an example...

So here is a DISSECT that splits off the leading fields, then looks at the rest of the string and plucks out each of the remaining fields.

I have another example, and I am sure there are other ways to do it... When KV comes it will be a natural fit.

If you run it as a query in Kibana -> Dev Tools

POST _query?format=txt
{
  "query" : """
FROM logs-dmarc-default
| DISSECT message "v=%{version}; p=%{policy}; %{rest}"
| EVAL 
    // Extract DKIM alignment mode (adkim) if present
    dkim_alignment = CASE(
        LOCATE(rest, "adkim=") > 0, 
        SUBSTRING(REPLACE(rest, ".*adkim=([^;]+).*", "$1"), 0),
        NULL
    ),
    // Extract SPF alignment mode (aspf) if present
    spf_alignment = CASE(
        LOCATE(rest, "aspf=") > 0, 
        SUBSTRING(REPLACE(rest, ".*aspf=([^;]+).*", "$1"), 0),
        NULL
    ),
    // Extract percentage (pct) if present
    percentage = CASE(
        LOCATE(rest, "pct=") > 0, 
        SUBSTRING(REPLACE(rest, ".*pct=([^;]+).*", "$1"), 0),
        NULL
    ),
    // Extract aggregate report URI (rua) if present
    aggregate_reports = CASE(
        LOCATE(rest, "rua=") > 0, 
        SUBSTRING(REPLACE(rest, ".*rua=([^;]+).*", "$1"), 0),
        NULL
    ),
    // Extract forensic report URI (ruf) if present
    forensic_reports = CASE(
        LOCATE(rest, "ruf=") > 0, 
        SUBSTRING(REPLACE(rest, ".*ruf=([^;]+).*", "$1"), 0),
        NULL
    ),
    // Extract forensic options (fo) if present
    forensic_options = CASE(
        LOCATE(rest, "fo=") > 0, 
        SUBSTRING(REPLACE(rest, ".*fo=([^;]+).*", "$1"), 0),
        NULL
    )
| DROP rest
| SORT @timestamp DESC
| KEEP @timestamp, message, dkim_alignment, spf_alignment, percentage, aggregate_reports, forensic_reports, forensic_options
| LIMIT 10
    """
}

You should get something like this

       @timestamp       |                                                                 message                                                                 |dkim_alignment | spf_alignment |  percentage   |                     aggregate_reports                     |       forensic_reports        |forensic_options
------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+---------------+---------------+---------------+-----------------------------------------------------------+-------------------------------+----------------
2025-09-20T04:28:42.971Z|v=DMARC1; p=reject; adkim=s; aspf=s; rua=mailto:dmarc@global.example,mailto:dmarc@securityvendor.com; ruf=mailto:ruf@global.example; fo=1|s              |s              |null           |mailto:dmarc@global.example,mailto:dmarc@securityvendor.com|mailto:ruf@global.example      |1               
2025-09-20T04:28:41.971Z|v=DMARC1; p=quarantine; pct=25; adkim=r; rua=mailto:dmarc@shop.example                                                                   |r              |null           |25             |mailto:dmarc@shop.example                                  |null                           |null            
2025-09-20T04:28:40.971Z|v=DMARC1; p=none; rua=mailto:dmarc@startup.io; ruf=mailto:forensics@startup.io; fo=1                                                     |null           |null           |null           |mailto:dmarc@startup.io                                    |mailto:forensics@startup.io    |1               
2025-09-20T04:28:39.971Z|v=DMARC1; p=reject; ruf=mailto:forensics@secure.example; fo=0                                                                            |null           |null           |null           |null                                                       |mailto:forensics@secure.example|0               
2025-09-20T04:28:38.971Z|v=DMARC1; p=reject; rua=mailto:dmarc1@bank.example,mailto:dmarc2@vendor.com                                                              |null           |null           |null           |mailto:dmarc1@bank.example,mailto:dmarc2@vendor.com        |null                           |null            
2025-09-20T04:28:37.971Z|v=DMARC1; p=quarantine; pct=50; rua=mailto:dmarc@company.com                                                                             |null           |null           |50             |mailto:dmarc@company.com                                   |null                           |null            
2025-09-20T04:28:36.971Z|v=DMARC1; p=reject; adkim=s; aspf=s; rua=mailto:reports@mail.example.com                                                                 |s              |s              |null           |mailto:reports@mail.example.com                            |null                           |null            
2025-09-20T04:28:35.971Z|v=DMARC1; p=none; rua=mailto:dmarc-monitor@example.net                                                                                   |null           |null           |null           |mailto:dmarc-monitor@example.net                           |null                           |null            
2025-09-20T04:28:34.971Z|v=DMARC1; p=quarantine; adkim=r; aspf=r; rua=mailto:dmarc@example.org; ruf=mailto:forensics@example.org; fo=1                            |r              |r              |null           |mailto:dmarc@example.org                                   |mailto:forensics@example.org   |1               
2025-09-20T04:28:33.971Z|v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com                                                                                 |null           |null           |null           |mailto:dmarc-reports@example.com                           |null                           |null            
   

If you run it from Discover, you will get the same thing, all nicely parsed up. The query is the same ES|QL as above, just everything between the triple quotes, without the POST _query wrapper.


You will get something like this....

Hope this gives you some ideas!!

Have a great Weekend!