Embedded JSON written by realtime logger

Very new to Elasticsearch - working on a first demo.

I have a simple question: my logger is emitting log entries with embedded JSON, and I want to be able to index these embedded portions the same way as the whole entry. How do I do that?

For example, the logger writes:

{"Stamp":"2014-12-12", "Message":"This is a message that contains json: {\"FieldOne\":\"Data\", \"FieldTwo\":\"Data\"}"}

(This is from memory - not actual, but it illustrates the concept.)

And I want to query FieldOne the same way as the rest of the entry. When I look at the entry in the database I see the embedded JSON is surrounded by quotes and is just treated as a string, not as actual JSON, so I can't search on the contents of FieldOne.

Can this be solved by configuration or must I break my logging into different pieces?
OR - can logstash help with this? I plan on doing real time logging and will not be reading an existing log store so I don't need logstash - is that correct?

You can use the Elasticsearch HTTP API to insert documents; this is what Logstash does (plus filtering and many input plugins), but the idea is the same.

Correct. I used Elasticsearch.Net and Nest to create test data and have also found a plugin for NLog that routes logging to Elasticsearch so I have no problem getting data into the database.

If logstash is only for importing data then I probably don't need that to do realtime logging.

What about the issue with embedded (not nested) Json? Is there a configuration or mapping that would correctly index the contained Json data?

If logstash is only for importing data then I probably don't need that to do realtime logging.

Logstash can collect data from one or more sources, transform the data, and route it to one or more destinations. In some cases you can do without it, but I'd think twice about implementing a logging architecture that relies on direct access to the underlying storage.

What about the issue with embedded (not nested) Json? Is there a configuration or mapping that would correctly index the contained Json data?

I don't believe so, but it's trivially done with Logstash.
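
For example, a minimal sketch of the json filter plugin (the field names here are assumptions, not your actual schema):

```conf
filter {
  json {
    source => "embedded_json"   # field holding the raw JSON string
    target => "payload"         # parsed fields land under [payload]
  }
}
```

After this, something like [payload][FieldOne] is a real, queryable field rather than part of a string.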

Plus, if you have your application write its messages to a local file that's read by Logstash or some other shipper you'll be less dependent on ES being up. If sending directly from NLog, what happens when ES is unavailable?

Very good points! Since I'm in demo mode and constitutionally lazy, will you please give me a high-level overview of where I should look in the Logstash documentation for settings that will change the way the data is indexed? (Would I change the shape of the data, or can I influence how it is interpreted?)

I'm not really sure what you're asking, but see the list of Logstash filter plugins for an idea of what you can do. Even for already structured events I use filters to

  • restructure events if I don't like their original layout,
  • perform reverse DNS lookups of IP addresses,
  • convert the type of fields (so that those containing integers, like source code line numbers, are actually integers when sent to ES),
  • remove fields I have no reason to store,

and so on.
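
A sketch of what a couple of those look like in practice (field names such as client_ip and line are made up for illustration):

```conf
filter {
  # reverse DNS lookup of an IP address field
  dns {
    reverse => ["client_ip"]
    action  => "replace"
  }
  mutate {
    # store source code line numbers as integers, not strings
    convert => { "line" => "integer" }
    # drop fields there's no reason to keep
    remove_field => ["noisy_field"]
  }
}
```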

My basic confusion is: can I define a filter (or use an existing one) that says to Elasticsearch, "interpret this embedded JSON string as JSON"? Or will I need to employ a filter that says, "extract the JSON strings and store them in Elasticsearch as a nested part of their associated message document"?

With this data:
{
  "_index": "default",
  "_type": "logevent",
  "_id": "AVMVBZvEYZuT3kcFga6x",
  "_version": 3,
  "found": true,
  "_source": {
    "message": "2016-02-24 14:43:18.4690|DEBUG|frmMain|Second Person is {\"Id\":2,\"FirstName\":\"George\",\"MiddleName\":\"Walker\",\"LastName\":\"Bush\",\"BirthDate\":\"1946-07-06T00:00:00+00:00\",\"DeathDate\":null,\"Age\":70}."
  }
}

How can I search for "MiddleName": "Walker" ?

My basic confusion is: can I define a filter (or use an existing one) that says to Elasticsearch, "interpret this embedded json string as json"

I've never heard of such a feature or plugin. Maybe the event ingestion stuff that's being added in the next ES release will support it, though.

So, if I want to include serialized objects in my logging I need to figure out a way to nest them inside my message structure? Wow, I thought this would be simple. Oh well - would it be possible to write a Logstash filter that pulls embedded JSON strings during intake and rewrites them in a nested way? That is the point of Logstash filtering, isn't it?

I'm not sure I follow your description completely, but looking at the previously posted example it would indeed be trivial to deserialize that piece of JSON into discrete fields in the message. You'd typically use a csv filter to split the |-separated fields, and then you'd apply a json filter to the last part.

And this could be done by writing a logstash filter?

Yes. Well, configuring existing filter plugins.

filter {
  csv {
    ...
  }
  json {
    ...
  }
}
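
Fleshed out for your example message, it might look something like this (the column names, the person_json capture field, and the grok pattern are all assumptions; the JSON is surrounded by extra text, so something like grok is needed to isolate it before the json filter runs):

```conf
filter {
  csv {
    separator => "|"
    columns   => ["timestamp", "level", "logger", "text"]
  }
  # pull the {...} portion out of the free text
  grok {
    match => { "text" => "(?<person_json>\{.*\})" }
  }
  json {
    source => "person_json"
    target => "person"          # "MiddleName" becomes [person][MiddleName]
  }
  mutate {
    remove_field => ["person_json"]
  }
}
```

Once that's in place you'd search with something like person.MiddleName:Walker.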

Thanks for your patience. It is beginning to make sense. I can use the filtering capabilities to break the message into pieces. I will look for a way to parse out the embedded json into separate fields. That should do what I want.