Mailtracking problem with a self built dashboard

hey mates,

maybe some of you could help me with a problem i'm currently trying to solve.

we're using logstash in combination with logstash-forwarder and elasticsearch to store the data.

we're trying to analyze our internal mail tracking. we've got about
5 different log types, and so we have 5 different queue ids
(each system gives the incoming mail a different id) for one message id,
which is the same across the different systems. so e.g. we have 10 entries
with the queue id 1234, but only one of those entries also contains the
message id xyz. we need to combine / match / map each of these
entries with the correct message id across all of these different logs.

our problem now is to match the data correctly for our dashboard.

here are some examples of the logs we're looking at: some postfix logs and some self-built logs.

first step of a mail (postfix server):

Jun 25 09:20:29 progov33 postfix/smtpd[18999]: D920718033: client=pcm20w8.procilon.local[]
Jun 25 09:20:29 progov33 postfix/cleanup[19004]: D920718033:
Jun 25 09:20:29 progov33 postfix/qmgr[3759]: D920718033:, size=28828, nrcpt=1 (queue active)
Jun 25 09:20:30 progov33 postfix/pipe[19005]: D920718033:, relay=julia-g, delay=0.13, delays=0.07/0.01/0/0.06, dsn=2.0.0, status=sent (delivered via julia-g service)
Jun 25 09:20:30 progov33 postfix/qmgr[3759]: D920718033: removed

second step (other server, self built logs)

[[25/06/2015 09:20:29] 8346eaf11444]: Input mail size is 28411 bytes.
[[25/06/2015 09:20:29] 8346eaf11444]: Envelope Sender is:
[[25/06/2015 09:20:29] 8346eaf11444]: Message-ID:
[[25/06/2015 09:20:29] 8346eaf11444]: Recipient
[[25/06/2015 09:20:29] 8346eaf11444]: Command line Recipients (for keysearch, only SMTP-Envelope):
[[25/06/2015 09:20:29] 8346eaf11444]: Recipient
[[25/06/2015 09:20:29] 8346eaf11444]: All email addresses for keysearch (may include sender):
[[25/06/2015 09:20:29] 8346eaf11444]: Recipient
[[25/06/2015 09:20:29] 8346eaf11444]: Connected SMTP client (-client) -
[[25/06/2015 09:20:29] 8346eaf11444]: Host is not allowed to use the mailoffice.
[[25/06/2015 09:20:29] 8346eaf11444]: K-FALL is not active.
[[25/06/2015 09:20:29] 8346eaf11444]: command is empty
[[25/06/2015 09:20:29] 8346eaf11444]: Using SINGLE SMIME engine.
[[25/06/2015 09:20:29] 8346eaf11444]: This message is not in S/MIME format
[[25/06/2015 09:20:29] 8346eaf11444]: This message is not in CMS format
not detected.
[[25/06/2015 09:20:29] 8346eaf11444]: This message is not enveloped PKCS#7, so i can't check the signature.
Encrypted : 0 , S/MIME Signed : 0
[[25/06/2015 09:20:29] 8346eaf11444]: no recipients in list.
[[25/06/2015 09:20:29] 8346eaf11444]: To line is:
[[25/06/2015 09:20:29] 8346eaf11444]: CC line is: undefined
[[25/06/2015 09:20:29] 8346eaf11444]: CC header line not rewritten.
[[25/06/2015 09:20:29] 8346eaf11444]: Now pipe mail to virus scanner (nexthop system).
[[25/06/2015 09:20:29] 8346eaf11444]: Deliver Mail to mailer /opt/julia/bin/sendmail -i with From:
[[25/06/2015 09:20:29] 8346eaf11444]: Pipe Mail to sendmail with From:
[[25/06/2015 09:20:29] 8346eaf11444]: Full Mailer: /opt/julia/bin/sendmail -i
[[25/06/2015 09:20:30] 8346eaf11444]: Mail successfully piped to virus scanner (nexthop system).

third step (back to the postfix server):

Jun 25 09:20:30 progov33 postfix/pickup[17460]: 00B6F18039: uid=1000
Jun 25 09:20:30 progov33 postfix/cleanup[19004]: 00B6F18039:
Jun 25 09:20:30 progov33 postfix/qmgr[3759]: 00B6F18039:, size=28950, nrcpt=1 (queue active)
Jun 25 09:20:30 progov33 postfix/smtp[19009]: 00B6F18039:, relay=[]:10025, delay=0.37, delays=0.04/0.01/0.04/0.28, dsn=2.0.0, status=sent (250 OK)
Jun 25 09:20:30 progov33 postfix/qmgr[3759]: 00B6F18039: removed

fourth step (other server, self built logs):

2015-06-25 09:20:30 INFO [SMTPProcessor-1] [SMTPProcessor ] --> Eingehende Verbindung via SMTP von
2015-06-25 09:20:30 INFO [SMTPProcessor-1] [SMTPProcessor ] F36E4567E7684C518CECFCECF08E3738 FROM:
2015-06-25 09:20:30 INFO [SMTPProcessor-1] [SMTPProcessor ] F36E4567E7684C518CECFCECF08E3738 RCPT TO:
2015-06-25 09:20:30 INFO [SMTPProcessor-1] [SMTPProcessor ] F36E4567E7684C518CECFCECF08E3738 SUBJECT: signiertes PDF (Name passt)
2015-06-25 09:20:30 INFO [SMTPProcessor-1] [SMTPProcessor ] F36E4567E7684C518CECFCECF08E3738 MIMEID:
2015-06-25 09:20:30 INFO [SMTPProcessor-1] [SMTPProcessor ] F36E4567E7684C518CECFCECF08E3738 SIZE: 28274
2015-06-25 09:20:30 DEBUG [SMTPProcessor-1] [SMTPProcessor ] F36E4567E7684C518CECFCECF08E3738 Speichere Mail in der Datenbank
2015-06-25 09:20:30 DEBUG [SMTPProcessor-1] [SMTPProcessor ] F36E4567E7684C518CECFCECF08E3738 Nachricht mit id 4202 gespeichert
2015-06-25 09:20:30 DEBUG [SMTPProcessor-1] [SMTPMessage ] F36E4567E7684C518CECFCECF08E3738 Verifiziere Mail in der Datenbank
2015-06-25 09:20:30 DEBUG [SMTPProcessor-1] [QueueUtils ] sendMessageDatasToQueue F36E4567E7684C518CECFCECF08E3738 >>>
2015-06-25 09:20:30 DEBUG [SMTPProcessor-1] [QueueUtils ] <<< sendMessageDatasToQueue

as you can see, there is a unique message id across all these different types, but i need
to visualize all the data matching this id, so i have to look up
all the different queue ids: D920718033, 8346eaf11444, 00B6F18039, and so on.
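
to make the lookup concrete, here is a minimal sketch in python of how the queue id could be pulled out of each of the log formats shown above. the regular expressions are assumptions derived only from the sample lines, and the name `extract_queue_id` is made up for illustration. in our real setup this extraction would more likely happen in logstash, so this only shows the idea.

```python
import re

# Patterns guessed from the sample log lines; they are not a complete grammar.
POSTFIX = re.compile(r"postfix/\w+\[\d+\]: (?P<queue_id>[0-9A-F]+):")
GATEWAY = re.compile(r"^\[\[[^\]]+\] (?P<queue_id>[0-9a-f]+)\]:")
PROCESSOR = re.compile(r"\] (?P<queue_id>[0-9A-F]{32})\b")


def extract_queue_id(line):
    """Return the queue id found in a log line, or None if no pattern matches."""
    for pattern in (POSTFIX, GATEWAY, PROCESSOR):
        match = pattern.search(line)
        if match:
            return match.group("queue_id")
    return None
```

lines without any id, like the "Eingehende Verbindung" line in the fourth step, return `None` and would need separate handling.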

we're using a self built dashboard which is kinda like kibana.

any ideas? :smile:

do you see a possibility to solve this problem with logstash or with
our requests to elasticsearch? my thought is to search for the message
id and note the queue ids of the hits, and then with a second request
fetch all the entries matching those queue ids.
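
roughly, the two-request idea could look like the following sketch in python. it only builds the request bodies and does not talk to a cluster. the field names `message_id` and `queue_id` are assumptions (our real mapping may differ), and both fields are assumed to be indexed as exact, non-analyzed values so that `term` and `terms` match the whole id.

```python
def message_id_query(message_id):
    """First request: find the entries that carry the message id."""
    return {"query": {"term": {"message_id": message_id}}}


def collect_queue_ids(hits):
    """Collect the distinct queue ids from the first response, in order."""
    queue_ids = []
    for hit in hits:
        queue_id = hit["_source"].get("queue_id")
        if queue_id and queue_id not in queue_ids:
            queue_ids.append(queue_id)
    return queue_ids


def queue_ids_query(queue_ids):
    """Second request: fetch every entry for those queue ids, oldest first."""
    return {
        "query": {"terms": {"queue_id": queue_ids}},
        "sort": [{"@timestamp": {"order": "asc"}}],
    }
```

`hits` here stands for the `hits.hits` array of the first search response. elasticsearch returns only 10 hits per search by default, so `size` would have to be raised for both requests.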

i would be deeply grateful if some of you could show us a way to solve this problem.


What specifically do you want to display here?

You can easily do a search based on an ID.

What you are trying to do doesn't seem like something that Elasticsearch is particularly good at. Foreign keys and joins are doable, but they require a deep understanding of Elasticsearch and will probably come with a significant performance hit.

Here are some recommendations based on my own experience:

  1. Try to flatten the data. In your case, see if you can write the messageID to every log message you produce. That will solve your problem, and you can continue to use Elasticsearch for this.
  2. Look at the parent-child construct in Elasticsearch. This is your best bet if you want to achieve this. It's not rocket science to set it up and configure Logstash to send the relevant data, but it does have an impact on performance.
  3. Depending on the volume of data, there are other tools that are more suitable for this type of problem. You can look at columnar databases if you have a very high volume and want to do this on your own, or at other log analytics solutions like Splunk if you want to use third-party tools.
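
The first recommendation could look roughly like this. It is a minimal sketch in Python, assuming each parsed log entry is a dict with a `queue_id` key and, on one line per queue, a `message_id` key. Both field names and the function names are hypothetical. It also assumes all entries of a mail are available as one batch before indexing, which is a simplification.

```python
def build_queue_map(entries):
    """Map each queue id to the message id seen on one of its entries."""
    return {
        entry["queue_id"]: entry["message_id"]
        for entry in entries
        if entry.get("message_id")
    }


def flatten(entries):
    """Return copies of the entries, each stamped with its message id."""
    queue_map = build_queue_map(entries)
    enriched = []
    for entry in entries:
        copy = dict(entry)
        if not copy.get("message_id"):
            # Stays None when no entry of this queue id carried a message id.
            copy["message_id"] = queue_map.get(copy["queue_id"])
        enriched.append(copy)
    return enriched
```

Once every document carries the message ID, a single search on that field returns the whole flow of a mail. In a live stream the line with the message ID can arrive after other lines with the same queue ID, so some buffering or a later update of the earlier documents would be needed.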

Let me know if this helps.

-- Asaf.

first of all, thanks for your comments.
yes, you're right, we have to combine every entry with the message id. currently we try to solve this on the client side with java code, but i think we could run into performance problems with large amounts of data.

with our dashboard we want to show the flow of every mail: not only whether it passed a server, but also which cryptographic activity is going on. so yes, we have to search for the message id and then show each entry with a matching queue id.