Unable to split the data in message field

Hello
I am running a Python script through the exec input plugin. The script makes multiple API calls and collects the data in a list (array).

I print the list and the data arrives in the message field, but I am unable to split that data into individual documents.

Can someone please help me split the data into individual documents, or suggest a better way to handle the data collected by the Python script?

(Attaching a screenshot of the Logstash config file and the data I am getting in the message field.)


Can you share a sample of your message in plain text so someone can try to replicate your issue?

Also, what you want is the split filter. It splits an array so that each item in the array becomes an individual document. You are using the split option of the mutate filter, which is different: it turns a string into an array within the same document.

Try the following:

filter {
    split {
        field => "message"
    }
}
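
To make the difference between the two concrete, here is a rough Python analogy. It is not Logstash code, and the function names `mutate_split` and `split_filter` are made up for illustration; they only mimic the behaviour described above.

```python
def mutate_split(event, field, separator):
    """Analogy for mutate's split option: the string becomes a list in the SAME event."""
    changed = dict(event)
    changed[field] = event[field].split(separator)
    return [changed]  # still exactly one event


def split_filter(event, field):
    """Analogy for the split filter: one NEW event per element of the array."""
    return [{**event, field: item} for item in event[field]]


one_event = mutate_split({"host": "a", "message": "x,y,z"}, "message", ",")
many_events = split_filter({"host": "a", "message": ["x", "y", "z"]}, "message")

print(len(one_event))    # 1
print(len(many_events))  # 3
```

In both cases the other fields (here `host`) are carried along; only the split filter changes the number of documents.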

Hi @leandrojmp,
Thanks for your reply.

The message field that I am getting is:

"message" : """[[{'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|all_metrics'}, 'data': [9.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|alert_count_warning'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|readyAvg'}, 'data': [0.05941666662693024]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstCpuqueueAvg'}, 'data': [0.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'Protocol|packetLossTransmit'}, 'data': [0.0]}, {'timestamps': [1675360680976], 'statKey': {'key': 'connectedTime'}, 'data': [40.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'summary|connectedSessions'}, 'data': [0.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|recommendedCpu'}, 'data': [21.498905181884766]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|costopAvg'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'badge|risk'}, 'data': [-1.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|usageAvg'}, 'data': [1.7986667156219482]}, {'timestamps': [1675353487749], 'statKey': {'key': 'Protocol|packetLossReceive'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstPeakvcpuReady'}, 'data': [0.08433333039283752]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|recommendedMemory'}, 'data': [53.072486877441406]}, {'timestamps': [1675360550668], 'statKey': {'key': 'performance|worstDatacenter'}, 'data': [99.01024627685547]}, {'timestamps': [1675360550668], 'statKey': {'key': 'performance|worstKPI'}, 'data': [98.17179107666016]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|total_alert_count'}, 'data': [0.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|overSized|memory'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System 
Attributes|alert_count_info'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'badge|compliance'}, 'data': [-1.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|worstvDiskOIO'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'badge|health'}, 'data': [-1.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|health'}, 'data': [-1.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|isUnderSized'}, 'values': ['true']}, {'timestamps': [1675353487754], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|overSized|vCpus'}, 'data': [4.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|alert_count_immediate'}, 'data': [0.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|isOverSized'}, 'values': ['false']}, {'timestamps': [1675360680976], 'statKey': {'key': 'loginTime'}, 'data': [1675352211456.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|diskIops'}, 'data': [10.666666984558105]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|child_all_metrics'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|diskThroughput'}, 'data': [0.11712239682674408]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstOverlapSummation'}, 'data': [12.666666984558105]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|iowaitAvg'}, 'data': [0.6491249799728394]}, {'timestamps': [1675353487754], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|underSized|vCpus'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|cpuqueueAvg'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|pageInRatePerSecondAvg'}, 'data': [2.4666666984558105]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|worstFreeMemoryAvg'}, 'data': 
[39166.77734375]}, {'timestamps': [1675353487749], 'statKey': {'key': 'protocol|worstLatency'}, 'data': [48.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|recommendedVcpu'}, 'data': [8.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'summary|noOfSessions'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|freeMemoryAvg'}, 'data': [39166.77734375]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstPeakvcpuUsage'}, 'data': [2.584552526473999]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|worstPageInRatePerSecondAvg'}, 'data': [2.4666666984558105]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|diskqueueAvg'}, 'data': [0.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|underSized|memory'}, 'data': [5.0745110511779785]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|peakvDiskReadLatency'}, 'data': [0.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'protocol|worstPacketLossTransmit'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstCostopAvg'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|peakvDiskWriteLatency'}, 'data': [0.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'performance|worstNetwork'}, 'data': [97.33333587646484]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|worstContentionAvg'}, 'data': [0.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'Protocol|frameRate'}, 'data': [6.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|connectedTime'}, 'data': [40.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|totalLatencyAvg'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|alert_count_critical'}, 'data': [0.0]}, {'timestamps': 
[1675353487749], 'statKey': {'key': 'protocol|worstFrameRate'}, 'data': [6.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|contentionAvg'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'badge|efficiency'}, 'data': [-1.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|total_alarms'}, 'data': [0.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'Protocol|latency'}, 'data': [48.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|self_alert_count'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|worstDiskqueueAvg'}, 'data': [0.0]}, {'timestamps': [1675360680976], 'statKey': {'key': 'idleDuration'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|availability'}, 'data': [-1.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'protocol|worstPacketLossReceive'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstIOwaitAvg'}, 'data': [0.6491249799728394]}, {'timestamps': [1676630760990], 'statKey': {'key': 'summary|disconnectedSessions'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|pageOutRatePerSecondAvg'}, 'data': [0.0]}], [{'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|all_metrics'}, 'data': [9.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|alert_count_warning'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|readyAvg'}, 'data': [0.05941666662693024]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|idleDuration'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstCpuqueueAvg'}, 'data': [0.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'Protocol|packetLossTransmit'}, 'data': [0.0]}, {'timestamps': [1675360680976], 'statKey': {'key': 'connectedTime'}, 
'data': [40.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'summary|connectedSessions'}, 'data': [0.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|recommendedCpu'}, 'data': [21.498905181884766]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|costopAvg'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'badge|risk'}, 'data': [-1.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|usageAvg'}, 'data': [1.7986667156219482]}, {'timestamps': [1675353487749], 'statKey': {'key': 'Protocol|packetLossReceive'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstPeakvcpuReady'}, 'data': [0.08433333039283752]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|recommendedMemory'}, 'data': [53.072486877441406]}, {'timestamps': [1675360550668], 'statKey': {'key': 'performance|worstDatacenter'}, 'data': [99.01024627685547]}, {'timestamps': [1675360550668], 'statKey': {'key': 'performance|worstKPI'}, 'data': [98.17179107666016]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|total_alert_count'}, 'data': [0.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|overSized|memory'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|alert_count_info'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'badge|compliance'}, 'data': [-1.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|worstvDiskOIO'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'badge|health'}, 'data': [-1.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|health'}, 'data': [-1.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|isUnderSized'}, 'values': ['true']}, {'timestamps': [1675353487754], 'statKey': {'key': 
'vdiDesktop:Pool-ge-pant3az2|overSized|vCpus'}, 'data': [4.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|alert_count_immediate'}, 'data': [0.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|isOverSized'}, 'values': ['false']}, {'timestamps': [1675360680976], 'statKey': {'key': 'loginTime'}, 'data': [1675352211456.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|diskIops'}, 'data': [10.666666984558105]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|child_all_metrics'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|diskThroughput'}, 'data': [0.11712239682674408]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstOverlapSummation'}, 'data': [12.666666984558105]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|iowaitAvg'}, 'data': [0.6491249799728394]}, {'timestamps': [1675353487754], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|underSized|vCpus'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|cpuqueueAvg'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|pageInRatePerSecondAvg'}, 'data': [2.4666666984558105]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|worstFreeMemoryAvg'}, 'data': [39166.77734375]}, {'timestamps': [1675353487749], 'statKey': {'key': 'protocol|worstLatency'}, 'data': [48.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|recommendedVcpu'}, 'data': [8.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'summary|noOfSessions'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|freeMemoryAvg'}, 'data': [39166.77734375]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstPeakvcpuUsage'}, 'data': [2.584552526473999]}, {'timestamps': [1675360550668], 'statKey': {'key': 
'vdiDesktop|memory|worstPageInRatePerSecondAvg'}, 'data': [2.4666666984558105]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|diskqueueAvg'}, 'data': [0.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|underSized|memory'}, 'data': [5.0745110511779785]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|peakvDiskReadLatency'}, 'data': [0.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'protocol|worstPacketLossTransmit'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstCostopAvg'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|peakvDiskWriteLatency'}, 'data': [0.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'performance|worstNetwork'}, 'data': [97.33333587646484]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|worstContentionAvg'}, 'data': [0.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'Protocol|frameRate'}, 'data': [6.0]}, {'timestamps': [1675360680979], 'statKey': {'key': 'vdiDesktop:Pool-ge-pant3az2|connectedTime'}, 'data': [40.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|disk|totalLatencyAvg'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|alert_count_critical'}, 'data': [0.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'protocol|worstFrameRate'}, 'data': [6.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|contentionAvg'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'badge|efficiency'}, 'data': [-1.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|total_alarms'}, 'data': [0.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'Protocol|latency'}, 'data': [48.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|self_alert_count'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': 
{'key': 'vdiDesktop|disk|worstDiskqueueAvg'}, 'data': [0.0]}, {'timestamps': [1675360680976], 'statKey': {'key': 'idleDuration'}, 'data': [0.0]}, {'timestamps': [1676630760990], 'statKey': {'key': 'System Attributes|availability'}, 'data': [-1.0]}, {'timestamps': [1675353487749], 'statKey': {'key': 'protocol|worstPacketLossReceive'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|cpu|worstIOwaitAvg'}, 'data': [0.6491249799728394]}, {'timestamps': [1676630760990], 'statKey': {'key': 'summary|disconnectedSessions'}, 'data': [0.0]}, {'timestamps': [1675360550668], 'statKey': {'key': 'vdiDesktop|memory|pageOutRatePerSecondAvg'}, 'data': [0.0]}]]
"""

This is the data of two virtual machines.
Basically, the message field contains an array of arrays, and each inner array represents a VM.
I want to make each of them an individual document.

Please format your message using the Preformatted text button, the </> button.

It is very hard to understand without this formatting.

Also, share the original message as it is before Logstash; what you shared seems to be Logstash output.

You need to share the result of your Python command.

The Python output is:
[[{data of a vm in key value pairs}], [{data of a vm in key value pairs}], ...]

I am getting this in the message field as:

"message" : """[[{data of a vm in key value pairs}], [{data of a vm in key value pairs}], ...]"""

What I want is to make the data of each VM an individual document.

I tried to replicate this here, but your source message is very hard to parse.

Can you make changes to the output of your Python script?

It doesn't seem that Logstash is interpreting the output as an array. You have a string that starts with [[ and ends with ]], and you can't simply split this string into an array because the item separator (a comma) is also the field separator inside each item.
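
One likely reason the output stays a plain string: the sample uses single quotes, which is what Python writes when a list is passed to `print()`, and that is not valid JSON. The sketch below illustrates this with a shortened, made-up stand-in for the real list (`vms` and `is_valid_json` are hypothetical names); it only shows the difference between `str()` and `json.dumps()`, not what the actual script does.

```python
import json

# Hypothetical, shortened stand-in for the nested list built by the script.
vms = [
    [{"timestamps": [1676630760990], "statKey": {"key": "badge|risk"}, "data": [-1.0]}],
    [{"timestamps": [1675360680976], "statKey": {"key": "connectedTime"}, "data": [40.0]}],
]

as_repr = str(vms)         # what print(vms) writes: single-quoted Python repr
as_json = json.dumps(vms)  # valid JSON with double quotes


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(as_repr))  # False
print(is_valid_json(as_json))  # True
```

So printing `json.dumps(...)` instead of the list itself would at least give a JSON parser something it can read.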

Hi @leandrojmp
Is there any other way to send the data from a Python script to Elasticsearch?

There are many ways, but it is not possible to say which one fits without seeing your code. Also, you are not sending the data to Elasticsearch; you are outputting the result to Logstash, which is different.

For example, if you change the output to emit one document per line instead of putting everything into an array inside another array, you wouldn't need to split anything.
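
A minimal sketch of that idea, assuming the script holds its results in a list of lists like the one posted. The names `emit_one_per_line`, `vm_index` and `metrics` are hypothetical, and the sample data is shortened. On the Logstash side this would presumably need a line-oriented codec such as json_lines on the exec input; that pairing is an assumption and has not been tested here.

```python
import io
import json
import sys


def emit_one_per_line(vms, out=sys.stdout):
    """Write one JSON object per VM, each on its own line."""
    for vm_index, metrics in enumerate(vms):
        # "vm_index" and "metrics" are hypothetical field names for illustration.
        event = {"vm_index": vm_index, "metrics": metrics}
        out.write(json.dumps(event) + "\n")


# Hypothetical, shortened sample in the same shape as the posted message.
sample = [
    [{"timestamps": [1676630760990], "statKey": {"key": "badge|risk"}, "data": [-1.0]}],
    [{"timestamps": [1675360680976], "statKey": {"key": "connectedTime"}, "data": [40.0]}],
]

# Write into a buffer here so the result can be inspected;
# the real script would call emit_one_per_line(vms) to write to stdout.
buffer = io.StringIO()
emit_one_per_line(sample, buffer)
lines = buffer.getvalue().splitlines()
```

Each entry in `lines` is a complete JSON object for one VM, so nothing has to be split afterwards.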

Hello @leandrojmp
Sorry for the delayed reply!

My Python code just makes multiple API calls, collects the data, and stores it in a list.

I am running this Python code via Logstash as shown below:

input {
    exec {
        command => "python C:\Users\AGUPTA71\Desktop\python-3.10.5-embed-amd64\vdi_user_metric.py"
        interval => 60
        codec => "json"
    }
}

Now what I want is to send that list of data to Elasticsearch using the output section of Logstash.

Can you please help me achieve this?

As I said in a previous answer, the way your message is received by Logstash is not ideal. It is one big string with arrays inside arrays, which will be pretty hard, if not impossible, to parse correctly in Logstash.

You should check whether you can change the way your Python code generates this message for Logstash.

Is there a reason you are trying to get all this info into the same document in Elasticsearch?

Based on your question, I assume there is not.

Based on your question, I found this (this is the input that @leandrojmp has already requested twice and that we would expect from you):
https://docs.vmware.com/en/vRealize-Operations/8.10/com.vmware.vcom.api.doc/GUID-8BA80EB6-8AA8-4630-9833-9D342F2F00DF.html

GET https://vrealize.example.com/suite-api/api/adapterkinds/VMWARE/resourcekinds/VirtualMachine/statkeys

So the Python script does nothing more than a few simple HTTP calls.

Maybe you should take a different approach: https://www.elastic.co/guide/en/logstash/current/input-plugins.html

  • The Python script writes the separate "events" one by one to a file that is read by Filebeat (no longer one big array) - beats plugin
  • Push the events one by one to an http plugin
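
For the second option, a rough sketch of pushing events one by one over HTTP with only the Python standard library. It assumes a Logstash http input is listening at the placeholder URL below; the URL, the function names and the `vm_index` / `metrics` fields are all assumptions made for illustration.

```python
import json
import urllib.request

# Assumption: a Logstash http input listens here; host and port are placeholders.
LOGSTASH_URL = "http://localhost:8080"


def build_request(event, url=LOGSTASH_URL):
    """Build a POST request carrying one event as a JSON body."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_events(vms, url=LOGSTASH_URL):
    """Send one request per VM and return the HTTP status codes."""
    statuses = []
    for vm_index, metrics in enumerate(vms):
        # "vm_index" and "metrics" are hypothetical field names.
        request = build_request({"vm_index": vm_index, "metrics": metrics}, url)
        with urllib.request.urlopen(request, timeout=10) as response:
            statuses.append(response.status)
    return statuses
```

The script would call `send_events(vms)` once it has collected the data. Nothing is sent until that call is made, so the functions can be tried out without a running Logstash.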

If you insist on using one big array, you could take a look at the filters:
https://www.elastic.co/guide/en/logstash/current/filter-plugins.html
Some of the interesting plugins might be:

  • split
  • kv
  • ruby
  • json
  • grok (I don't think this is the way to go)

Good luck!
