I am using the Elasticsearch sink connector in distributed mode (2 instances), with tasks.max set to 8 and about 20 to 25 topics being sunk to Elasticsearch.
Even when there are no records to sink, the worker Java process shows 100% CPU usage.
The end-to-end transfer of records works properly, but the high CPU usage is a concern.
My settings:
Connector config:
{
  "name": "elasticsearch-sink",
  "config": {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "tasks.max": "8",
    "topics.regex": "(mytopics_\\d+$)",
    "key.ignore": "true",
    "schema.ignore": "true",
    "connection.url": "http://eshost:esport",
    "type.name": "kafka-connect"
  }
}
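As a sanity check on the topics.regex value: Kafka Connect matches the pattern against the full topic name (java.util.regex semantics), so re.fullmatch is the closest Python approximation. A minimal sketch, with hypothetical topic names just for illustration:

```python
import re

# Regex from the connector config; Connect matches the whole topic name,
# so re.fullmatch approximates Java's Pattern-based matching here.
pattern = re.compile(r"(mytopics_\d+$)")

# Example topic names (not from the real cluster) and whether they should match.
cases = [
    ("mytopics_1", True),
    ("mytopics_42", True),
    ("mytopics_", False),     # no digits after the underscore
    ("other_topic", False),
]
for topic, expected in cases:
    assert bool(pattern.fullmatch(topic)) == expected
print("topics.regex matches the intended topic names")
```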
Worker settings:
bootstrap.servers=localhost:9094,localhost:9095
group.id=test-cluster
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.topic=connect-offsets
offset.storage.replication.factor=3
config.storage.topic=connect-configs
config.storage.replication.factor=3
status.storage.topic=connect-status
status.storage.replication.factor=3
status.storage.partitions=8
rest.port=9034
plugin.path=/pluginpath
log4j.rootLogger=DEBUG, stdout
I'm running this on server-grade hardware (64 GB RAM, 8 CPU cores), and there is good connectivity to both Kafka and the ES server.
Snapshot of my dev setup:
top -p shows CPU utilisation constantly at 22 to 25% even though there are no new records to process; this is a parallel test setup where the number of partitions to consume is somewhat lower.
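To narrow down which worker thread is burning CPU, the usual trick is to run top -H -p <pid> (which lists native thread IDs in decimal) and match the hot thread against the hex nid values in jstack <pid> output. A small sketch of the decimal-to-hex conversion, with an example TID (not from my actual process):

```python
# `top -H -p <pid>` reports per-thread CPU with decimal thread IDs, while
# `jstack <pid>` labels each Java thread with the same ID in hex as "nid=0x...".
# Converting the hot thread's TID to hex lets you find its stack trace.
def tid_to_nid(tid: int) -> str:
    return hex(tid)

# Hypothetical TID 12345 -> look for "nid=0x3039" in the jstack output.
print(tid_to_nid(12345))
```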
Any pointers will help.
Thanks in advance.