UDP/TCP performance test, UDP drops messages


(Lincoln) #1

I was doing performance tests on Logstash's network inputs (UDP and TCP). I wrote a Python script to simulate a stream of messages coming in over the network. The Python script and Logstash run on different machines on the same LAN, so the setup looks like this:

python socket->tcp/udp->logstash->local file

After some rough tests, I observed that, at least with my method of testing, UDP drops lots of messages while TCP works fine with acceptable performance (10 clients, each sending 100k messages in 37 s).

My testing Python script is attached below. I also tried nc -4u and observed dropped messages there as well. I'm still investigating whether this is caused by the Logstash input or by the way I send messages over UDP.

import socket
import threading
import time
from collections import deque
from random import randrange

# Example values -- replace with your own environment's settings
UDP_IP = '10.0.0.2'             # address of the Logstash machine
UDP_BASE_PORT = 5514            # Logstash input port
MSG_AMOUNT = 100000             # messages per worker (argv[1] overrides)
PROCESS = 10                    # number of worker threads (argv[2] overrides)
MESSAGE_BASE = 'message %d from port %d, payload %d\n'

def udp_worker(port):
    start = time.time()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print('Begin sending data to port %d' % port)
    retval = 0
    for i in range(MSG_AMOUNT):
        msg = (MESSAGE_BASE % (i, port, randrange(100))).encode()
        retval += sock.sendto(msg, (UDP_IP, port))
    print('Total amount of data sent %d in time %s' % (retval, str(time.time() - start)))

def tcp_worker(port):
    start = time.time()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((UDP_IP, port))
    print('Begin sending data to port %d' % port)
    for i in range(MSG_AMOUNT):
        sock.sendall((MESSAGE_BASE % (i, port, randrange(100))).encode())
        #data = sock.recv(1024)
    sock.close()
    print('Total in time %s' % str(time.time() - start))

if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        MSG_AMOUNT = int(sys.argv[1])
    if len(sys.argv) > 2:
        PROCESS = int(sys.argv[2])
    workers = deque()
    for i in range(PROCESS):
        port = UDP_BASE_PORT
        #t = threading.Thread(target=udp_worker, args=[port])   # uncomment this (and comment the next line) to test UDP
        t = threading.Thread(target=tcp_worker, args=[port])
        t.start()
        print("%s start" % t)
        workers.append(t)
    for w in workers:
        print("%s wait for join" % w)
        w.join()
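One way to isolate whether the loss happens in Logstash or already on the wire is to point the feeder at a bare-bones receiver instead and count what actually arrives on the port. This is just a sketch; the port number and the idle timeout below are example values, not from the setup above:

```python
# Minimal UDP receiver: counts datagrams arriving on a port so the count
# can be compared against what the feeder claims to have sent.
import socket

def count_datagrams(port, timeout=2.0, bufsize=65535):
    """Bind to `port` and count datagrams until `timeout` seconds pass
    with no traffic. Returns the number of messages received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('0.0.0.0', port))
    sock.settimeout(timeout)
    received = 0
    try:
        while True:
            sock.recvfrom(bufsize)
            received += 1
    except socket.timeout:
        pass  # no datagram for `timeout` seconds: assume the feeder is done
    finally:
        sock.close()
    return received
```

If this receiver sees all 100k messages but Logstash's output file does not, the drops are on the Logstash side rather than in the network or the sender.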

I'd appreciate it if someone could share a setup that transfers data reliably over UDP, or point out anything I might have missed in my test!

Btw, I didn't see anything related to the UDP input in the 1.5.0 changelog.

EDIT: I forgot to post my Logstash settings:

input {
  udp/tcp {
    port => xxxx
  }
}
output {
   file {
     path => "/tmp/tmpfile"
   }
}

(Aaron Mildenstein) #2

Testing Logstash UDP with my 2-core MacBook Air, I am able to keep up with 45,000 events per second.

If you must use UDP, you can try to tweak Logstash to get as much from it as possible by altering these settings:

  • workers — The number of worker threads. On my 2 core MBA, it made no difference above the 2 worker default because I only have 2 cores. If your Logstash has more, you may be able to increase this value to speed up ingest.
  • queue_size — The number of unprocessed UDP packets held in memory before packets start dropping. The default is 2000. Increasing this may allow you to capture more than you have been able to so far.

(Lincoln) #3

Hey Aaron,

Thanks for the advice! I did some follow-up testing, tweaking the UDP input to 5 workers and a queue_size of 100,000. Sending 100,000 messages from my script with a single thread, Logstash received only ~92k of them, not 100%.

Could you please provide your configuration of UDP input? Thanks a lot!


(Aaron Mildenstein) #4

100,000 is awfully big. Even 20,000 was a huge number, since Logstash is ingesting at a very fast rate, not just spooling into that queue. How long did you wait after you shut down your feeder? When I tried a queue_size of 40,000, it took several seconds to empty out completely.

input {
  udp {
    workers => 3
    queue_size => 15000
  }
}

I was able to capture 45,000 events per second with this, and bursts up to 52,500.


(Lincoln) #5

Hmm, yeah, 100k does seem too big, honestly. And what I observed is that the file stops growing almost immediately after the feeder finishes.

I changed my queue_size to 15k, keeping 5 workers. The same feeder, looping continuously, sent 100k events to Logstash in 5~6 s. After several tests the best I got is still ~90k, and sometimes only ~50k. It happens quite randomly...
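One thing worth checking on the Logstash machine is whether the kernel itself is dropping datagrams (receive buffer overflow) before Logstash ever reads them: on Linux you can diff the UDP counters in /proc/net/snmp around a test run, and rising RcvbufErrors or InErrors point at the receiver side. A small parser sketch (Linux-specific; the counter names come from the kernel, everything else is illustrative):

```python
# Parse the 'Udp:' counter lines from Linux /proc/net/snmp. The file has a
# header line naming the counters followed by a line of values; diffing
# RcvbufErrors before and after a run shows kernel-level drops.
def parse_udp_counters(snmp_text):
    """Return the 'Udp:' counters as a dict, e.g.
    {'InDatagrams': ..., 'InErrors': ..., 'RcvbufErrors': ...}."""
    lines = [l for l in snmp_text.splitlines() if l.startswith('Udp:')]
    header, values = lines[0].split()[1:], lines[1].split()[1:]
    return dict(zip(header, (int(v) for v in values)))

def read_udp_counters():
    with open('/proc/net/snmp') as f:
        return parse_udp_counters(f.read())
```

Take a reading before the feeder starts and another after it finishes; if RcvbufErrors grew by roughly the number of missing events, raising the OS receive buffer (rather than the Logstash queue_size) would be the thing to try.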

I'll try slowing down the rate of events sent per second from my feeder to see if that makes a difference.
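A minimal way to pace the feeder would be to send in small batches and sleep off the remainder of each batch's time slot, so the average rate stays near a target. This is only a sketch; the batch size and the message payload are arbitrary examples, and host/port are parameters rather than the constants from the script above:

```python
# Paced UDP sender: sends `count` datagrams at roughly `rate` messages per
# second, `batch` at a time, sleeping between batches to hold the rate.
import socket
import time

def paced_udp_worker(host, port, count, rate, batch=100):
    """Returns (messages_sent, elapsed_seconds)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    start = time.time()
    while sent < count:
        for _ in range(min(batch, count - sent)):
            sock.sendto(b'message %d' % sent, (host, port))
            sent += 1
        # sleep off whatever time remains in this batch's slot
        deadline = start + (sent / float(rate))
        remaining = deadline - time.time()
        if remaining > 0:
            time.sleep(remaining)
    sock.close()
    return sent, time.time() - start
```

For example, paced_udp_worker(UDP_IP, UDP_BASE_PORT, 100000, 10000) would spread the 100k messages over roughly 10 seconds instead of 5~6, which should show whether the loss is rate-dependent.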


(system) #6