How many workers to parallel bulk indexing in cluster?

I have a cluster with 8 nodes. I want to index a lot of docs. For the best performance,
how many workers (to parallel bulk indexing) must run on each node?

It depends on a number of factors, e.g. document size, number of shards indexed into, mappings, hardware and bulk size. I would recommend running a benchmark with as realistic data and settings as possible.

If N is the best number, I must send N parallel bulk to one node or split them between all nodes?

It will be the optimum for the way you tested it. I would recommend sending bulk requests to all data nodes that do indexing though.