Ah, I assume that should be set in the fingerprint filter? And if it's not set, it computes the fingerprint from the first field, then again from the second?
Seems like somewhat broken behaviour (i.e. why would I supply multiple sources if I didn't want the operation to occur on all of them)...
UPDATE:
I've just tested the original pipeline with:
concatenate_sources => "true"
added to the filter stanza and I'm seeing the result I expect now.
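For anyone landing here later, this is roughly what the filter stanza looks like with the option in place. It's only a sketch: the field names `field_a` and `field_b`, the target, and the hash method are placeholders I've assumed for illustration, so swap in whatever your own pipeline uses.

```
filter {
  fingerprint {
    # Placeholder field names (assumed); list the fields you want hashed together
    source => ["field_a", "field_b"]
    # Placeholder target (assumed); where the resulting fingerprint is stored
    target => "[@metadata][fingerprint]"
    # Assumed method for this sketch; use whichever one your pipeline already has
    method => "MURMUR3"
    # Hash all of the source fields together as one value
    concatenate_sources => "true"
  }
}
```

With this set, the fingerprint should depend on every field in `source`, which is what I'd expected from supplying more than one.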
Since this option is required for the pipeline to behave as expected, I'd suggest getting in touch with the owner of the blog post (https://www.elastic.co/blog/how-to-find-and-remove-duplicate-documents-in-elasticsearch) and asking them to update the code there, to save anybody else hitting the same issue in future.