Hit a log only if all specific keywords exist?

This thread suggests a theoretical limit of 2 GB on the maximum document size in Lucene. The size of a single token also seems to be limited to approx. 32 kB.
Still, it might be useful to split very large documents into logical subsections (e.g. a book into chapters or pages). What sizes are we talking about?
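
To make the splitting idea concrete, here is a minimal sketch in plain Python (no Lucene calls). It assumes paragraphs are separated by blank lines and packs them into sections of a chosen size. The names `split_into_sections`, `oversized_tokens` and the `max_chars` default are made up for illustration; the 32766-byte figure is the term limit Lucene documents as `IndexWriter.MAX_TERM_LENGTH`.

```python
import re

# Lucene's documented per-term limit in bytes (IndexWriter.MAX_TERM_LENGTH).
MAX_TERM_BYTES = 32766


def split_into_sections(text, max_chars=10_000):
    """Pack blank-line-separated paragraphs into sections of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own
    (oversized) section rather than being cut mid-sentence.
    """
    sections, current, size = [], [], 0
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        # The +2 accounts for the blank-line separator between paragraphs.
        added = len(para) + (2 if current else 0)
        if current and size + added > max_chars:
            sections.append("\n\n".join(current))
            current, size = [], 0
            added = len(para)
        current.append(para)
        size += added
    if current:
        sections.append("\n\n".join(current))
    return sections


def oversized_tokens(section):
    """Return whitespace-delimited tokens whose UTF-8 form exceeds the term limit."""
    return [t for t in section.split() if len(t.encode("utf-8")) > MAX_TERM_BYTES]
```

Each returned section could then be indexed as its own document, for example with a shared field identifying the parent book (that field is an assumption, not something from the thread). Running `oversized_tokens` on a section first shows whether any single token would hit the per-term limit.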