corpus-100-* are for general-purpose mixed tests.
corpus-1-* are single-item corpora, for running exact-length variations (made by hand with a clipboard).
corpusx-* are meant for getting started with the "normalize" test. They use randomness so that "-" characters are rare, and (awkwardly) avoid SSO by using a minimum length of 16. The "x" means 1% "-" characters.

IIRC, this command produced corpus-100-20-1.txt:
./make-corpus 100 20

Ergo, this one is for corpusx-100-20-1.txt:
./make-corpus 100 5 15 0.01 > corpusx-100-20-1.txt
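
The make-corpus script itself is not shown in these notes, so the following is only a rough sketch of the idea behind the corpusx-* files: random strings that are never shorter than 16 characters, with "-" appearing at roughly a 1% rate. The function name, its parameters, the maximum length, the alphabet, and the seed are all assumptions for illustration and do not reflect the real script's arguments.

```python
import random
import string


def make_lines(count, min_len=16, max_len=32, dash_rate=0.01, seed=0):
    """Hypothetical stand-in for make-corpus; every parameter is an assumption."""
    rng = random.Random(seed)  # fixed seed so a corpus can be regenerated
    letters = string.ascii_lowercase
    lines = []
    for _ in range(count):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        chars = [
            "-" if rng.random() < dash_rate else rng.choice(letters)
            for _ in range(length)
        ]
        lines.append("".join(chars))
    return lines


if __name__ == "__main__":
    # One item per line, suitable for redirecting into a corpus file.
    print("\n".join(make_lines(100)))
```

Keeping the seed fixed means the same corpus can be rebuilt later, which avoids having to remember which invocation produced which file.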