# Clumpy compression
`--clumpify` is an opt-in mode that reorders reads inside each gzip member of the trimmed output so that reads sharing a canonical 16-mer minimizer land adjacent on disk. gzip's 32 KB sliding window then finds long redundant runs of similar sequences, shrinking the `.fq.gz` by 15–55% depending on the data type, with no information loss: only the on-disk order of records changes.

`--compression <N>` sets the gzip level independently (1–9, default 1). Combine the two for maximum effect: `--clumpify --compression 9` reorders reads and runs gzip at its slowest, smallest-output level.
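To see why order matters, here is a small, self-contained Rust sketch (using the `flate2` crate; an illustration of the mechanism, not Trim Galore's code). It gzips the same synthetic records twice: once with duplicate families clustered together, once with the copies scattered further apart than gzip's 32 KB window can see.

```rust
use flate2::{write::GzEncoder, Compression};
use std::io::Write;

/// gzip the records at the given level and return the compressed size.
fn gz_size(records: &[String], level: u32) -> usize {
    let mut enc = GzEncoder::new(Vec::new(), Compression::new(level));
    for r in records {
        enc.write_all(r.as_bytes()).unwrap();
    }
    enc.finish().unwrap().len()
}

/// Deterministic pseudo-random 48-base sequence for family `f` (LCG-based).
fn family(f: u64) -> String {
    let mut x = f.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    (0..48)
        .map(|_| {
            x = x.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
            b"ACGT"[((x >> 33) % 4) as usize] as char
        })
        .collect()
}

fn main() {
    // 1000 duplicate families, 8 near-identical 51-byte records each.
    // Clustered: copies of a family sit next to each other (what minimizer
    // sorting produces), so deflate finds long matches at distance ~51 B.
    let clustered: Vec<String> = (0..1000u64)
        .flat_map(|f| (0..8).map(move |c| format!("{}{:02}\n", family(f), c)))
        .collect();
    // Scattered: copies of the same family are ~51 KB apart (outside the
    // 32 KB window), so the redundancy is invisible to deflate.
    let scattered: Vec<String> = (0..8)
        .flat_map(|c| (0..1000u64).map(move |f| format!("{}{:02}\n", family(f), c)))
        .collect();

    println!("clustered: {} bytes", gz_size(&clustered, 1));
    println!("scattered: {} bytes", gz_size(&scattered, 1));
}
```

The clustered order comes out markedly smaller at every gzip level; real libraries are less extreme than this toy, which is where the 15–55% range above comes from.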
## When to use it

### Clumpify

- ✅ Low-complexity data: yes (ATAC-seq, ChIP-seq, Ribo-seq, RNA-seq, RRBS, high sequencing depth WES)
- ❌ High-complexity data: no (whole-genome sequencing, WGBS); may have a detrimental impact (the natural flowcell ordering compresses better)
- ❌ Long reads: no (Oxford Nanopore); has no effect
- ❌ Unusual paired-end formats: no; can have a deleterious effect
### Compression

Whether to raise the `--compression` level depends on what the trimmed FASTQ is used for:

- Pipeline intermediates (trimmed FASTQ is ephemeral, deleted after the pipeline finishes)
  - Leave the compression level at 1, but you can still use `--clumpify`.
  - The reorder is essentially free (1.0–1.4× slowdown on most data) and the smaller output makes the next step (typically an aligner) read less from disk: a net I/O win for the whole pipeline.
- Long-term storage or disk-constrained workdirs
  - Add `--compression 6` (or `--compression 9` for archival). `--clumpify --compression 6` can roughly halve output file sizes (15–50% smaller) but makes the run 4–6× slower.
  - `--compression` can be specified without `--clumpify`, but on redundant data types (ATAC-seq, Ribo-seq) it typically runs faster with clumpify on, because deflate finds matches more cheaply on sorted runs.
## Data types

| Data type | Typical saving (`--clumpify --compression 9`) | Recommendation |
|---|---|---|
| ATAC-seq (paired) | ~50% | ✅ Strong yes: Tn5 insertion bias creates very high fragment redundancy |
| Ribo-seq (paired) | ~45% | ✅ Strong yes: short ribosome-protected fragments are highly clustered |
| MiSeq amplicon / CRISPR / sgRNA | 30–37% | ✅ Strong yes: explicit amplification produces lots of duplicates |
| RRBS (paired) | ~24% at default `--memory 1G`; up to ~31% at `--memory 4G+` | ✅ Yes: MspI cut sites concentrate reads at fragment ends; the minimizer co-clusters them. A bigger `--memory` budget gives substantial extra saving, which is atypical for paired-end data (most types saturate at default memory) |
| WGBS (paired) | ~9% (plain `--compression 6` alone gets ~19%) | ❌ No: coverage-diverse reads, no fragment-level clustering. R2 disruption beats the R1 win at every gzip level (same mechanism as 10x scRNA-seq). Use `--compression 6` without `--clumpify` for ~19% saving |
| ChIP-seq (single-end) | ~24% | ✅ Yes: peaks generate clustered reads |
| RNA-seq (paired) | 16–30% | ✅ Yes: highly-expressed transcripts create dense clusters; bigger savings at higher gzip levels |
| WES / WGS (paired) | 6–22% | 🟡 Modest: diverse coverage gives less clustering |
| scRNA-seq (10x Chromium) | negative (output grows) | ❌ No: R1 (cell barcode + UMI) reorders cleanly, but R2 (cDNA) follows R1's order to preserve pair lockstep and ends up scrambled relative to the natural flowcell-cluster order. R2 disruption beats the R1 win. Use `--compression 6` without `--clumpify` for ~17% saving |
| Long-read (ONT, PacBio) | ~0% | ❌ No: long reads are mostly unique fragments; clumpify doesn't help and adds wall time |
| Variable-length / mixed amplicon | ~0% | ❌ Skip: diversity defeats minimizer clustering |
## How to use it

```bash
# Reorder reads, default gzip level (1 — fastest)
trim_galore --clumpify <input>

# Maximum compression (slowest, smallest output)
trim_galore --clumpify --compression 9 <input>

# Compose for archival storage: max compression with extra memory
trim_galore --clumpify --compression 9 --memory 4G <input>

# Higher gzip without reordering (gzip-only win, no clumping cost)
trim_galore --compression 6 <input>
```

`--clumpify` requires `--cores >= 2` (it feeds the existing parallel worker pool with binned batches) and gzip output (`--dont_gzip` is rejected).

`--compression` is independent: it works with or without `--clumpify`, and applies to the regular trimming pipeline too.
## Performance and compression considerations

### Wall-time cost

Reordering itself is essentially free; the dominant cost at higher gzip levels is gzip's CPU time. Decoupling the two flags lets you pick the trade-off you actually want:
| Mode | Wall time vs plain |
|---|---|
| `--clumpify` (compression 1, default) | ~1.0–1.4× plain (essentially free) |
| `--clumpify --compression 6` | ~1.5–6.4× plain |
| `--clumpify --compression 9` | ~5–10× plain |
| `--compression 6` (no clumpify) | ~1.6–6.4× plain |
| `--compression 9` (no clumpify) | ~5–8× plain |
The minimizer computation uses 2-bit packed integer ops (one O(1) bitwise step per read position), and the per-bin sort is O(n log n) on small bins; both run in the parallel worker pool alongside trim+filter, so they overlap with I/O.
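As a concrete illustration of that inner loop, here is a minimal Rust sketch of a canonical 16-mer minimizer using 2-bit packing: one shift-and-mask step per base on both strands, taking the smaller of the forward and reverse-complement encodings so the result is strand-independent. This follows the standard minimizer technique; it is not Trim Galore's exact code.

```rust
const K: usize = 16;
const MASK: u64 = (1u64 << (2 * K)) - 1; // low 32 bits hold one 16-mer

/// 2-bit code for a base; None resets the rolling window (e.g. on N).
fn base_code(b: u8) -> Option<u64> {
    match b {
        b'A' | b'a' => Some(0),
        b'C' | b'c' => Some(1),
        b'G' | b'g' => Some(2),
        b'T' | b't' => Some(3),
        _ => None,
    }
}

/// Smallest canonical 16-mer in `seq`, or None if no valid window exists.
fn minimizer(seq: &[u8]) -> Option<u64> {
    let (mut fwd, mut rev) = (0u64, 0u64);
    let mut valid = 0usize; // bases seen since the last ambiguous base
    let mut best: Option<u64> = None;
    for &b in seq {
        let Some(c) = base_code(b) else {
            valid = 0;
            continue;
        };
        // One bitwise step per position: shift the new base into both strands.
        fwd = ((fwd << 2) | c) & MASK;
        rev = (rev >> 2) | ((3 - c) << (2 * (K - 1)));
        valid += 1;
        if valid >= K {
            let canon = fwd.min(rev); // strand-independent representative
            best = Some(best.map_or(canon, |m| m.min(canon)));
        }
    }
    best
}
```

For paired-end input, only R1 would feed a function like this, since R2 follows R1's order to preserve pair lockstep (see the 10x scRNA-seq note in the data-types table).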
### Memory

`--memory` (default 1G) is a Trim Galore-wide memory budget. The clumpify dispatcher sizes the per-bin sort runs against it, choosing bin count and run size so that the predicted peak RSS stays ≤ `--memory`. With `--cores 8 --memory 1G` you get 32 bins × 12 MB; with `--cores 8 --memory 4G`, 32 bins × 66 MB. The binary prints those resolved values at startup.

In theory, a bigger budget means bigger per-gzip-member sort runs and therefore better compression. In practice, increasing memory makes little difference in our tests (RRBS, which keeps gaining up to `--memory 4G+`, is the notable exception; see the data-types table above).
### Below-floor behaviour

`--clumpify` needs a minimum of ~535 MiB to run (mostly a fixed 512 MiB reservation for FastQC, allocator, and runtime overhead). The exact floor varies slightly with `--cores` but stays in the 535–730 MiB range for any sensible core count.

If `--memory` is below the floor, Trim Galore prints a warning and falls back to plain mode:

```
WARNING: --memory 100M is too small for --clumpify at --cores 6 (need ≥ 552 MiB).
Falling back to plain mode (no read reordering).
Increase --memory or drop --clumpify to silence this warning.
```

The trim itself proceeds normally; only the read-reordering step is skipped. Memory usage without `--clumpify` is typically much lower, around the 100 MB mark.
## What doesn't change

- Trimming algorithm and per-record output bytes: clumpify only changes the order of records.
- All `*_trimming_report.txt`/`.json` numbers are byte-identical between plain and clumpify runs (filter and stats code is order-independent).
- Multi-member gzip is RFC 1952 valid; `zcat`, `seqkit`, `samtools fastq`, and `MultiGzDecoder` all handle it transparently (see the sketch below).
- Pair lockstep is preserved: R1[i] and R2[i] are still mates after clumpify reorders them.
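In Rust, for example, the multi-member point matters when picking a decoder: `flate2::read::GzDecoder` stops after the first gzip member, while `MultiGzDecoder` keeps decoding across member boundaries. A minimal sketch (the file name is hypothetical):

```rust
use flate2::read::MultiGzDecoder;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // MultiGzDecoder transparently concatenates all gzip members;
    // a plain GzDecoder would silently stop after the first one.
    let file = File::open("trimmed.fq.gz")?; // hypothetical path
    let reader = BufReader::new(MultiGzDecoder::new(file));
    let lines = reader.lines().map_while(Result::ok).count();
    println!("{} FASTQ records", lines / 4); // 4 lines per record
    Ok(())
}
```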
## Downstream BAM compression

The read-clustering effect carries through into downstream unsorted BAM files at essentially full strength. Because aligners typically stream reads out in the same order they came in, and BAM uses gzip-based (BGZF) compression internally, the same clumping improvements hold through alignment.

Here's an example using ATAC-seq data (31 M paired-end reads), aligned to a minimal index (chr22 only):
| BAM stage | Saving (clumpify vs plain) |
|---|---|
| Trimmed FASTQ (gzip level 1) | −34.8% |
| `samtools import` → uBAM (no alignment) | −36.4% |
| STAR 2.7.11b chr22 alignment → unsorted, aligned BAM | −34.2% |
Note that only unsorted BAMs benefit. If your pipeline coordinate-sorts the BAM immediately after alignment (e.g. `samtools sort` or STAR `--outSAMtype BAM SortedByCoordinate`), reads are rearranged by genomic position and the input-order signal is erased: a coordinate-sorted BAM's size is determined by the genomic distribution of reads, not by clumpify's clustering.
## Benchmark results

Real-world numbers from a benchmark on a MacBook Pro (Apple Silicon, 16 GiB RAM, `--cores 6 --memory 1G`, all defaults). Each dataset has three bars:

- `--clumpify` (level 1, default)
- `--compression 6` (no clumpify)
- `--clumpify --compression 6`

The plots show compression savings (how much smaller the resulting FASTQ files are than the regular run) and the wall-time effect (how much slower the run was; 1× is the regular run).
Datasets covered:

- MiSeq amplicon (CRISPR): 4.4M SE, 500 MB plain output — `ERR16944282`
- ChIP-seq (Illumina SE): 28.6M SE, 1.5 GB — `SRX747791`
- WES (Illumina SE): 105M SE, 9.2 GB — `SRR7890918`
- Long-read (ONT): 100K SE, 558 MB — `SRR37915503`
- ATAC-seq (Illumina PE): 31.5M PE, 2.9 GB — `SRX2717909`
- Ribo-seq (Illumina PE): ~30M PE, 4.0 GB — `SRX11780879` (`SRR15480782`)
- RNA-seq (Illumina PE): 93M PE, 17.0 GB — `SRX1603629`
- scRNA-seq 10x Chromium (PE): 392M PE, 39.6 GB — `pbmc8k_v2` (10x Genomics public dataset)
## Comparison with other tools

If you've used BBMap's `clumpify.sh` or `stevekm/squish`, `--clumpify` produces compression results in the same ballpark on most data:
- On amplicon-type data, all three tools converge to ~37–38% saving at gzip level 9.
- On diverse data (WES, WGS), all three give ~20–30% saving — none of them works miracles on inherently diverse libraries.
The advantage of `--clumpify` over running a separate tool is that it's part of the trim pass, so there's no extra read-and-rewrite cycle. For an X GB input, a separate clumpify step would mean reading the trimmed output, sorting it, and writing it back: typically +3–5× the trim wall time as well as double the disk I/O. `--clumpify` does it in one pass.