
Adaptor Sequence Illumina: A Guide for Trimming & Design

Woolf Software

You’ve got the FASTQs back, the run finished overnight, and the first quality plots already show overrepresented sequence at the 3′ end. That usually means one thing: your insert was shorter than your read, and the sequencer kept going into the adapter.

At that point, the details of Illumina adapter sequences stop being a reference-sheet problem and become an analysis problem. If you trim the wrong sequence, trim in the wrong orientation, or ignore the library structure entirely, you can distort alignments, inflate soft clipping, mishandle low-complexity libraries, and make downstream QC harder than it needs to be.

For routine whole-genome work, that may just mean noisier alignments. For amplicons, CRISPR validation, construct verification, targeted panels, or synthetic biology libraries, it can directly affect whether you trust the sequence you designed. Adapter handling sits right at the boundary between wet-lab prep and computational interpretation. That’s why it deserves more than a pasted list of oligos.

Why Illumina Adapter Sequences Matter

Illumina made adapter sequences a formal standard reference in October 2015 with the Illumina Adapter Sequences Document (Document #1000000002694 v00), which gave labs an authoritative source for the exact oligos used across kits such as TruSeq and Nextera (Illumina Adapter Sequences Document). That release mattered because adapter trimming only works when the software is searching for the sequence present in the library.

The practical trigger is simple. When read lengths exceed insert sizes, reads run through the insert and into adapter sequence. Illumina instruments support paired-end reads of up to 300 bp per mate (2 × 300 bp on MiSeq, for example), so read-through is not an edge case for short-insert libraries or aggressively sized fragments.
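The read-through condition is plain arithmetic: a read only reaches adapter sequence when it is longer than its insert. A minimal sketch, assuming you already have a list of insert sizes (for example from fragment analysis or paired-end alignment); the function names are illustrative and not taken from any tool:

```python
def readthrough_bases(insert_len: int, read_len: int) -> int:
    """Adapter-derived bases a read picks up after running off the end of its insert."""
    return max(0, read_len - insert_len)


def fraction_with_readthrough(insert_sizes: list[int], read_len: int) -> float:
    """Share of fragments whose insert is shorter than the read length."""
    if not insert_sizes:
        return 0.0
    shorter = sum(1 for size in insert_sizes if size < read_len)
    return shorter / len(insert_sizes)


if __name__ == "__main__":
    # A 120 bp insert sequenced with 150 bp reads.
    print(readthrough_bases(120, 150))
    # Hypothetical insert sizes, 150 bp reads.
    print(fraction_with_readthrough([100, 140, 200, 350], 150))
```

In a paired-end run both mates read the same fragment, so every insert shorter than the read length contributes adapter bases to both R1 and R2.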

What goes wrong when adapters remain

Untrimmed adapters don’t just sit harmlessly at the tail of the read.

  • Aligners see non-genomic sequence and either soft-clip it or force poorer placements.
  • Variant calling gets noisier because mismatched read ends can create artifactual evidence.
  • Quantification pipelines inherit junk sequence that never belonged to the biological insert.
  • Assembly and construct validation become more fragile when read ends carry synthetic sequence rather than template-derived sequence.

For synthetic biology work, computational sloppiness can leak into experimental interpretation. A bad trim can look like a failed build, a strange junction, or a low-frequency edit that doesn’t exist.

Practical rule: If the read architecture isn’t clear before trimming, stop and reconstruct the library design first. Sequence cleanup should follow library structure, not guesswork.

Why this matters beyond basic QC

Many teams treat adapter trimming as a checkbox in a pipeline. That’s too casual. Adapter sequence choice affects demultiplexing assumptions, trimming sensitivity, and what you consider a usable read after cleanup. It also changes how you interpret poor runs. High adapter content may reflect short inserts, a prep issue, or a library class where read-through is expected.

That’s why the useful question isn’t “Do I have Illumina adapters?” It’s “Which exact adapters, in which orientation, in which read, from which kit, and what does their abundance say about the library?”

The Anatomy of an Illumina Adapter

An Illumina adapter isn’t one monolithic tag. It’s a functional assembly of sequence elements that let a DNA fragment bind the flow cell, accept sequencing primers, and carry sample identity through indexing.

Figure: Components of an Illumina adapter, including P5/P7 sequences, index sequences, and binding sites.

The structural pieces that matter

Illumina adapter architectures include P5 and P7 sequences for flow cell binding, Read 1 and Read 2 sequencing primer regions, and index sequences for multiplexing. These are the pieces you need to understand before you can troubleshoot anything intelligently.

Here’s the working mental model:

| Component | What it does | Why you care in practice |
|---|---|---|
| P5 / P7 | Bind library molecules to complementary oligos on the flow cell | If these are wrong or incompatible, cluster generation fails or behaves poorly |
| Read 1 / Read 2 primer binding sites | Provide annealing sites for sequencing primers | These define where sequence acquisition begins and how read orientation is interpreted |
| i7 and i5 indexes | Encode sample identity for pooled libraries | Wrong index handling causes demultiplexing loss, cross-sample confusion, or both |
| Adapter junctions | Connect the library insert to sequencing-compatible structure | These are the exact regions trimming tools look for in read-through events |

P5 and P7 are not trimming targets in the abstract

People often talk about “the adapter” as if there’s one universal sequence to remove. In reality, trimming usually targets the sequence you encounter after reading through the insert into the adapter-derived region. That read-through sequence is related to the broader adapter design, but the trimming target depends on kit architecture and read orientation.

So while P5 and P7 are central to the molecule, they are not always the thing you type into Cutadapt. The exact trimming sequence has to match the sequence expected at the read end.
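To make the read-through case concrete, here is a minimal sketch of 3′ adapter removal using exact matching only. It covers the two situations a trimmer has to handle: the whole adapter appears inside the read, or the read ends partway into it. Real trimmers such as Cutadapt also tolerate mismatches and indels, so treat this as an illustration of the logic rather than a replacement; `trim_3prime_adapter` is a hypothetical name, and `min_overlap=3` is chosen to mirror Cutadapt's default minimum overlap.

```python
UNIVERSAL_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Cut a 3' adapter from a read using exact matching (no mismatches or indels)."""
    # Case 1: the whole adapter is inside the read, so the insert ends where it starts.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read stops partway into the adapter, so only an adapter prefix
    # is visible at the read end. Try the longest possible prefix first.
    lower = max(min_overlap, 1)
    for k in range(min(len(adapter) - 1, len(read)), lower - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter evidence: leave the read untouched.
    return read


if __name__ == "__main__":
    insert = "ACGT" * 5
    # Read ends 8 bases into the adapter.
    print(trim_3prime_adapter(insert + UNIVERSAL_ADAPTER[:8], UNIVERSAL_ADAPTER))
    # Read runs through the whole adapter and beyond.
    print(trim_3prime_adapter(insert + UNIVERSAL_ADAPTER + "ATCT", UNIVERSAL_ADAPTER))
```

The `min_overlap` floor is the usual compromise: without it, any read ending in "A" would lose its last base to a one-base "adapter" match.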

Indexes are operational, not decorative

Indexes are often treated like metadata. They are not. They are physical sequence components embedded in the library. Illumina’s documented adapter scheme includes Index 1 (i7) and Index 2 (i5) components, and for many workflows that distinction matters during both demultiplexing and troubleshooting.

A practical consequence is that adapter identity and index identity can’t be separated cleanly in real runs. If a sample sheet is wrong, you may not just lose demultiplexing accuracy. You may also misinterpret what sequence should appear where.

A lot of adapter troubleshooting is really library-structure troubleshooting wearing a bioinformatics label.

The annealing detail people forget

In the formal adapter documentation, Illumina notes that for certain indexed adapter components, only the last 12 nucleotides are complementary for annealing. That detail helps explain both why the forked adapter structure works and why dimers are such a persistent wet-lab problem in low-input prep. Those short complementary regions are enough to create the intended ligation-ready structure. They’re also enough to create unwanted adapter-adapter products when the insert population is weak.

That design choice is elegant when the library is healthy. It becomes expensive when the library isn’t.
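The short complementary stem can be checked directly. The sketch below assumes the widely published TruSeq junction fragments (the 3′ end of the universal adapter with its single-T overhang removed, and the 5′ end of the indexed adapter) and finds the longest run at the junction that can base-pair antiparallel. Confirm the oligo sequences against your own kit's adapter document before relying on them; the function name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_stem_length(top_3prime_end: str, bottom_5prime_end: str) -> int:
    """Longest k where the last k bases of one oligo pair with the first k of the other.

    Antiparallel pairing means the top strand's 3' suffix must equal the
    reverse complement of the bottom strand's 5' prefix.
    """
    best = 0
    for k in range(1, min(len(top_3prime_end), len(bottom_5prime_end)) + 1):
        if top_3prime_end[-k:] == revcomp(bottom_5prime_end[:k]):
            best = k
    return best


# Assumed TruSeq junction fragments; the universal adapter's final T overhang is dropped.
UNIVERSAL_3PRIME = "ACACGACGCTCTTCCGATC"
INDEXED_5PRIME = "GATCGGAAGAGCACACGTCTG"

if __name__ == "__main__":
    print(junction_stem_length(UNIVERSAL_3PRIME, INDEXED_5PRIME))
```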

Quick Reference for Common Adapter Sequences

When someone asks for Illumina adapter sequences, they usually need the trimming sequence immediately, not a lecture. This is the fast lookup.

The safest habit is still to confirm the exact kit and document version before production analysis. But for day-to-day trimming, a short reference table covers most of what comes up in standard workflows.

Common Illumina adapter trimming sequences

| Kit family | Adapter sequence to trim |
|---|---|
| TruSeq (ligation-based) | Read 1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA; Read 2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT |
| Nextera (transposase-based) | CTGTCTCTTATACACATCT |
| Illumina DNA Prep / Illumina RNA Prep / PCR kits | CTGTCTCTTATACACATCT |
| Nextera Mate Pair | CTGTCTCTTATACACATCT+AGATGTGTATAAGAGACAG |

How to use this table correctly

The important point is not just the string. It’s the library context.

  • Ligation-based TruSeq kits and transposase-based kits (Nextera, Illumina DNA Prep) use different trimming sequences, so the kit family decides which string goes into the trimmer.
  • For mate-pair libraries, the composite sequence matters because each adapter half may need to be assessed independently.
  • For indexed libraries, don’t confuse index sequence handling with generic adapter trimming. They are related but not interchangeable.

A few lookup habits that prevent mistakes

  • Check the kit family first. “Illumina library” isn’t specific enough.
  • Confirm whether the library is paired-end or single-end before choosing tool settings.
  • Look at overrepresented sequences in raw QC and compare them to expected adapter-derived sequence.
  • Don’t recycle an old adapters.fa file blindly across unrelated kits.
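The overrepresented-sequence check can be scripted as a first pass. A minimal sketch, assuming uncompressed FASTQ text and two well-known motifs: the 13-base prefix AGATCGGAAGAGC shared by TruSeq-style read-through adapters, and the 19-base Nextera transposase sequence CTGTCTCTTATACACATCT. It counts exact hits only, so it tells you which adapter family dominates, not which exact kit was used; the function names are illustrative.

```python
import io
from typing import Iterable, Iterator

MOTIFS = {
    "TruSeq-style": "AGATCGGAAGAGC",
    "Nextera-style": "CTGTCTCTTATACACATCT",
}


def fastq_sequences(handle: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (line 2 of every 4-line FASTQ record)."""
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:
            yield line.strip()


def count_motif_hits(sequences: Iterable[str], motifs: dict[str, str]) -> dict[str, int]:
    """Count how many reads contain each motif at least once (exact match)."""
    counts = {name: 0 for name in motifs}
    for seq in sequences:
        for name, motif in motifs.items():
            if motif in seq:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    records = ["ACGTACGTAGATCGGAAGAGCACACG", "TTTTACGTACGTACGTACGT"]
    text = "".join(f"@r{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(records))
    print(count_motif_hits(fastq_sequences(io.StringIO(text)), MOTIFS))
```

For gzipped files, `gzip.open(path, "rt")` returns a text handle that can be passed to `fastq_sequences` the same way.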

If your run comes from a shared core, ask for the exact prep kit name, not just “TruSeq-like” or “Nextera-based.” Those approximations create trimming errors faster than anticipated.

Understanding Major Illumina Adapter Variants

Not all Illumina-compatible adapters solve the same problem. Some are built around ligation. Some come from transposase-based fragmentation and tagging. Some are mostly differentiated by indexing strategy, which sounds minor until a large pooled run starts leaking reads across samples.

TruSeq and ligation-based workflows

TruSeq-style libraries are the classic ligation-based model. You fragment, repair ends, add A-tails where appropriate, and ligate adapters. That architecture is predictable, which is why it’s still easy to reason about during troubleshooting.

The upside is clarity. You usually know what the insert boundaries are supposed to be, and adapter interpretation in reads is relatively straightforward. The downside is that low-input work can be sensitive to dimer formation and ligation inefficiency.

Nextera and transposase-based workflows

Nextera libraries follow a different logic. Tagmentation combines fragmentation and adapter addition in one step. Operationally, that changes where errors creep in.

You often gain workflow speed and convenience, but you also inherit kit-specific sequence expectations that people mishandle when they assume every Illumina library behaves like TruSeq. That assumption is one reason trimming files get copied between projects without enough review.

CDI versus UDI is not a minor upgrade

Indexing strategy is where many pooled experiments become fragile. Combinatorial dual indexing, or CDI, reuses i5 and i7 indexes in combinations. Unique dual indexing, or UDI, assigns a fully unique pair to each sample.

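That distinction matters most on high-throughput patterned flow cells. UDI does not stop index hopping itself; it makes hopped reads detectable, because a read that swapped one index carries an i7/i5 pair assigned to no sample and can be discarded during demultiplexing instead of being credited to another sample. Illumina’s adapter portfolio bulletin states that UDI kits reduce misassignment from index hopping to less than 0.1% on patterned flow cells like NovaSeq, and in a 384-plex AmpliSeq run, UDI can reduce hopping-induced false positives by 90% compared with CDI (Illumina adapter portfolio bulletin).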
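The difference is easiest to see in a toy demultiplexer. This sketch assumes exact index matching and uses made-up index labels (X and Y for i7, P and Q for i5) in place of real 8 to 10 bp sequences. A read whose i5 has hopped lands in another sample under CDI but matches no sample under UDI:

```python
from typing import Optional

# Hypothetical index labels standing in for real index sequences.
CDI_SHEET = {  # i7 and i5 reused in combinations
    ("X", "P"): "S1",
    ("X", "Q"): "S2",
    ("Y", "P"): "S3",
    ("Y", "Q"): "S4",
}
UDI_SHEET = {  # every sample has its own i7 and its own i5
    ("X", "P"): "S1",
    ("Y", "Q"): "S2",
}


def assign_sample(i7: str, i5: str, sheet: dict[tuple[str, str], str]) -> Optional[str]:
    """Return the sample for an exact (i7, i5) match, or None for an unexpected pair."""
    return sheet.get((i7, i5))


if __name__ == "__main__":
    # A molecule from S1 (X, P) that picked up i5 "Q" by index hopping.
    hopped = ("X", "Q")
    print("CDI:", assign_sample(*hopped, CDI_SHEET))  # silently credited to S2
    print("UDI:", assign_sample(*hopped, UDI_SHEET))  # no match, so it can be dropped
```

Hopping happens in both cases. What UDI changes is whether the demultiplexer can tell.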

When UDI is the right choice

For small, simple pools, CDI may still be operationally acceptable. For high-plex designs, sensitive rare-variant work, or synthetic biology libraries where sample identity is tightly coupled to construct identity, UDI is usually the better engineering decision.

A practical summary:

  • Choose CDI when the pool is modest, the assay is tolerant, and index reuse won’t compromise interpretation.
  • Choose UDI when contamination by reassigned reads would create false positives or sample ambiguity.
  • Choose UDI by default when going beyond 96-plex, which the same Illumina bulletin identifies as a critical context for high-plex synthetic biology and related workflows.

If you’re arguing about whether UDI is worth it, ask a narrower question. What will one misassigned read mean in your assay?

The hidden trade-off

UDI improves specificity, but it also raises operational demands. Teams have to manage index inventory carefully, verify sample sheets, and maintain discipline across automation steps. That work is worth it, but it isn’t free. Many sequencing failures blamed on the instrument are really failures in index bookkeeping.

How to Detect and Trim Adapters from FASTQ Files

Most trimming failures have nothing to do with the software. They happen because the operator doesn’t know which adapter is present, which read contains it, or how aggressively the trimming step should prune read-through.


For many Illumina kits, the universal adapter trimming sequence is AGATCGGAAGAGCACACGTCTGAACTCCAGTCA, and for Trimmomatic a commonly used parameter string is ILLUMINACLIP:adapters.fa:2:30:10, reported as a best practice achieving over 95% trimming efficiency in the referenced adapter documentation PDF (Illumina adapter sequences v14 PDF).

Start by confirming the library structure

Before you run anything:

  • Identify the prep kit from the wet-lab record.
  • Check whether the run is single-end or paired-end.
  • Inspect raw QC for overrepresented sequence and read-end contamination.
  • Decide whether adapter read-through is expected because of short inserts.

If the prep team can’t tell you what the library is, ask for the sample sheet, index layout, and prep protocol. That usually resolves the ambiguity faster than trying six trimming settings.

For a broader prep context, the details of library construction in this NGS library prep overview are a useful operational reference.

Cutadapt examples

Cutadapt is still the cleanest tool when you want explicit control.

Single-end Cutadapt

cutadapt \
  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
  -m 20 \
  -o sample.trimmed.fastq.gz \
  sample.fastq.gz

Why these choices matter:

  • -a specifies the 3′ adapter to trim.
  • -m 20 discards reads that become too short to be useful after trimming.
  • Keeping the command minimal is often better than over-tuning on the first pass.

Paired-end Cutadapt

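cutadapt \
  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
  -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
  -m 20 \
  -o sample_R1.trimmed.fastq.gz \
  -p sample_R2.trimmed.fastq.gz \
  sample_R1.fastq.gz sample_R2.fastq.gz

This is the right baseline when both mates can run through short inserts into adapter sequence. -a sets the adapter expected at the 3′ end of Read 1 and -A the one expected at the 3′ end of Read 2. For TruSeq-style libraries the two differ after their first 13 bases, so reusing the Read 1 sequence for -A misses Read 2 read-through that extends past that shared prefix.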
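Why the two mates need different adapters is visible in the sequences themselves. A quick check, assuming the standard TruSeq Read 1 and Read 2 adapter trimming sequences:

```python
import os

R1_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
R2_ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Both read-through adapters start with the same bases, then diverge.
SHARED_PREFIX = os.path.commonprefix([R1_ADAPTER, R2_ADAPTER])

if __name__ == "__main__":
    print(SHARED_PREFIX, len(SHARED_PREFIX))
    # A Read 2 that runs through a 40 bp insert carries the Read 2 adapter, not the Read 1 one.
    read2 = "ACGT" * 10 + R2_ADAPTER
    print(R2_ADAPTER in read2, R1_ADAPTER in read2)
```

That shared 13-base prefix is why trimming with AGATCGGAAGAGC alone catches read-through on both mates, while the full sequences are mate-specific.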

Trimmomatic examples

Trimmomatic is common in established pipelines because it integrates adapter clipping with additional cleanup steps.

Prepare an adapters file

Create an adapters.fa file that contains the sequence you expect to trim, for example:

>IlluminaUniversal
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

Single-end Trimmomatic

trimmomatic SE -phred33 \
  sample.fastq.gz sample.trimmed.fastq.gz \
  ILLUMINACLIP:adapters.fa:2:30:10 \
  LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20

Paired-end Trimmomatic

trimmomatic PE -phred33 \
  sample_R1.fastq.gz sample_R2.fastq.gz \
  sample_R1.paired.fastq.gz sample_R1.unpaired.fastq.gz \
  sample_R2.paired.fastq.gz sample_R2.unpaired.fastq.gz \
  ILLUMINACLIP:adapters.fa:2:30:10 \
  LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20

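ILLUMINACLIP:adapters.fa:2:30:10 deserves interpretation:

  • 2 is the maximum number of mismatches allowed in a seed match for Trimmomatic to go on and attempt a full alignment.
  • 30 is the alignment score required to clip read-through detected by palindrome alignment of the two mates (paired-end mode only).
  • 10 is the alignment score required for a simple match between an adapter sequence and a read.

Those values are widely reused because they strike a practical balance. They catch real adapter sequence without turning every low-quality tail into a false adapter match. Note that palindrome clipping only engages when adapters.fa contains a pair of sequences whose names start with “Prefix” and end in “/1” and “/2”; with the single IlluminaUniversal entry above, Trimmomatic applies simple clipping only.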
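The two score thresholds are easier to reason about as base counts. Trimmomatic's manual describes each matching base as adding “just over 0.6” to the alignment score and mismatches as lowering it. The sketch below assumes the per-match score is exactly log10(4) and ignores mismatch penalties, so it gives the minimum number of perfectly matching bases needed to reach a threshold:

```python
import math

# Assumption: one perfectly matching base scores log10(4), about 0.602.
MATCH_SCORE = math.log10(4)


def min_matching_bases(score_threshold: float) -> int:
    """Smallest number of perfect matches whose summed score reaches the threshold."""
    return math.ceil(score_threshold / MATCH_SCORE)


if __name__ == "__main__":
    for name, threshold in [("simple clip", 10), ("palindrome clip", 30)]:
        print(f"{name}: {threshold} -> at least {min_matching_bases(threshold)} matching bases")
```

Under this approximation, a simple-clip hit needs about 17 matching adapter bases, and a palindrome hit about 50 aligned bases.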

fastp examples

fastp is convenient when you want a compact command and built-in reporting.

Single-end fastp

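fastp \
  -i sample.fastq.gz \
  -o sample.trimmed.fastq.gz \
  --adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
  --length_required 20

The --detect_adapter_for_pe flag only applies to paired-end input. For single-end data, fastp auto-detects adapters by default, and passing --adapter_sequence disables that detection in favor of the sequence you supply. I still prefer explicit adapter specification when the library is known. Auto-detection is useful, but it shouldn’t replace library awareness.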

Paired-end fastp

fastp \
  -i sample_R1.fastq.gz \
  -I sample_R2.fastq.gz \
  -o sample_R1.trimmed.fastq.gz \
  -O sample_R2.trimmed.fastq.gz \
  --detect_adapter_for_pe \
  --length_required 20


Special case for Nextera Mate Pair

Nextera Mate Pair libraries need extra care. The documented composite trimming sequence is CTGTCTCTTATACACATCT+AGATGTGTATAAGAGACAG, which supports independent assessment of each adapter half in mate-pair cleanup. If you treat mate-pair data like ordinary short-insert paired-end data, you’ll often keep the wrong reads and discard the ones you need.

Don’t copy a “universal Illumina trimming” command into mate-pair analysis and assume you’re safe. You aren’t.
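The two halves of the composite are reverse complements of each other, which is why either one can appear in a read that crosses the junction. A minimal sketch that checks this and reports where the first half-match starts, using exact matching only; `junction_start` is a hypothetical helper, not part of any mate-pair tool:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
COMPOSITE = "CTGTCTCTTATACACATCT+AGATGTGTATAAGAGACAG"
LEFT_HALF, RIGHT_HALF = COMPOSITE.split("+")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_start(read: str) -> int:
    """Position of the first exact junction-adapter half in the read, or -1 if none."""
    hits = [pos for pos in (read.find(LEFT_HALF), read.find(RIGHT_HALF)) if pos != -1]
    return min(hits) if hits else -1


if __name__ == "__main__":
    print(revcomp(LEFT_HALF) == RIGHT_HALF)
    read = "ACGT" * 3 + RIGHT_HALF + "GG"
    print(junction_start(read))
```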

Interpreting Adapter Trimming Reports and Metrics

Trimming isn’t done when the command exits cleanly. The useful part starts when you read the report and decide what the adapter burden says about the library.

What trimming reports tell you

Most tools report a few core signals:

  • How many reads contained adapter sequence
  • How many bases were removed
  • How many reads became too short and were discarded
  • Whether contamination was symmetric across R1 and R2

Those values are more than QC decoration. They tell you whether you had a normal short-insert read-through pattern, a prep problem, or a configuration mismatch.

How to read asymmetry

If one mate trims heavily and the other barely trims, don’t assume biology. Check whether the adapter specification was incomplete, whether one read has lower quality, or whether the library structure causes one end to encounter adapter more often.

That kind of asymmetry often points to a process issue rather than a mysterious feature of the sample.
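A simple ratio check on per-mate trimming counts can flag runs worth a second look. The `max_ratio` cutoff here is a hypothetical placeholder, not an established threshold; set it from what your assay normally shows:

```python
def mates_look_asymmetric(r1_with_adapter: int, r2_with_adapter: int, max_ratio: float = 2.0) -> bool:
    """Flag when one mate has adapter in far more reads than the other."""
    low, high = sorted((r1_with_adapter, r2_with_adapter))
    if high == 0:
        return False  # no adapter on either mate
    if low == 0:
        return True  # adapter found on one mate only
    return high / low > max_ratio


if __name__ == "__main__":
    print(mates_look_asymmetric(30_000, 29_000))
    print(mates_look_asymmetric(30_000, 2_000))
```

The reasoning behind the check: in short-insert read-through, both mates read the same short fragment, so both should hit adapter at a similar rate. A large gap points back to the adapter specification or read quality.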

BCL Convert metrics are especially useful

Illumina’s Adapter_Metrics.csv in BCL Convert v1.10+ reports precise counts of adapter bases per lane and sample. Illumina states that these metrics support computation of true genomic yield, which is typically 70% to 95% of total bases after trimming in high-depth runs, and connects this practice to preventing alignment bias emphasized since 2016 (BCL Convert Adapter_Metrics documentation).

That matters because total sequenced bases are not the same as biologically informative bases. If adapter content is high, your apparent depth can be misleading.

A good companion for thinking about usable depth is this overview of DNA sequencing coverage, especially when you need to distinguish raw output from effective coverage.
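The gap between sequenced and usable bases is simple arithmetic once you have per-sample adapter base counts, such as those BCL Convert reports in Adapter_Metrics.csv. This sketch leaves out CSV parsing because column names can differ between versions (check your file's header), and the numbers in the usage example are made up:

```python
def usable_fraction(total_bases: int, adapter_bases: int) -> float:
    """Fraction of sequenced bases that are not adapter-derived."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return (total_bases - adapter_bases) / total_bases


def effective_coverage(total_bases: int, adapter_bases: int, genome_size: int) -> float:
    """Mean depth after removing adapter bases (ignores duplicates and unmapped reads)."""
    return (total_bases - adapter_bases) / genome_size


if __name__ == "__main__":
    # Made-up example: 90 Gb sequenced, 9 Gb of it adapter, 3 Gb genome.
    total, adapter, genome = 90_000_000_000, 9_000_000_000, 3_000_000_000
    print(f"usable: {usable_fraction(total, adapter):.0%}")
    print(f"raw depth {total / genome:.1f}x vs effective {effective_coverage(total, adapter, genome):.1f}x")
```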

What to do with the numbers

Use the trimming report and BCL Convert metrics together.

| Signal | Likely interpretation | Follow-up |
|---|---|---|
| Low adapter trimming | Inserts were generally longer than read length | Proceed, but still review read quality and duplication |
| Moderate adapter trimming | Typical short-insert read-through | Usually acceptable if expected for the assay |
| High adapter bases for one sample only | Sample-specific prep issue or size-selection drift | Check fragment analysis and pooling notes |
| High adapter burden across many samples | Run-wide insert sizing or prep design issue | Review library prep and sequencing read length choice |

High adapter content is not automatically a trimming failure. Sometimes it’s an accurate warning that the library was built shorter than the sequencing strategy assumed.

Troubleshooting Common Adapter Problems

When a run looks bad, adapter handling is often where the postmortem starts. The hard part is separating computational symptoms from wet-lab causes.


Adapter dimers are small, common, and expensive

Adapter dimers form because the forked Y-adapter design contains complementary 12-nucleotide regions. In low-concentration libraries, including single-cell and CRISPR sgRNA work, dimers can consume 50% to 90% of sequencing reads and cause 20% to 30% yield loss. The same Illumina knowledge reference also notes mitigation approaches such as optimized SPRI bead ratios like 1.8x and alternative designs such as NEBNext stem-loop structures (Illumina knowledge reference on library preparation and dimers).

That’s one of the most consequential adapter-related facts to keep in mind. Dimers don’t just waste reads. They crowd out the library you intended to sequence.

How dimers show up in data

Common signs include:

  • Very high abundance of short reads after trimming
  • Overrepresented pure adapter sequence in raw QC
  • Unexpectedly poor usable yield despite nominal sequencing output
  • Libraries that quantify weakly but still amplify strongly

That last pattern catches people all the time. Dimers amplify efficiently because they are short. A library can look PCR-productive while being analytically poor.
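Dimer-heavy libraries can also be spotted by where the adapter starts: in a dimer there is little or no insert, so adapter sequence begins at or near the first base. A minimal sketch, assuming TruSeq-style reads and a hypothetical `max_insert` cutoff of 5 bases; both the cutoff and the function names are illustrative:

```python
ADAPTER_PREFIX = "AGATCGGAAGAGC"  # shared start of TruSeq-style read-through adapters


def insert_length(read: str) -> int:
    """Bases before the adapter prefix; -1 means no adapter was found in the read."""
    return read.find(ADAPTER_PREFIX)


def dimer_like_fraction(reads: list[str], max_insert: int = 5) -> float:
    """Share of reads whose adapter starts within max_insert bases of the read start."""
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if 0 <= insert_length(read) <= max_insert)
    return hits / len(reads)


if __name__ == "__main__":
    reads = [
        "AGATCGGAAGAGCACACGTCTG",          # adapter at base 0: dimer-like
        "ACGT" * 10 + "AGATCGGAAGAGC",     # 40 bp insert, then read-through
        "ACGT" * 10,                       # no adapter seen
    ]
    print(dimer_like_fraction(reads))
```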

What works against dimers

The most effective interventions are upstream, not downstream.

  • Tighten bead cleanup conditions when the insert distribution allows it.
  • Avoid pushing low-input ligations past their comfortable range.
  • Re-check adapter:insert balance instead of only increasing PCR.
  • Consider alternative adapter chemistries if the assay class repeatedly produces dimers.

What doesn’t work well is pretending you can fix a dimer-heavy library purely with aggressive read trimming. By then, the instrument has already spent the reads.

Index hopping needs a different response

Dimers and hopping get lumped together because both create contamination. They are not the same problem.

Index hopping is a sample-assignment problem. If your assay is sensitive to rare events, the recommended answer is usually to move to unique dual indexing rather than trying to rescue interpretation later with stricter downstream thresholds.

Demultiplexing failures usually start with metadata

When demultiplexing collapses, check these first:

  1. Was the sample sheet correct for index orientation and kit type?
  2. Were i5 expectations instrument-compatible?
  3. Were the oligos used the ones assumed by the sequencing setup?
  4. Did pooling records drift from plate layout?

Most demultiplexing investigations become much shorter once someone compares the wet-lab worksheet to the sequencer configuration line by line.

If a run fails to demultiplex cleanly, don’t start by blaming BCL conversion. Start with the index map and the exact kit.
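The second question in that checklist is often an orientation problem: depending on instrument and reagent chemistry, the i5 read can come back as the reverse complement of what the sample sheet lists. A minimal sketch that compares observed i5 counts (for example, from a report of the most common undetermined barcodes) against the sheet in both orientations; the index strings and counts below are examples only:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def i5_orientation(observed_counts: dict[str, int], sheet_i5: list[str]) -> str:
    """Report whether observed i5 reads match the sheet as entered or reverse-complemented."""
    as_entered = sum(observed_counts.get(seq, 0) for seq in sheet_i5)
    flipped = sum(observed_counts.get(revcomp(seq), 0) for seq in sheet_i5)
    if as_entered > flipped:
        return "as entered"
    if flipped > as_entered:
        return "reverse complement"
    return "unclear"


if __name__ == "__main__":
    observed = {"AGGCTATA": 900, "GCCTCTAT": 850, "TTTTTTTT": 10}
    sheet = ["TATAGCCT", "ATAGAGGC"]
    print(i5_orientation(observed, sheet))
```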

Guidelines for Designing Custom Illumina Adapters

Custom adapter design is where bioinformatics, oligo chemistry, and assay engineering collide. The biggest mistake is treating it as a sequence-formatting exercise. It’s a systems problem.

Preserve the parts Illumina chemistry expects

Any custom design has to remain compatible with the sequencing ecosystem. That means preserving the required functional logic around flow-cell binding, primer annealing, and indexing, even when you’re adding assay-specific features such as custom barcodes or UMIs.

If a custom motif disrupts the expected architecture, the failure may show up as poor clustering, bad indexing behavior, or strange read structures rather than an obvious design error.

Design for analysis, not just synthesis

A custom adapter should be easy to reason about computationally.

Good design choices include:

  • Distinct barcode space that won’t collapse under sequencing error
  • Clear positional logic for where barcode, UMI, and insert begin
  • Minimal ambiguity at junctions so trimming and parsing remain deterministic
  • Avoidance of problematic secondary structures that complicate ligation or amplification
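The first criterion can be checked before ordering oligos. A minimal sketch, assuming equal-length barcodes and substitution errors only (indels are ignored): it reports the minimum pairwise Hamming distance in a set and how many substitutions nearest-match decoding can correct at that distance. Acceptance thresholds are yours to set, and the barcodes below are made up:

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: list[str]) -> int:
    """Smallest Hamming distance between any two barcodes (needs at least two)."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def correctable_errors(min_distance: int) -> int:
    """Substitutions that nearest-match decoding can correct: floor((d - 1) / 2)."""
    return (min_distance - 1) // 2


if __name__ == "__main__":
    barcodes = ["ACGTAC", "TGCATG", "ACGTTT"]
    d = min_pairwise_distance(barcodes)
    print(d, correctable_errors(d))
```

A minimum distance of 2 lets you detect a single error but not correct it; a distance of 3 is the smallest that allows one substitution to be corrected.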

If your parser needs a page of exception handling, the adapter design is already trying to tell you something.

Validate with a small pilot first

Before scaling a custom scheme, test whether the resulting reads are easy to demultiplex, trim, and map. That’s especially important when custom constructs will feed into genome-scale or combinatorial workflows.

A useful operational reference for how these design choices interact with broader library construction is this overview of genomic DNA libraries.

QC for custom adapters should be explicit

Don’t rely on “it amplified” as proof that the design works.

Use a validation checklist:

  • Confirm oligo identity and purity from the synthesis provider.
  • Run a pilot library before committing a full plate or production batch.
  • Inspect raw read structure manually in a subset of FASTQs.
  • Verify that trimming behavior matches the intended junctions.
  • Check whether off-target short products dominate amplification.

The best custom adapters are not just compatible with Illumina. They are easy for humans and software to interpret under failure conditions.

Frequently Asked Questions About Illumina Adapters

What happens if I use the wrong adapter sequence in trimming software

Usually one of two things happens. The software fails to trim real contamination, or it trims incompletely and leaves adapter-derived sequence behind. In harder cases, it clips reads at the wrong places and reduces usable sequence unnecessarily.

Can I mix adapters from different Illumina kits in one sequencing pool

You shouldn’t assume that’s safe. Compatibility depends on the full library architecture, indexing scheme, and sequencing setup. If kits differ in index design, orientation expectations, or read structure, pooling them can create demultiplexing and analysis problems.

How short does the insert need to be before trimming becomes essential

Trim as soon as read-through into adapter sequence is occurring. In practice, if the sequenced read length can exceed the insert, trimming is no longer optional.

Does adapter orientation matter in paired-end sequencing

Yes. Orientation matters in both trimming and demultiplexing. A correct sequence in the wrong orientation can still produce wrong results.

Should I trust auto-detection of adapters

Only if it agrees with the known library structure. Auto-detection is convenient for confirmation. It’s not a substitute for knowing what was ligated.

If trimming removes many reads, is the trimmer too aggressive

Not necessarily. Heavy read loss may be correctly reporting short inserts, adapter dimers, or poor library composition. The right response is to inspect the library assumptions, not just relax trimming settings.


Woolf Software helps R&D teams connect sequencing data, computational modeling, and DNA engineering into one practical workflow. If your group is building effective pipelines for construct validation, CRISPR design, genome-scale analysis, or synthetic biology assay development, explore Woolf Software to see how their tools can help reduce iteration cycles and improve reproducibility.