SeqVis2 Documentation

A complete reference for every feature in SeqVis2 — from loading files to exporting publication-ready plots.

Demo dataset

Jebb D., Huang Z., Pippel M. et al. Six reference-quality genomes reveal evolution of bat adaptations. Nature 583, 578–584 (2020). https://doi.org/10.1038/s41586-020-2486-3

48 mammalian sequences · 21,468,945 bp alignment · 10,635,421 variable sites

Active demo view: All Sites · All positions · Spread 185%.

48 sequences

Balaenoptera acutorostrata
Sorex araneus
Gorilla gorilla
Ochotona princeps
Manis javanica
Canis lupus
Ailuropoda melanoleuca
Macaca fascicularis
Pongo abelii
Rattus norvegicus
Trichechus manatus latirostris
Heterocephalus glaber
Condylura cristata
Tursiops truncatus
Tupaia belangeri
Phyllostomus discolor
Pipistrellus kuhlii
Ictidomys tridecemlineatus
Erinaceus europaeus
Microtus ochrogaster
Molossus molossus
Cricetulus griseus
Manis pentadactyla
Myotis myotis
Saimiri boliviensis
Bos taurus
Sus scrofa
Echinops telfairi
Microcebus murinus
Homo sapiens
Rhinolophus ferrumequinum
Equus caballus
Carlito syrichta
Camelus ferus
Loxodonta africana
Cavia porcellus
Mustela putorius
Pan troglodytes
Ceratotherium simum
Callithrix jacchus
Leptonychotes weddellii
Dasypus novemcinctus
Oryctolagus cuniculus
Otolemur garnettii
Orycteropus afer
Rousettus aegyptiacus
Felis catus
Mus musculus

Drag to rotate, scroll to zoom, click points to highlight. The demo dataset supports all-sites and variable-sites views plus all four codon-position frequency tables.

Overview

SeqVis2 is a browser-based tool for visualising compositional heterogeneity in nucleotide alignments. Each sequence is mapped to a point inside a regular tetrahedron whose four vertices represent the nucleotides A, T, G, and C. When all four nucleotide frequencies are equal (0.25 each) a sequence maps to the centroid; sequences biased toward one nucleotide cluster near the corresponding vertex.
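
The mapping described above is a barycentric combination of the four vertex positions. A minimal sketch, assuming one particular set of vertex coordinates (the coordinates SeqVis2 actually uses are not stated here) and a hypothetical function name `toTetrahedronPoint`:

```javascript
// Hypothetical vertex coordinates for a regular tetrahedron centred on the origin.
// SeqVis2's real coordinates may differ; only the barycentric idea matters here.
const VERTICES = {
  A: [1, 1, 1],
  T: [1, -1, -1],
  G: [-1, 1, -1],
  C: [-1, -1, 1],
};

// Map nucleotide frequencies (summing to 1) to a point inside the tetrahedron.
function toTetrahedronPoint(freq) {
  const point = [0, 0, 0];
  for (const nt of ["A", "T", "G", "C"]) {
    for (let axis = 0; axis < 3; axis++) {
      point[axis] += freq[nt] * VERTICES[nt][axis];
    }
  }
  return point;
}
```

With these coordinates, equal frequencies land on the origin (the centroid), and a sequence made only of A lands exactly on the A vertex.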

Input — FASTA

Upload Data page

Click Upload Data on the home page to reach /visjson. Use the Input FASTA file picker to load a .fasta file.

For large files (up to ~1 GB), SeqVis2 uses a two-pass streaming algorithm that never loads the full file into memory. The file is read in 256 KB chunks via FileReader, and a dual progress bar tracks both passes: Pass 1/2 builds the all-site frequency counters and a column-level nucleotide bitmask; Pass 2/2 uses that bitmask to accumulate the variable-site frequency counters. Both site views are computed during this one load, so switching between them afterwards is instant.

Each >header line starts a new sequence entry.
Unknown bases (N) and gap characters (- . space) are excluded from frequency counts.
Frequencies are computed for all positions as well as 1st, 2nd, and 3rd codon positions independently.
Both UNIX and Windows line endings are normalised automatically.
Peak memory scales with alignment length (~2.5 MB per 20 million columns), not file size.
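
The parsing rules above can be sketched as a counter that is fed text chunks in order, the way a chunked reader would deliver them. This is an illustration under assumptions, not SeqVis2's source: `createFastaCounter`, `push`, and `finish` are hypothetical names, and only the first pass (all-site counts) is shown.

```javascript
const BASES = "ATGC";

// Create a counter that accepts text chunks in file order.
function createFastaCounter() {
  const counts = {};   // name -> 4 tables (all, 1st, 2nd, 3rd), each [A, T, G, C]
  let pending = "";    // incomplete last line carried over to the next chunk
  let current = null;  // tables of the sequence being read
  let column = 0;      // column index within the current sequence

  function handleLine(rawLine) {
    const line = rawLine.replace(/\r$/, ""); // normalise Windows line endings
    if (line.startsWith(">")) {
      current = counts[line.slice(1).trim()] = [0, 1, 2, 3].map(() => [0, 0, 0, 0]);
      column = 0;
      return;
    }
    if (current === null) return; // ignore text before the first header
    for (const ch of line.toUpperCase()) {
      const i = BASES.indexOf(ch);
      if (i >= 0) {
        current[0][i]++;                // all positions
        current[1 + (column % 3)][i]++; // codon position from the column index
      }
      column++; // N and gap characters still occupy a column
    }
  }

  return {
    push(chunk) {
      const lines = (pending + chunk).split("\n");
      pending = lines.pop(); // may be a partial line
      lines.forEach(handleLine);
    },
    finish() {
      if (pending !== "") handleLine(pending);
      pending = "";
      return counts;
    },
  };
}
```

Because a chunk boundary can fall in the middle of a line, the last partial line is held back until the next chunk arrives. Memory use depends on the number of sequences and the longest line, not on the file size.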

Site Set — All Sites vs Variable Sites

Upload Data page

After loading a FASTA file, the Site Set toggle lets you switch between plotting All Sites and Variable Sites without reloading the file.

Mode	What is plotted
All Sites	Nucleotide frequencies computed across every column in the alignment, including invariant positions.
Variable Sites	Frequencies recomputed using only columns where ≥2 distinct nucleotide states are observed across all sequences — invariant columns are excluded.

Invariant sites contribute an identical compositional signal to every sequence. When they dominate an alignment, their shared background dilutes the between-sequence differences you are trying to visualise, and all dots collapse toward the centroid. Restricting to variable sites removes this shared background and reveals the heterogeneity that is analytically meaningful. This makes SeqVis2 a direct improvement on SeqVis 1, which always used all sites.

// Column c is variable iff its 4-bit nucleotide mask has ≥ 2 bits set:
columnMask[c] |= NT_BIT[nucleotide]   // NT_BIT: A=1, T=2, G=4, C=8
isVariable(c) = mask !== 0 && (mask & (mask-1)) !== 0

The variable-site bitset is stored as a packed Uint32Array (~2.5 MB for 20 million columns). Column lookup is a single bit-extract — O(1). The left panel shows the variable-site count as N / total columns. The canvas overlay also displays the active site set whenever Variable Sites is selected.
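
A runnable sketch of the mask test and the packed bitset, assuming in-memory strings instead of a streamed file. The helper names `buildColumnMasks`, `packVariableBits`, and `isVariable` are hypothetical, and the sketch stores one byte per column mask for simplicity.

```javascript
const NT_BIT = { A: 1, T: 2, G: 4, C: 8 };

// Pass 1: OR every base into a per-column 4-bit mask.
function buildColumnMasks(sequences, length) {
  const masks = new Uint8Array(length);
  for (const seq of sequences) {
    for (let c = 0; c < length; c++) {
      masks[c] |= NT_BIT[seq[c]] || 0; // N and gaps contribute no bit
    }
  }
  return masks;
}

// Pack one bit per column into a Uint32Array: a set bit means "variable".
function packVariableBits(masks) {
  const bits = new Uint32Array(Math.ceil(masks.length / 32));
  masks.forEach((mask, c) => {
    // mask & (mask - 1) clears the lowest set bit; a non-zero result means >= 2 bits.
    if ((mask & (mask - 1)) !== 0) bits[c >>> 5] |= 1 << (c & 31);
  });
  return bits;
}

// O(1) lookup: extract the bit for column c.
function isVariable(bits, c) {
  return ((bits[c >>> 5] >>> (c & 31)) & 1) === 1;
}
```

At one bit per column, 20 million columns occupy 20,000,000 / 8 = 2.5 million bytes, which matches the figure quoted above.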

Note: the variable-site filter is available only for FASTA input (raw sequence data). JSON-loaded datasets carry pre-computed frequency tables without column information, so the Site Set toggle is hidden for those files.

Input — JSON

Upload Data page

Use the Input JSON file picker to reload a previously exported SeqVis2 JSON. SeqVis2 exports a richer schema that contains both allSites and variableSites frequency sets per sequence. Legacy SeqVis 1 JSON (flat format) is also accepted and is automatically mirrored into both site views.

{
  "Homo sapiens": {
    "allSites": {
      "allPositionFreq":    { "A": 0.29, "T": 0.29, "G": 0.21, "C": 0.21 },
      "firstPositionFreq":  { "A": 0.30, "T": 0.28, "G": 0.22, "C": 0.20 },
      "secondPositionFreq": { "A": 0.27, "T": 0.31, "G": 0.20, "C": 0.22 },
      "thirdPositionFreq":  { "A": 0.30, "T": 0.28, "G": 0.21, "C": 0.21 }
    },
    "variableSites": {
      "allPositionFreq":    { "A": 0.31, "T": 0.27, "G": 0.23, "C": 0.19 },
      ...
    }
  }
}
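
The mirroring of legacy files can be sketched as a small normalisation step. The exact legacy layout is not shown in this documentation, so the sketch assumes the flat format holds the four `*PositionFreq` tables directly under each sequence name; `normaliseDataset` is a hypothetical name.

```javascript
// Return a dataset in the two-site-set shape, whatever shape came in.
function normaliseDataset(raw) {
  const out = {};
  for (const [name, entry] of Object.entries(raw)) {
    if (entry.allSites && entry.variableSites) {
      out[name] = entry; // already in the richer schema
    } else {
      // Assumed legacy flat shape: mirror the same tables into both views.
      out[name] = { allSites: entry, variableSites: entry };
    }
  }
  return out;
}
```

After this step the rest of the code can always read `dataset[name][siteSet][positionKey]` without checking which schema the file used.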

Input — CSV / TSV

Visualize CSV/TSV page

Click Visualize CSV/TSV on the home page to reach /vistab. Upload any .csv, .tsv, or .tab file with the columns below. This lets you plot published nucleotide composition tables directly without running a sequence alignment.

label,A,T,G,C
Homo sapiens,0.29,0.29,0.21,0.21
Mus musculus,0.28,0.30,0.20,0.22

First column is the row label (any name).
Columns A T G C must be decimal proportions summing to ≈ 1.
Tab-separated files (.tsv / .tab) are auto-detected by extension.
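
A sketch of how such a table could be parsed and validated. The function name `parseCompositionTable` and the tolerance used for "summing to ≈ 1" are assumptions; the documentation does not state how strict the check is.

```javascript
// Parse a composition table. The default tolerance is an assumed value.
function parseCompositionTable(text, filename, tolerance = 0.02) {
  // Delimiter is chosen from the file extension, as described above.
  const delimiter = /\.(tsv|tab)$/i.test(filename) ? "\t" : ",";
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const header = lines[0].split(delimiter).map((h) => h.trim());
  const col = Object.fromEntries(["A", "T", "G", "C"].map((nt) => [nt, header.indexOf(nt)]));
  if (Object.values(col).some((i) => i < 1)) throw new Error("Missing A/T/G/C column");

  return lines.slice(1).map((line) => {
    const cells = line.split(delimiter);
    const row = { label: cells[0].trim() };
    let sum = 0;
    for (const nt of ["A", "T", "G", "C"]) {
      row[nt] = Number(cells[col[nt]]);
      sum += row[nt];
    }
    // A missing or non-numeric cell makes the sum NaN, which also fails this check.
    if (!(Math.abs(sum - 1) <= tolerance)) {
      throw new Error(`Row "${row.label}" sums to ${sum}, expected about 1`);
    }
    return row;
  });
}
```

Calling it on the two-row example above with a `.csv` file name returns two row objects, each with a `label` and numeric `A`, `T`, `G`, `C` fields.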

Codon Position Frequency

The four Codon Position buttons switch the data being plotted without reloading the file. Each codon position can reveal different evolutionary pressures. These selectors work independently of the Site Set toggle — you can view, for example, variable-site 3rd-position frequencies in a single click.

Mode	What is plotted
All Positions	Frequencies across all columns of each sequence (or of its variable-site subset)
1st Position	Frequencies at every 3rd base starting at offset 0
2nd Position	Frequencies at every 3rd base starting at offset 1
3rd Position	Frequencies at every 3rd base starting at offset 2; typically the most variable position, largely reflecting synonymous substitutions

Codon position is determined by the original column index in the full alignment, so reading frame is preserved even when variable-site filtering removes columns.
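
The point about the original column index can be illustrated with a short sketch. `variableSiteCodonCounts` is a hypothetical helper, and a plain boolean array stands in for the packed bitset.

```javascript
// Count bases of one sequence at variable columns only, split by codon position.
// `variable` holds one boolean per column.
function variableSiteCodonCounts(seq, variable) {
  const tables = [0, 1, 2].map(() => ({ A: 0, T: 0, G: 0, C: 0 }));
  for (let c = 0; c < seq.length; c++) {
    if (!variable[c]) continue;          // invariant column: skipped
    const base = seq[c];
    if (!(base in tables[0])) continue;  // N or gap
    tables[c % 3][base]++;               // position from the ORIGINAL column index
  }
  return tables;
}
```

For the sequence `ATGCAT` with only columns 2 and 5 variable, both surviving bases (G and T) are counted as 3rd positions. Re-indexing the filtered string `GT` from zero would wrongly assign them to the 1st and 2nd positions.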

Vertex Assignment

Expand the Vertex assignment panel to drag nucleotides between the four tetrahedron corners. This lets you group purines (A+G) or pyrimidines (C+T) onto a single vertex to visualise strand-bias or other compositional patterns.

Each vertex can hold one or more nucleotide letters (e.g., AG vs CT).
When a nucleotide is moved, its frequency is added to the destination vertex weight.
The 3D plot and vertex labels update instantly — no re-upload required.
Vertex reassignment respects the current Site Set selection.
Click Reset Tetrahedron to restore the default A/T/G/C layout.
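
The weight rule above amounts to summing the frequencies of the letters held by each vertex. A minimal sketch, with `vertexWeights` and the string-per-vertex representation of the assignment both being assumptions:

```javascript
// `assignment` lists the nucleotide letters held by each of the four vertices,
// for example ["AG", "CT", "", ""] for a purine / pyrimidine grouping.
function vertexWeights(freq, assignment) {
  return assignment.map((letters) =>
    [...letters].reduce((sum, nt) => sum + freq[nt], 0)
  );
}
```

With the default layout `["A", "T", "G", "C"]` the weights are simply the four frequencies. With `["AG", "CT", "", ""]` the first two weights are the purine and pyrimidine totals and the remaining vertices carry zero weight.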

Auto-scale Axes

When Auto-scale axes to data maximum is checked, the axis range is set to the highest individual nucleotide frequency present in the active dataset, rounded up to the next multiple of 0.05. This expands the visible region of the tetrahedron so that points fill the space rather than clustering near the centroid. The scale is recomputed automatically when the Site Set or Codon Position selection changes.

axisScale = ⌈ max(A, T, G, C across all sequences) / 0.05 ⌉ × 0.05

The current axis max is shown as a badge next to the checkbox when the scale is less than 1. Disable auto-scale to compare multiple datasets on a fixed 0–1 axis.
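
The axisScale formula translates almost directly into code. In this sketch the function name `axisScale` follows the formula, while the small epsilon is an added assumption that guards against floating-point noise in the division.

```javascript
// Round the largest single-nucleotide frequency up to the next multiple of 0.05.
// `dataset` is an array of { A, T, G, C } frequency objects for the active view.
function axisScale(dataset) {
  let max = 0;
  for (const freq of dataset) {
    max = Math.max(max, freq.A, freq.T, freq.G, freq.C);
  }
  // Count whole steps of 0.05, then convert back (steps / 20 === steps * 0.05).
  const steps = Math.ceil(max / 0.05 - 1e-9);
  return steps / 20;
}
```

For a dataset whose largest frequency is 0.31, the result is 0.35; a maximum of exactly 0.30 stays at 0.30.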

Spread (Visual Separation)

The Spread slider (0–500%) applies a visual-only radial expansion to separate overlapping dots. It does not alter the underlying frequency data or the exported JSON. Spread is particularly useful when variable-site filtering has already reduced the compositional range and points still cluster tightly.

centroid = mean of all plotted points

dᵢ = Pᵢ − centroid

scale = (inradius × 0.90 × strength) / max(|dᵢ|)

P′ᵢ = centroid + dᵢ × scale (only if scale > 1)

The tetrahedron inradius bounds the spread so no point can exit the shape.
Relative distances between points are preserved — nearby sequences stay nearby.
Spread re-applies automatically when the Site Set, vertex assignment, or codon position changes.
A badge on the 3D view indicates when spread is active.
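
The four Spread formulas can be sketched as one function. `applySpread` is a hypothetical name, and the sketch assumes `strength` is the slider value expressed as a fraction (for example 185% as 1.85); the documentation does not state that mapping.

```javascript
// points: array of [x, y, z]; inradius: inradius of the plotted tetrahedron.
function applySpread(points, inradius, strength) {
  const n = points.length;
  const centroid = [0, 1, 2].map((k) => points.reduce((s, p) => s + p[k], 0) / n);
  const offsets = points.map((p) => p.map((v, k) => v - centroid[k]));
  const maxDist = Math.max(...offsets.map((d) => Math.hypot(d[0], d[1], d[2])));
  if (maxDist === 0) return points.map((p) => [...p]); // all points coincide
  const scale = (inradius * 0.9 * strength) / maxDist;
  if (scale <= 1) return points.map((p) => [...p]);    // expansion only, never shrink
  return offsets.map((d) => d.map((v, k) => centroid[k] + v * scale));
}
```

Every offset is multiplied by the same factor, so the ratios between pairwise distances are unchanged and the centroid of the point cloud stays where it was. The input array is not modified, which keeps the underlying frequency data separate from the display positions.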

3D Interaction

The right panel is a live Three.js canvas. All standard OrbitControls gestures apply:

Gesture	Action
Left-drag	Rotate tetrahedron
Right-drag / two-finger drag	Pan the scene
Scroll / pinch	Zoom in / out
Click a dot	Toggle highlight (pink) and show species name + frequencies in overlay
Click a table row	Highlight the corresponding dot in the 3D view
Cycle Camera Angle	Snap to 6 preset viewpoints around the tetrahedron
Start / Stop Rotating	Toggle continuous auto-rotation

The selected-species overlay shows frequencies for the currently active Site Set and Codon Position. The canvas badge in the top-right corner indicates the active site view whenever Variable Sites is selected.

Export

Three export options are available in the Export section of the left panel. The JSON export uses the SeqVis2 schema and contains both site sets; the current Site Set selection determines only the file name.

Format	Contents	Notes
↓ JSON	All frequency data (allSites + variableSites) for all sequences	SeqVis2 schema. Spread and axis scale are not applied, so the exported values are the unmodified frequencies. File is named seqvis-all.json or seqvis-variable.json based on the current view.
↓ PNG	Raster screenshot of the current 3D view	Captures the exact current rotation and camera angle. Requires preserveDrawingBuffer on the Canvas.
↓ SVG	SVG wrapper embedding the PNG data-URL	Opens in vector editors and SVG-based workflows, but the 3D content remains a raster image inside the SVG envelope.
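
The SVG export described in the table can be pictured as a thin wrapper around the PNG data URL. This is a sketch, not SeqVis2's exporter: `wrapPngInSvg` is a hypothetical name, and in a browser the data URL would typically come from `canvas.toDataURL("image/png")`.

```javascript
// Wrap a PNG data URL in a minimal SVG document of the same pixel size.
function wrapPngInSvg(pngDataUrl, width, height) {
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" ` +
    `viewBox="0 0 ${width} ${height}">` +
    `<image href="${pngDataUrl}" width="${width}" height="${height}"/>` +
    `</svg>`
  );
}
```

The sketch uses the plain `href` attribute on `<image>`; some older SVG consumers expect `xlink:href` instead, so a production exporter may need to emit both.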

Citation

If you use SeqVis2 in published work, please cite:

Placeholder