Advanced Usage

Large Datasets (100K+ cells)

For datasets with more than 100,000 cells, deg.js automatically enables streaming mode to process data in chunks, avoiding memory limits.

typescript

const result = await rankGenesGroups(expression, groups, {
  nGenes: 100,
  maxMemoryMB: 1024,  // Use up to 1GB GPU memory
  onProgress: (progress) => {
    console.log(`${progress.genesProcessed}/${progress.totalGenes} genes`);
    console.log(`ETA: ${(progress.estimatedRemainingMs / 1000).toFixed(0)}s`);
  },
});

Streaming Thresholds

Condition	Mode
< 100K cells	Standard (all genes at once)
≥ 100K cells	Streaming (gene chunks)

Memory Configuration

The maxMemoryMB option controls how many genes are processed per chunk:

typescript

// Conservative: Lower memory, more chunks
await rankGenesGroups(expression, groups, { maxMemoryMB: 256 });

// Aggressive: Higher memory, fewer chunks, faster
await rankGenesGroups(expression, groups, { maxMemoryMB: 2048 });

Chunked Loading (1M+ cells)

For extremely large datasets that don't fit in memory, use chunked loading with a lazy data source:

typescript

const result = await rankGenesGroups({
  nCells: 1_000_000,
  nGenes: 20_000,
  getGeneChunk: async (startGene, count) => {
    // Load data on-demand from file, database, IndexedDB, etc.
    const data = await loadFromSource(startGene, count);
    return {
      data: new Float32Array(data),
      nGenes: count,
    };
  },
}, groups, {
  onProgress: (p) => {
    updateProgressBar(p.genesProcessed / p.totalGenes);
  },
});

Example: Loading from HDF5

typescript

import * as h5wasm from 'h5wasm';

const file = await h5wasm.File.open('expression.h5');
const dataset = file.get('X');

const result = await rankGenesGroups({
  nCells: 1_000_000,
  nGenes: 20_000,
  getGeneChunk: (startGene, count) => {
    // HDF5 slice: [all cells, startGene:startGene+count]
    const data = dataset.slice([[0, nCells], [startGene, startGene + count]]);
    return { data: new Float32Array(data), nGenes: count };
  },
}, groups);

Example: Loading from IndexedDB

typescript

const db = await openDatabase('expression-db');

const result = await rankGenesGroups({
  nCells: 500_000,
  nGenes: 15_000,
  getGeneChunk: async (startGene, count) => {
    const chunks = [];
    for (let g = startGene; g < startGene + count; g++) {
      const geneData = await db.get('genes', g);
      chunks.push(geneData);
    }
    return {
      data: concatenateFloat32Arrays(chunks),
      nGenes: count,
      layout: 'column-major',  // One gene per chunk
    };
  },
}, groups);

Backend Selection

Auto (Default)

typescript

// Automatically selects WebGPU if available, falls back to WebGL2
const result = await rankGenesGroups(expression, groups);

Force WebGPU

typescript

// Use WebGPU (throws if unavailable)
const result = await rankGenesGroups(expression, groups, {
  backend: 'webgpu',
});

Force WebGL2

typescript

// Use WebGL2 (broader browser support)
const result = await rankGenesGroups(expression, groups, {
  backend: 'webgl2',
});

Sort Algorithm Selection (WebGPU)

WebGPU supports two sorting algorithms:

Algorithm	Complexity	Best For
Bitonic	O(n log²n)	< 10K cells
Radix	O(n)	≥ 10K cells

typescript

// Auto-select based on cell count (default)
await rankGenesGroups(expression, groups, {
  sortAlgorithm: 'auto',
});

// Force Radix sort (faster for large datasets)
await rankGenesGroups(expression, groups, {
  backend: 'webgpu',
  sortAlgorithm: 'radix',
});

// Force Bitonic sort
await rankGenesGroups(expression, groups, {
  backend: 'webgpu',
  sortAlgorithm: 'bitonic',
});

Reference Group Comparison

vs All Other Cells (Default)

Compare each group against all other cells combined:

typescript

// Default behavior - no reference specified
const result = await rankGenesGroups(expression, groups);
// Each group is compared against all other cells

vs Specific Group

Compare each group against a specific reference group:

typescript

const result = await rankGenesGroups(expression, groups, {
  reference: 'Control',  // Each group vs 'Control' group
});

WARNING

Specific reference comparisons with chunked input require WebGPU backend.

Analyzing Subset of Groups

typescript

// Only analyze specific groups
const result = await rankGenesGroups(expression, groups, {
  groups: ['ClusterA', 'ClusterB'],  // Ignore other clusters
});

Tie Correction

Enable tie correction for datasets with many identical expression values:

typescript

const result = await rankGenesGroups(expression, groups, {
  tieCorrect: true,
});

INFO

Tie correction adjusts the variance calculation when multiple cells have the same expression value for a gene. This is important for sparse count data.

Rank by Absolute Z-score

Rank genes by absolute Z-score magnitude (useful for finding any differentially expressed genes, regardless of direction):

typescript

const result = await rankGenesGroups(expression, groups, {
  rankbyAbs: true,  // Rank by |Z-score|
});

Custom Log Base for Fold Change

typescript

// Use log2 (default is natural log)
const result = await rankGenesGroups(expression, groups, {
  logBase: 2,
});

// Use log10
const result = await rankGenesGroups(expression, groups, {
  logBase: 10,
});

Performance Tips

First Run Initialization

The first analysis takes longer due to GPU initialization overhead (shader compilation, pipeline creation). Subsequent runs reuse the initialized context and are significantly faster.

1. Use Row-Major Layout

Row-major layout is more efficient for GPU processing:

typescript

// Preferred: row-major
const expression = {
  data: rowMajorData,
  nCells: 100000,
  nGenes: 5000,
  layout: 'row-major',  // Default, most efficient
};

2. Preallocate Gene Names

typescript

// Generate gene names once
const geneNames = Array.from({ length: nGenes }, (_, i) => `gene_${i}`);

// Reuse for multiple analyses
await rankGenesGroups(expression1, groups1, { geneNames });
await rankGenesGroups(expression2, groups2, { geneNames });

3. Limit Returned Genes

typescript

// Return only top 100 genes (faster than returning all)
const result = await rankGenesGroups(expression, groups, {
  nGenes: 100,
});

4. Use Progress Callback for UI Updates

typescript

const result = await rankGenesGroups(expression, groups, {
  onProgress: (p) => {
    // Update UI without blocking
    requestAnimationFrame(() => {
      progressBar.style.width = `${(p.genesProcessed / p.totalGenes) * 100}%`;
    });
  },
});

Error Handling

typescript

try {
  const result = await rankGenesGroups(expression, groups);
} catch (error) {
  if (error.message.includes('WebGPU')) {
    console.log('WebGPU not supported, trying WebGL2...');
    const result = await rankGenesGroups(expression, groups, {
      backend: 'webgl2',
    });
  } else {
    throw error;
  }
}

Advanced Usage ​

Large Datasets (100K+ cells) ​

Streaming Thresholds ​

Memory Configuration ​

Chunked Loading (1M+ cells) ​

Example: Loading from HDF5 ​

Example: Loading from IndexedDB ​

Backend Selection ​

Auto (Default) ​

Force WebGPU ​

Force WebGL2 ​

Sort Algorithm Selection (WebGPU) ​

Reference Group Comparison ​

vs All Other Cells (Default) ​

vs Specific Group ​

Analyzing Subset of Groups ​

Tie Correction ​

Rank by Absolute Z-score ​

Custom Log Base for Fold Change ​

Performance Tips ​

1. Use Row-Major Layout ​

2. Preallocate Gene Names ​

3. Limit Returned Genes ​

4. Use Progress Callback for UI Updates ​

Error Handling ​

See Also ​

Advanced Usage

Large Datasets (100K+ cells)

Streaming Thresholds

Memory Configuration

Chunked Loading (1M+ cells)

Example: Loading from HDF5

Example: Loading from IndexedDB

Backend Selection

Auto (Default)

Force WebGPU

Force WebGL2

Sort Algorithm Selection (WebGPU)

Reference Group Comparison

vs All Other Cells (Default)

vs Specific Group

Analyzing Subset of Groups

Tie Correction

Rank by Absolute Z-score

Custom Log Base for Fold Change

Performance Tips

1. Use Row-Major Layout

2. Preallocate Gene Names

3. Limit Returned Genes

4. Use Progress Callback for UI Updates

Error Handling

See Also