Skip to content

Advanced Usage

Large Datasets (100K+ cells)

For datasets with more than 100,000 cells, deg.js automatically enables streaming mode to process data in chunks, avoiding memory limits.

typescript
const result = await rankGenesGroups(expression, groups, {
  nGenes: 100,
  maxMemoryMB: 1024,  // Use up to 1GB GPU memory
  onProgress: (progress) => {
    console.log(`${progress.genesProcessed}/${progress.totalGenes} genes`);
    console.log(`ETA: ${(progress.estimatedRemainingMs / 1000).toFixed(0)}s`);
  },
});

Streaming Thresholds

ConditionMode
< 100K cellsStandard (all genes at once)
≥ 100K cellsStreaming (gene chunks)

Memory Configuration

The maxMemoryMB option controls how many genes are processed per chunk:

typescript
// Conservative: Lower memory, more chunks
await rankGenesGroups(expression, groups, { maxMemoryMB: 256 });

// Aggressive: Higher memory, fewer chunks, faster
await rankGenesGroups(expression, groups, { maxMemoryMB: 2048 });

Chunked Loading (1M+ cells)

For extremely large datasets that don't fit in memory, use chunked loading with a lazy data source:

typescript
const result = await rankGenesGroups({
  nCells: 1_000_000,
  nGenes: 20_000,
  getGeneChunk: async (startGene, count) => {
    // Load data on-demand from file, database, IndexedDB, etc.
    const data = await loadFromSource(startGene, count);
    return {
      data: new Float32Array(data),
      nGenes: count,
    };
  },
}, groups, {
  onProgress: (p) => {
    updateProgressBar(p.genesProcessed / p.totalGenes);
  },
});

Example: Loading from HDF5

typescript
import * as h5wasm from 'h5wasm';

const file = await h5wasm.File.open('expression.h5');
const dataset = file.get('X');

const result = await rankGenesGroups({
  nCells: 1_000_000,
  nGenes: 20_000,
  getGeneChunk: (startGene, count) => {
    // HDF5 slice: [all cells, startGene:startGene+count]
    const data = dataset.slice([[0, nCells], [startGene, startGene + count]]);
    return { data: new Float32Array(data), nGenes: count };
  },
}, groups);

Example: Loading from IndexedDB

typescript
const db = await openDatabase('expression-db');

const result = await rankGenesGroups({
  nCells: 500_000,
  nGenes: 15_000,
  getGeneChunk: async (startGene, count) => {
    const chunks = [];
    for (let g = startGene; g < startGene + count; g++) {
      const geneData = await db.get('genes', g);
      chunks.push(geneData);
    }
    return {
      data: concatenateFloat32Arrays(chunks),
      nGenes: count,
      layout: 'column-major',  // One gene per chunk
    };
  },
}, groups);

Backend Selection

Auto (Default)

typescript
// Automatically selects WebGPU if available, falls back to WebGL2
const result = await rankGenesGroups(expression, groups);

Force WebGPU

typescript
// Use WebGPU (throws if unavailable)
const result = await rankGenesGroups(expression, groups, {
  backend: 'webgpu',
});

Force WebGL2

typescript
// Use WebGL2 (broader browser support)
const result = await rankGenesGroups(expression, groups, {
  backend: 'webgl2',
});

Sort Algorithm Selection (WebGPU)

WebGPU supports two sorting algorithms:

AlgorithmComplexityBest For
BitonicO(n log²n)< 10K cells
RadixO(n)≥ 10K cells
typescript
// Auto-select based on cell count (default)
await rankGenesGroups(expression, groups, {
  sortAlgorithm: 'auto',
});

// Force Radix sort (faster for large datasets)
await rankGenesGroups(expression, groups, {
  backend: 'webgpu',
  sortAlgorithm: 'radix',
});

// Force Bitonic sort
await rankGenesGroups(expression, groups, {
  backend: 'webgpu',
  sortAlgorithm: 'bitonic',
});

Reference Group Comparison

vs All Other Cells (Default)

Compare each group against all other cells combined:

typescript
// Default behavior - no reference specified
const result = await rankGenesGroups(expression, groups);
// Each group is compared against all other cells

vs Specific Group

Compare each group against a specific reference group:

typescript
const result = await rankGenesGroups(expression, groups, {
  reference: 'Control',  // Each group vs 'Control' group
});

WARNING

Specific reference comparisons with chunked input require WebGPU backend.

Analyzing Subset of Groups

typescript
// Only analyze specific groups
const result = await rankGenesGroups(expression, groups, {
  groups: ['ClusterA', 'ClusterB'],  // Ignore other clusters
});

Tie Correction

Enable tie correction for datasets with many identical expression values:

typescript
const result = await rankGenesGroups(expression, groups, {
  tieCorrect: true,
});

INFO

Tie correction adjusts the variance calculation when multiple cells have the same expression value for a gene. This is important for sparse count data.

Rank by Absolute Z-score

Rank genes by absolute Z-score magnitude (useful for finding any differentially expressed genes, regardless of direction):

typescript
const result = await rankGenesGroups(expression, groups, {
  rankbyAbs: true,  // Rank by |Z-score|
});

Custom Log Base for Fold Change

typescript
// Use log2 (default is natural log)
const result = await rankGenesGroups(expression, groups, {
  logBase: 2,
});

// Use log10
const result = await rankGenesGroups(expression, groups, {
  logBase: 10,
});

Performance Tips

First Run Initialization

The first analysis takes longer due to GPU initialization overhead (shader compilation, pipeline creation). Subsequent runs reuse the initialized context and are significantly faster.

1. Use Row-Major Layout

Row-major layout is more efficient for GPU processing:

typescript
// Preferred: row-major
const expression = {
  data: rowMajorData,
  nCells: 100000,
  nGenes: 5000,
  layout: 'row-major',  // Default, most efficient
};

2. Preallocate Gene Names

typescript
// Generate gene names once
const geneNames = Array.from({ length: nGenes }, (_, i) => `gene_${i}`);

// Reuse for multiple analyses
await rankGenesGroups(expression1, groups1, { geneNames });
await rankGenesGroups(expression2, groups2, { geneNames });

3. Limit Returned Genes

typescript
// Return only top 100 genes (faster than returning all)
const result = await rankGenesGroups(expression, groups, {
  nGenes: 100,
});

4. Use Progress Callback for UI Updates

typescript
const result = await rankGenesGroups(expression, groups, {
  onProgress: (p) => {
    // Update UI without blocking
    requestAnimationFrame(() => {
      progressBar.style.width = `${(p.genesProcessed / p.totalGenes) * 100}%`;
    });
  },
});

Error Handling

typescript
try {
  const result = await rankGenesGroups(expression, groups);
} catch (error) {
  if (error.message.includes('WebGPU')) {
    console.log('WebGPU not supported, trying WebGL2...');
    const result = await rankGenesGroups(expression, groups, {
      backend: 'webgl2',
    });
  } else {
    throw error;
  }
}

See Also

Released under the MIT License.