Advanced Usage
Large Datasets (100K+ cells)
For datasets with more than 100,000 cells, deg.js automatically enables streaming mode to process data in chunks, avoiding memory limits.
const result = await rankGenesGroups(expression, groups, {
nGenes: 100,
maxMemoryMB: 1024, // Use up to 1GB GPU memory
onProgress: (progress) => {
console.log(`${progress.genesProcessed}/${progress.totalGenes} genes`);
console.log(`ETA: ${(progress.estimatedRemainingMs / 1000).toFixed(0)}s`);
},
});Streaming Thresholds
| Condition | Mode |
|---|---|
| < 100K cells | Standard (all genes at once) |
| ≥ 100K cells | Streaming (gene chunks) |
Memory Configuration
The maxMemoryMB option controls how many genes are processed per chunk:
// Conservative: Lower memory, more chunks
await rankGenesGroups(expression, groups, { maxMemoryMB: 256 });
// Aggressive: Higher memory, fewer chunks, faster
await rankGenesGroups(expression, groups, { maxMemoryMB: 2048 });Chunked Loading (1M+ cells)
For extremely large datasets that don't fit in memory, use chunked loading with a lazy data source:
const result = await rankGenesGroups({
nCells: 1_000_000,
nGenes: 20_000,
getGeneChunk: async (startGene, count) => {
// Load data on-demand from file, database, IndexedDB, etc.
const data = await loadFromSource(startGene, count);
return {
data: new Float32Array(data),
nGenes: count,
};
},
}, groups, {
onProgress: (p) => {
updateProgressBar(p.genesProcessed / p.totalGenes);
},
});Example: Loading from HDF5
import * as h5wasm from 'h5wasm';
const file = await h5wasm.File.open('expression.h5');
const dataset = file.get('X');
const result = await rankGenesGroups({
nCells: 1_000_000,
nGenes: 20_000,
getGeneChunk: (startGene, count) => {
// HDF5 slice: [all cells, startGene:startGene+count]
const data = dataset.slice([[0, nCells], [startGene, startGene + count]]);
return { data: new Float32Array(data), nGenes: count };
},
}, groups);Example: Loading from IndexedDB
const db = await openDatabase('expression-db');
const result = await rankGenesGroups({
nCells: 500_000,
nGenes: 15_000,
getGeneChunk: async (startGene, count) => {
const chunks = [];
for (let g = startGene; g < startGene + count; g++) {
const geneData = await db.get('genes', g);
chunks.push(geneData);
}
return {
data: concatenateFloat32Arrays(chunks),
nGenes: count,
layout: 'column-major', // One gene per chunk
};
},
}, groups);Backend Selection
Auto (Default)
// Automatically selects WebGPU if available, falls back to WebGL2
const result = await rankGenesGroups(expression, groups);Force WebGPU
// Use WebGPU (throws if unavailable)
const result = await rankGenesGroups(expression, groups, {
backend: 'webgpu',
});Force WebGL2
// Use WebGL2 (broader browser support)
const result = await rankGenesGroups(expression, groups, {
backend: 'webgl2',
});Sort Algorithm Selection (WebGPU)
WebGPU supports two sorting algorithms:
| Algorithm | Complexity | Best For |
|---|---|---|
| Bitonic | O(n log²n) | < 10K cells |
| Radix | O(n) | ≥ 10K cells |
// Auto-select based on cell count (default)
await rankGenesGroups(expression, groups, {
sortAlgorithm: 'auto',
});
// Force Radix sort (faster for large datasets)
await rankGenesGroups(expression, groups, {
backend: 'webgpu',
sortAlgorithm: 'radix',
});
// Force Bitonic sort
await rankGenesGroups(expression, groups, {
backend: 'webgpu',
sortAlgorithm: 'bitonic',
});Reference Group Comparison
vs All Other Cells (Default)
Compare each group against all other cells combined:
// Default behavior - no reference specified
const result = await rankGenesGroups(expression, groups);
// Each group is compared against all other cellsvs Specific Group
Compare each group against a specific reference group:
const result = await rankGenesGroups(expression, groups, {
reference: 'Control', // Each group vs 'Control' group
});WARNING
Specific reference comparisons with chunked input require WebGPU backend.
Analyzing Subset of Groups
// Only analyze specific groups
const result = await rankGenesGroups(expression, groups, {
groups: ['ClusterA', 'ClusterB'], // Ignore other clusters
});Tie Correction
Enable tie correction for datasets with many identical expression values:
const result = await rankGenesGroups(expression, groups, {
tieCorrect: true,
});INFO
Tie correction adjusts the variance calculation when multiple cells have the same expression value for a gene. This is important for sparse count data.
Rank by Absolute Z-score
Rank genes by absolute Z-score magnitude (useful for finding any differentially expressed genes, regardless of direction):
const result = await rankGenesGroups(expression, groups, {
rankbyAbs: true, // Rank by |Z-score|
});Custom Log Base for Fold Change
// Use log2 (default is natural log)
const result = await rankGenesGroups(expression, groups, {
logBase: 2,
});
// Use log10
const result = await rankGenesGroups(expression, groups, {
logBase: 10,
});Performance Tips
First Run Initialization
The first analysis takes longer due to GPU initialization overhead (shader compilation, pipeline creation). Subsequent runs reuse the initialized context and are significantly faster.
1. Use Row-Major Layout
Row-major layout is more efficient for GPU processing:
// Preferred: row-major
const expression = {
data: rowMajorData,
nCells: 100000,
nGenes: 5000,
layout: 'row-major', // Default, most efficient
};2. Preallocate Gene Names
// Generate gene names once
const geneNames = Array.from({ length: nGenes }, (_, i) => `gene_${i}`);
// Reuse for multiple analyses
await rankGenesGroups(expression1, groups1, { geneNames });
await rankGenesGroups(expression2, groups2, { geneNames });3. Limit Returned Genes
// Return only top 100 genes (faster than returning all)
const result = await rankGenesGroups(expression, groups, {
nGenes: 100,
});4. Use Progress Callback for UI Updates
const result = await rankGenesGroups(expression, groups, {
onProgress: (p) => {
// Update UI without blocking
requestAnimationFrame(() => {
progressBar.style.width = `${(p.genesProcessed / p.totalGenes) * 100}%`;
});
},
});Error Handling
try {
const result = await rankGenesGroups(expression, groups);
} catch (error) {
if (error.message.includes('WebGPU')) {
console.log('WebGPU not supported, trying WebGL2...');
const result = await rankGenesGroups(expression, groups, {
backend: 'webgl2',
});
} else {
throw error;
}
}See Also
- API Reference - Complete function documentation
- Architecture - How it works under the hood