Best Practices
Follow these guidelines to get the most out of Gilial.
Before Compression
Section titled “Before Compression”1. Estimate first
Section titled “1. Estimate first”Always run estimate_savings() before touching the index. The ratio is deterministic so this is free — no vectors are fetched.
savings = client.estimate_savings(strategy="balanced")print(f"Potential savings: {savings['savings_pct']}%")2. Dry-run before applying
Section titled “2. Dry-run before applying”preview = client.compress(strategy="balanced", dry_run=True)print(f"Compressed size: {preview.compressed_size_mb} MB")3. Validate recall before committing
Section titled “3. Validate recall before committing”Use validate_recall=True to run a before/after benchmark automatically. Gilial will sample query vectors from the index, record top-k results, run compression, re-query, and report overlap.
result = client.compress( strategy="balanced", dry_run=False, validate_recall=True, recall_n_queries=200, recall_k=10,)print(result.recall)# Recall@10 — 200 queries# mean : 98.40%# ...
if not result.recall.safe_to_apply: print("Recall dropped below 95% — consider reverting or using a lighter strategy")recall.safe_to_apply is True when mean recall ≥ 95%. For critical production indexes you may want to raise that bar by inspecting mean_recall directly.
4. Run a standalone benchmark first
Section titled “4. Run a standalone benchmark first”Before your first compression, run benchmark() on the unchanged index to confirm your baseline. It should return ~100% recall (querying twice with the same vectors).
baseline = client.benchmark(n_queries=200, k=10)print(baseline) # should show mean: 100.00%5. Start with Balanced, graduate to Aggressive
Section titled “5. Start with Balanced, graduate to Aggressive”- Balanced (4-bit, ~8×) is the safe default for production
- Aggressive (2-bit, ~16×) maximises savings but shifts vectors further
- high_quality (6-bit, ~5×) is near-lossless for sensitive indexes
6. Use namespace targeting for staged rollouts
Section titled “6. Use namespace targeting for staged rollouts”Compress a low-traffic namespace first, validate recall, then move to higher-traffic ones.
namespaces = client.list_namespaces()print(namespaces) # ['staging', 'docs', 'prod']
# Start with stagingresult = client.compress(namespace="staging", dry_run=False, validate_recall=True)if result.recall.safe_to_apply: client.compress(namespace="prod", dry_run=False, validate_recall=True)7. Log every compression run
Section titled “7. Log every compression run”Pass log_path="auto" to write a timestamped JSON record to ~/.gilial/logs/. The log includes compression stats and the full recall report.
result = client.compress( strategy="balanced", dry_run=False, log_path="auto", validate_recall=True,)During Compression
Section titled “During Compression”Track progress on large indexes
Section titled “Track progress on large indexes”def on_progress(done, total): pct = done / total * 100 print(f"\r {done}/{total} ({pct:.1f}%)", end="", flush=True)
client.compress(strategy="balanced", dry_run=False, on_progress=on_progress)Use custom bit depth for fine control
Section titled “Use custom bit depth for fine control”Instead of a named strategy, dial in the exact bit depth:
# Something between balanced (4-bit) and high_quality (6-bit)result = client.compress(bits_per_dim=5, dry_run=False)After Compression
Section titled “After Compression”Verify index health
Section titled “Verify index health”status = client.get_status()print(f"Vectors: {status['total_vector_count']}")Review the recall report
Section titled “Review the recall report”If validate_recall=True was used, inspect result.recall:
print(f"Mean recall: {result.recall.mean_recall * 100:.2f}%")print(f"Worst query: {result.recall.min_recall * 100:.2f}%")A mean recall above 98% is excellent. Below 95% warrants review.
Do’s and Don’ts
Section titled “Do’s and Don’ts”- Always estimate and dry-run first
- Use
validate_recall=Truebefore hitting apply on production - Target namespaces individually for staged rollouts
- Log every run with
log_path="auto" - Start with Balanced, move to Aggressive only after validating recall
❌ Don’t
Section titled “❌ Don’t”- Skip dry-run mode
- Use Aggressive strategy on production without a recall check
- Ignore
recall.min_recall— a single catastrophically bad query matters - Compress the same index repeatedly in a short window
Recommended schedule
Section titled “Recommended schedule”| Cadence | Action |
|---|---|
| Weekly | Check index growth with get_status() |
| Monthly | Run estimate_savings() dry-run |
| Quarterly | Apply compression if savings > 5%, with validate_recall=True |
Adjust based on your vector accumulation rate.