PKBoost AI Labs | Dec 2025 – Present

Lead systems engineer for an end-to-end retrieval-augmented generation (RAG) system built for low latency and high throughput.

Performance at Scale

  • Throughput: 100,000+ queries/second.
  • Latency: <5 ms vector search, <300 ms end-to-end response time.
  • Improvement: 10–100x faster than PostgreSQL pgvector baselines.

Technical Architecture

  • Stack: Rust (Axum), Tokio, USearch (in-memory HNSW), FastEmbed-rs, PostgreSQL, React.
  • Security: JWT authentication, Argon2 password hashing, rate limiting, SQL injection protection.
  • Ingestion: Multi-format pipeline (PDF, Excel, Word) with OCR and semantic chunking.
  • Deployment: Single binary (<50 MB) handling 1,000+ concurrent connections.

Real-World Deployment

Deployed at a Fortune 500 company (under NDA), supporting 1,000+ employees with <5 ms semantic search across 10,000+ document chunks.