PKBoost AI Labs | Dec 2025 – Present
Lead systems engineer on an end-to-end retrieval-augmented generation (RAG) system, built for high throughput and low latency.
Performance at Scale
- Throughput: 100,000+ queries/second.
- Latency: <5ms vector search, <300ms end-to-end response time.
- Improvement: 10–100x faster vector search than PostgreSQL pgvector baselines.
Technical Architecture
- Stack: Rust (Axum), Tokio, USearch (in-memory HNSW), FastEmbed-rs, PostgreSQL, React.
- Security: JWT authentication, Argon2 password hashing, rate limiting, SQL injection protection.
- Ingestion: Multi-format pipeline (PDF, Excel, Word) with OCR and semantic chunking.
- Deployment: Single binary (<50MB) handling 1,000+ concurrent connections.
Real-World Deployment
Deployed at a Fortune 500 company (under NDA), supporting 1,000+ employees with <5ms semantic search across 10,000+ document chunks.