Personal builds along the same lines as my Wells Fargo work — agentic LLM pipelines, retrieval, and applied ML — plus tooling I needed for myself. Plenty more side experiments on GitHub.
Flagship · v0.4 · Open Source
kharcha-ai · drop your bank PDFs in, get a forensic spend analysis out
A multi-agent LangGraph system that ingests Indian bank statements and credit-card PDFs, runs every transaction through a local Gemma 4 model, and renders an editorial 10-page Spend Analysis PDF, entirely offline. Deterministic parsers feed an LLM-first tagger; a friend auto-detector finds money loops in bidirectional UPI flows; a validator agent re-checks low-confidence rows; and a narrative agent writes per-source analytical prose, gated by a deterministic quality check that catches degenerate LLM output before it reaches the PDF. Layered detectors handle EMI installments, recurring subscriptions, 2σ category anomalies, and cross-source reconciliation against the savings account. Five-layer privacy gitignore: zero bank data has ever been committed on any branch. Idempotent re-ingestion via deterministic hashes; 37 tests green.
Multi-agent · LangGraph · Gemma 4 · Ollama · FastAPI · Next.js 15 · ReportLab
View on GitHub →
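A minimal Python sketch of two of the kharcha-ai mechanisms, assuming a simplified transaction record. `Txn`, `txn_hash`, `ingest`, and `category_anomalies` are hypothetical names, not the project's actual code. Idempotent re-ingestion comes from keying each row by a SHA-256 of its normalized fields, so loading the same PDF twice adds nothing. The 2σ check flags individual debits more than two standard deviations above their category's mean, which is one plausible reading of "2σ category anomalies".

```python
import hashlib
import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    # Hypothetical normalized row; field names are illustrative, not kharcha-ai's schema.
    source: str       # e.g. "savings" or "credit_card"
    date: str         # ISO "YYYY-MM-DD"
    amount: float     # positive = money out
    narration: str    # raw description text from the statement
    category: str     # label assigned by the tagger


def txn_hash(t: Txn) -> str:
    """Deterministic key: the same statement row always maps to the same hash."""
    # Collapse whitespace and case so cosmetic PDF-extraction differences don't change the key.
    narration = " ".join(t.narration.split()).upper()
    key = f"{t.source}|{t.date}|{t.amount:.2f}|{narration}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def ingest(store: dict[str, Txn], batch: list[Txn]) -> int:
    """Insert unseen rows only; re-ingesting the same PDF adds nothing. Returns rows added."""
    added = 0
    for t in batch:
        key = txn_hash(t)
        if key not in store:
            store[key] = t
            added += 1
    return added


def category_anomalies(txns: list[Txn], k: float = 2.0) -> list[Txn]:
    """Flag rows more than k population standard deviations above their category mean."""
    by_cat: dict[str, list[float]] = defaultdict(list)
    for t in txns:
        by_cat[t.category].append(t.amount)
    stats = {}
    for cat, amounts in by_cat.items():
        if len(amounts) >= 3:  # too few points for a meaningful spread otherwise
            stats[cat] = (statistics.mean(amounts), statistics.pstdev(amounts))
    flagged = []
    for t in txns:
        if t.category in stats:
            mu, sigma = stats[t.category]
            if sigma > 0 and t.amount > mu + k * sigma:
                flagged.append(t)
    return flagged
```

One caveat with pure content hashing: two genuinely identical transactions (same source, date, amount, and narration) collapse into one key. A fuller implementation would likely fold in a reference number or row position.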
Open Source
VoiceToText-with-UI
An open-source PyQt6 AI dictation assistant. Real-time, on-device transcription via faster-whisper on NVIDIA GPUs. Hardware-level Bluetooth HFP toggling to prevent audio degradation, system-tray background execution, graceful VRAM cleanup, global hotkey, and clipboard auto-paste.
PyQt6 · faster-whisper · CUDA · Win32 · Whisper
View on GitHub →
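One way to get graceful VRAM cleanup in a dictation app is to load the Whisper model lazily and drop it when dictation goes idle. The sketch below assumes that pattern. `LazyModel`, `load_whisper`, and the `"small"` model size are illustrative choices, not VoiceToText-with-UI's actual code. `WhisperModel(...)` and `model.transcribe(...)` are faster-whisper's real API. The sketch also assumes that releasing the last reference to the model lets the CTranslate2 backend free its GPU memory.

```python
import gc
from typing import Any, Callable, Optional


def load_whisper() -> Any:
    # Real faster-whisper API; needs the package installed and a CUDA-capable GPU.
    from faster_whisper import WhisperModel
    return WhisperModel("small", device="cuda", compute_type="float16")


class LazyModel:
    """Load the model on first use; drop it on demand to hand VRAM back."""

    def __init__(self, loader: Callable[[], Any] = load_whisper) -> None:
        self._loader = loader
        self._model: Optional[Any] = None

    def get(self) -> Any:
        if self._model is None:
            self._model = self._loader()
        return self._model

    def release(self) -> None:
        # Drop our reference so the backend can free its buffers, then
        # collect promptly in case a reference cycle was keeping it alive.
        self._model = None
        gc.collect()

    @property
    def loaded(self) -> bool:
        return self._model is not None


def transcribe(holder: LazyModel, path: str) -> str:
    # faster-whisper returns a lazy generator of segments plus transcription info.
    segments, _info = holder.get().transcribe(path, beam_size=5)
    return " ".join(s.text.strip() for s in segments)
```

Injecting the loader keeps the GPU-free parts testable; a tray app might call `release()` from an idle timer and let the next hotkey press reload the model.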
Recent · Open Source
lockin · Focus Timer
A green-themed standalone Windows focus timer I built for my own interview prep. Wall-clock based, so it survives laptop sleep; a Picture-in-Picture mini mode keeps it always on top; brown, pink, white, and rain ambient noise is procedurally generated via Web Audio; and File System Access API auto-export streams every session to disk as JSONL for pandas analytics. SVG-rendered calendar heatmap, hour-of-day grid, and personal-bests dashboard. No build, no deps, ~1500 lines.
Vanilla JS · Web Audio · Document PiP · File System Access · SVG · IndexedDB
GitHub →
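lockin itself is vanilla JS, so this is a Python translation of two of its ideas rather than its code. A wall-clock timer stores the absolute end time and recomputes the remaining time on every read, so a sleep/resume cycle cannot make it drift the way counting ticks would. The brown-noise generator uses a common leaky-integrator recipe: add a little white noise each step and divide by 1 + leak so the signal stays bounded. `WallClockTimer`, `brown_noise`, and the `0.02`/`3.5` constants are assumptions for illustration.

```python
import random
import time
from typing import Callable, Optional


class WallClockTimer:
    """Countdown derived from an absolute end time, not from elapsed ticks."""

    def __init__(self, duration_s: float, clock: Callable[[], float] = time.time) -> None:
        # time.time() is wall-clock time; time.monotonic() may pause during
        # suspend on some platforms, which is exactly what we want to avoid here.
        self._clock = clock
        self._end = clock() + duration_s

    def remaining(self) -> float:
        return max(0.0, self._end - self._clock())

    def done(self) -> bool:
        return self.remaining() == 0.0


def brown_noise(n: int, seed: Optional[int] = None, leak: float = 0.02) -> list[float]:
    """Brown(ian) noise via a leaky integrator over uniform white noise."""
    rng = random.Random(seed)
    out: list[float] = []
    last = 0.0
    for _ in range(n):
        white = rng.uniform(-1.0, 1.0)
        # Weighted average of the previous value and the new sample: stays in [-1, 1].
        last = (last + leak * white) / (1.0 + leak)
        out.append(last * 3.5)  # rough make-up gain; worst case is +/-3.5
    return out
```

In a browser, the same loop would fill an audio buffer's channel data instead of a Python list.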
Open Source
Lumen.AI · NL → SQL
A natural-language-to-SQL pipeline using a local Ollama sqlcoder model with a llama3 cleanup pass, MySQL execution, and a Streamlit UI. Lets non-engineers query a database in plain English with no data leaving the machine.
Ollama · sqlcoder · LangChain · Streamlit · MySQL
GitHub →
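A minimal sketch of the two-model flow. It assumes Ollama is serving on its default local port and that the `sqlcoder` and `llama3` models have been pulled. It calls Ollama's REST endpoint (`POST /api/generate` with `"stream": false`) directly instead of going through LangChain. It also adds a read-only guard before anything would reach MySQL. The prompt template, `extract_sql`, and `is_read_only` are hypothetical stand-ins, not Lumen.AI's actual code. `generate` is injectable so the pipeline can be exercised without a running model.

```python
import json
import re
import urllib.request
from typing import Callable

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
_WRITE_KEYWORDS = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|GRANT)\b", re.I)


def ollama_generate(model: str, prompt: str) -> str:
    """Non-streaming call to a local Ollama model; returns the generated text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def extract_sql(text: str) -> str:
    """Take the first statement, preferring a fenced code block if the model added one."""
    fenced = re.search(r"`{3}[a-zA-Z]*\s*(.*?)`{3}", text, re.S)
    body = fenced.group(1) if fenced else text
    return body.strip().split(";")[0].strip() + ";"


def is_read_only(sql: str) -> bool:
    """Blunt guard: must start with SELECT/WITH and contain no write/DDL keywords."""
    starts_ok = re.match(r"\s*(SELECT|WITH)\b", sql, re.I) is not None
    return starts_ok and _WRITE_KEYWORDS.search(sql) is None


def nl_to_sql(question: str, schema: str,
              generate: Callable[[str, str], str] = ollama_generate) -> str:
    # Assumed prompt shape; sqlcoder has its own recommended template.
    draft = generate("sqlcoder", f"### Schema\n{schema}\n\n### Question\n{question}\n\n### SQL\n")
    # Cleanup pass: ask a general model to return only a clean MySQL query.
    cleaned = generate("llama3", f"Return only the corrected MySQL query, with no explanation:\n{draft}")
    sql = extract_sql(cleaned)
    if not is_read_only(sql):
        raise ValueError(f"refusing to run non-read-only SQL: {sql}")
    return sql
```

The keyword blocklist is deliberately blunt. For example, it would also reject a string literal containing "delete", and splitting on `;` ignores semicolons inside literals. Connecting as a MySQL user with SELECT-only grants is the sturdier safeguard.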
Hackathon
GAIDP — Gen AI Data Profiler
Anomaly detection over Federal Reserve FR Y-14Q-style corporate-loan data, using Isolation Forest on numerical fields and DBSCAN on categorical fields, with a token-budget-aware router that escalates flagged rows to an LLM for a narrative explanation.
scikit-learn · tiktoken · pandas · Jupyter
GitHub →
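The token-budget routing step could look roughly like the sketch below. It assumes each flagged row already carries an anomaly score where lower means more anomalous, following the convention of scikit-learn's `IsolationForest.score_samples`, plus a text rendering for the prompt. The detection stage itself is upstream and not shown. `route_flagged`, the row shape, and the roughly 4-characters-per-token estimate are assumptions. The project lists tiktoken, which would replace `approx_tokens` with an exact count.

```python
from dataclasses import dataclass, field
from typing import Callable


def approx_tokens(text: str) -> int:
    # Rough stand-in (~4 characters per token for English text). With tiktoken:
    # len(tiktoken.get_encoding("cl100k_base").encode(text))
    return max(1, len(text) // 4)


@dataclass
class Routed:
    to_llm: list[dict] = field(default_factory=list)     # rows that get an LLM explanation
    deferred: list[dict] = field(default_factory=list)   # rows that did not fit the budget
    tokens_used: int = 0


def route_flagged(rows: list[dict], budget: int,
                  count: Callable[[str], int] = approx_tokens) -> Routed:
    """Greedily send flagged rows to the LLM, most anomalous first, while they fit the budget."""
    out = Routed()
    # Assumed row shape: {"id": ..., "score": float, "text": str}; lower score = more anomalous.
    for row in sorted(rows, key=lambda r: r["score"]):
        cost = count(row["text"])
        if out.tokens_used + cost <= budget:
            out.to_llm.append(row)
            out.tokens_used += cost
        else:
            # Skip rather than stop: a smaller, less anomalous row may still fit.
            out.deferred.append(row)
    return out
```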
Led to Publication
Grey Wolf Optimizer · Feature Selection
Implementations of GWO and three variants — Binary GWO, Multi-Objective GWO, and Binary MOGWO — wrapping a Keras ANN evaluator to perform metaheuristic feature selection on UCI ML datasets. Precursor work for the SaBMOGWO-S paper above.
NumPy · scikit-learn · TensorFlow · Metaheuristics
GitHub →
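For reference, the core update of the standard continuous Grey Wolf Optimizer fits in a few dozen lines of Python. Each wolf moves to the average of three positions guided by the alpha, beta, and delta leaders. The coefficient `a` shrinks from 2 to 0, shifting the search from exploration to exploitation. This is a generic textbook sketch, not the repository's implementation, and it needs at least three wolves. `to_feature_mask` shows one common way binary variants map a continuous position to a feature subset: a sigmoid transfer sampled against a uniform draw. Its steepness of 10 and center of 0.5 are assumptions.

```python
import math
import random
from typing import Callable, Sequence


def gwo(f: Callable[[Sequence[float]], float], dim: int, lo: float, hi: float,
        n_wolves: int = 12, iters: int = 100, seed: int = 0) -> tuple[list[float], float]:
    """Minimize f over [lo, hi]^dim with the standard continuous Grey Wolf Optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best = wolves[0][:]
    best_fit = f(best)
    for t in range(iters):
        ranked = sorted(wolves, key=f)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]  # three best wolves lead
        if f(alpha) < best_fit:
            best, best_fit = alpha[:], f(alpha)
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 toward 0
        new_wolves = []
        for x in wolves:
            pos = []
            for d in range(dim):
                guided = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a  # |A| > 1 explores, |A| < 1 exploits
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - x[d])   # distance to this leader
                    guided.append(leader[d] - A * D)
                # Average of the three guided moves, clamped to the search bounds.
                pos.append(min(hi, max(lo, sum(guided) / 3.0)))
            new_wolves.append(pos)
        wolves = new_wolves
    for x in wolves:  # positions from the final update have not been scored yet
        fx = f(x)
        if fx < best_fit:
            best, best_fit = x[:], fx
    return best, best_fit


def to_feature_mask(position: Sequence[float], rng: random.Random) -> list[int]:
    """Binary-variant helper: sigmoid transfer, then keep feature d with probability s."""
    mask = []
    for v in position:
        z = max(-60.0, min(60.0, 10.0 * (v - 0.5)))  # clamp to avoid math.exp overflow
        s = 1.0 / (1.0 + math.exp(-z))
        mask.append(1 if s >= rng.random() else 0)
    return mask
```

In a wrapper-style feature selector, `f` would score a mask by training and evaluating a model on the selected columns (the repository uses a Keras ANN for this); a cheap test function such as the sphere `sum(v * v for v in x)` is enough to check the optimizer itself.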
Plus more
See all on GitHub →
Older side experiments across blockchain, Linux kernel modules, Verilog FPGA, GANs, computer vision, and full-stack web — kept around as artifacts of past learning, separate from current focus.
github.com/Arjun-B-J →