Projects
OpenLM
qwen-reasoning-0.5B
- Fine-tuned version of Qwen2.5-0.5B
- Created a custom symbolic chain-of-thought dataset using a teacher model
- Demonstrated that small models with no prior reasoning capabilities can learn to generate chain-of-thought reasoning traces
- Implementation of arXiv:2306.14050
SCoTD-deepseek-math-7B
- Custom dataset for symbolic chain-of-thought distillation into smaller models.
- Each entry pairs a MATH dataset question with 6 unique thinking traces generated by deepseek-math-7B.
- 300+ Hugging Face downloads.
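A minimal sketch of how one such entry could be expanded into fine-tuning examples, one per thinking trace. The field names `question`, `traces`, `prompt`, and `completion` are hypothetical and chosen for illustration; they are not taken from the published dataset schema.

```python
def flatten_entry(entry):
    """Turn one question with several traces into one training pair per trace.

    Assumes a hypothetical entry layout: {"question": str, "traces": [str, ...]}.
    """
    return [
        {"prompt": entry["question"], "completion": trace}
        for trace in entry["traces"]
    ]


# Illustrative entry with 6 placeholder traces (not real dataset content).
example_entry = {
    "question": "What is 2 + 3?",
    "traces": [f"placeholder trace {i}" for i in range(6)],
}
pairs = flatten_entry(example_entry)
```

Each question then contributes 6 training pairs, so the student model sees several distinct reasoning paths for the same problem.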
Proximal Policy Optimization (PPO) & Actor-Critic agent architecture
- Implemented a reusable PPO from scratch for both continuous and discrete action spaces, with a clipped surrogate policy objective and Generalized Advantage Estimation (GAE)
- Trained and tuned reinforcement learning agents with the actor-critic architecture on tasks such as CartPole-v1 and HalfCheetah-v5
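The two PPO pieces named above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the project's actual code: the function names and the default values for `gamma`, `lam`, and `clip_eps` are assumptions.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (backward recursion)."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value  # value estimate for the state after the rollout
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - float(dones[t])  # stop bootstrapping at episode ends
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Critic regression targets: advantage plus the value baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped objective for one sample (to be maximized).

    `ratio` is new_policy_prob / old_policy_prob for the action taken.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)
```

The same two functions apply to discrete and continuous action spaces, since only the way the probability ratio is computed differs between them.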
hyprwindow
- Minimal workspace and application manager for Wayland desktop environments, built with Rust and GTK
Notate
- Image annotation service for annotating and storing images in seconds
Dante
- Free, fast, and simple-to-use learning app based on proven spaced-repetition algorithms