Design reinforcement learning systems at APPIT Software in Montreal, building adaptive AI agents for optimization, autonomous decision-making, and alignment of large language models through reinforcement learning from human feedback (RLHF).
Montreal, Canada
Full-time
AI & Machine Learning
Responsibilities
Design and implement reinforcement learning algorithms for enterprise optimization problems
Build RLHF and reward modeling pipelines for LLM alignment and fine-tuning
Develop simulation environments for training and evaluating RL agents
Implement multi-agent reinforcement learning systems for complex coordination tasks
Improve RL training stability and sample efficiency using state-of-the-art techniques
Collaborate with research teams to translate RL advances into production applications
Requirements
5+ years of ML experience, including 2+ years focused on reinforcement learning
Deep knowledge of RL and planning algorithms (PPO, SAC, DQN, MCTS, and their variants)
Experience with RL frameworks (Stable-Baselines3, RLlib, CleanRL)
Strong mathematical background in dynamic programming, control theory, and optimization
Experience with RLHF for language model alignment
Proficiency in PyTorch and experience with parallel environment simulation