AI & Machine Learning | Full-time | Hybrid

Reinforcement Learning Engineer

Design reinforcement learning systems at APPIT Software in Montreal, building adaptive AI agents for optimization, autonomous decision-making, and RLHF alignment of large language models.

Montreal, Canada

Full-time

AI & Machine Learning

Responsibilities

Design and implement reinforcement learning algorithms for enterprise optimization problems
Build RLHF and reward modeling pipelines for LLM alignment and fine-tuning
Develop simulation environments for training and evaluating RL agents
Implement multi-agent reinforcement learning systems for complex coordination tasks
Optimize RL training stability and sample efficiency using state-of-the-art techniques
Collaborate with research teams to translate RL advances into production applications

Requirements

5+ years of ML experience with 2+ years focused on reinforcement learning
Deep knowledge of RL algorithms (PPO, SAC, DQN, MCTS, and their variants)
Experience with RL frameworks (Stable-Baselines3, RLlib, CleanRL)
Strong mathematical background in dynamic programming, control theory, and optimization
Experience with RLHF for language model alignment
Proficiency in PyTorch and experience with parallel environment simulation

Nice to Have

Publications in RL research (NeurIPS, ICML, ICLR)
Experience with robotics or autonomous systems
Knowledge of offline RL and decision transformers

Skills

Python, PyTorch, Reinforcement Learning, RLHF, PPO, Simulation, Multi-Agent RL, Optimization


Related Positions

LLM Fine-Tuning & Optimization Engineer

AI/ML Engineer

iOS Developer (Swift/SwiftUI)

Ruby on Rails Developer