Build and maintain scalable ML infrastructure at APPIT Software in Bangalore, designing automated training pipelines, model serving platforms, and monitoring systems for production AI workloads.
Bangalore, India
Full-time
AI & Machine Learning
Responsibilities
Design and maintain CI/CD pipelines for ML model training, testing, and deployment
Build scalable, containerized model serving infrastructure on Kubernetes
Implement model monitoring, drift detection, and automated retraining pipelines
Manage GPU clusters and optimize resource utilization for training workloads
Build feature stores and data pipelines for ML experimentation and production
Establish MLOps best practices including versioning, lineage tracking, and reproducibility
Requirements
3-5 years of experience in DevOps, MLOps, or ML infrastructure engineering
Strong experience with Kubernetes, Docker, and container orchestration
Hands-on experience with MLOps tools (MLflow, Kubeflow, Airflow, or Vertex AI Pipelines)
Proficiency in Python and infrastructure-as-code (Terraform or Pulumi)
Experience with cloud ML services (AWS SageMaker, GCP Vertex AI, or Azure ML)
Knowledge of model serving frameworks (Triton, TorchServe, or BentoML)
Nice to Have
Experience with GPU orchestration and NVIDIA tooling
Knowledge of LLM serving optimization (vLLM, TGI)
Familiarity with data versioning tools (DVC, lakeFS)