Key Responsibilities
Monitoring and Alerting
Deploy and manage monitoring tools like Prometheus and Grafana. Set up alerts to detect issues early.
Production Support
Analyze logs and metrics to troubleshoot issues in live production systems as they occur.
Incident Management
Participate in on-call rotations, respond to incidents, and contribute to post-incident reviews.
Kubernetes and Containers
Work with containerized applications and Kubernetes clusters, ensuring workloads are deployed correctly and scale as needed.
Automation
Develop scripts using Python or shell to automate repetitive tasks and improve efficiency.
Collaboration
Work closely with development teams to improve system reliability and architecture.
Performance Optimization
Identify performance bottlenecks and support capacity planning so that systems scale with demand.
Eligibility Criteria
Education
Bachelor’s degree in Computer Science, Information Technology, or a related field.
Batch
2024 or 2025 graduates.
Technical Skills
Strong understanding of Linux, networking fundamentals (TCP/IP, DNS), and basic scripting. Knowledge of cloud platforms and containerization is an advantage.
Soft Skills
Good communication, analytical thinking, and a willingness to learn and adapt.

