Systems Reliability Engineer

Tain
Full-time
On-site
Birkirkara, Malta

Responsibilities:


  • Enhance and maintain monitoring of metrics, logs and tracing (Grafana, Prometheus, ELK, OpenTelemetry & more)

  • Build automation scripts for component restarts (Jenkins, Ansible & more)

  • Proactively monitor system performance, identify potential issues, and implement preventive measures. Act as a mentor to Technical Support Engineers in these specialized areas.

  • Gain a solid understanding of the live casino platform to assist with deployments, troubleshooting, and business-as-usual (BAU) tasks.

  • Join the 24/7 shift rota – Day, night, rest, off, repeat.

  • Communicate effectively with customers and internal stakeholders such as DevOps, studio techs, Corporate IT and Customer account management.

  • Respond to and resolve incidents, minimizing downtime and ensuring system stability.

  • Collaborate with other IT departments to ensure seamless integration of new systems and services.

  • Participate in the evaluation and adoption of new SRE tools.


Requirements:


  • 2+ years of experience in SRE.

  • Strong understanding of Linux/Unix operating systems

  • Familiarity with scripting languages such as Python.

  • Experience with automation tools such as Ansible and Terraform.

  • Familiarity with CI/CD concepts and tools such as Jenkins or GitLab CI/CD

  • Strong problem-solving and troubleshooting skills

  • Experience with hyperconverged systems, hypervisors such as VMware, and end-to-end planning, execution, monitoring and troubleshooting

  • Excellent communication and teamwork skills

  • Eager to learn and adapt to new technologies and approaches

  • Passion for the iGaming industry and understanding of its unique challenges and opportunities