Senior DevOps / Site Reliability Engineer (SRE)

The Role

Own and improve Terraform-managed AWS infrastructure for Django and Go services, focusing on AWS, CI/CD, and infrastructure reviews rather than application development. Work closely with platform engineering to review and validate infrastructure changes before deployment.

In this role, you will be responsible for designing, building, and maintaining scalable, secure, and cost-efficient cloud infrastructure across multiple shared environments. You will play a key role in ensuring system reliability, performance, and operational excellence across production systems.

Schedule

  • 9 AM to 5 PM Sydney time, with a 30-minute unpaid lunch (37.5 hours per week)

Responsibilities

  • Design, implement, and maintain infrastructure-as-code using Terraform across multi-environment AWS setups (development, staging, production, and shared services).
  • Architect and manage core AWS services including VPC networking, subnets, routing, security groups, IAM, RDS, Redis, ECR, ECS, and/or EKS.
  • Build and maintain robust CI/CD pipelines, ideally using GitHub Actions, to enable reliable and automated deployment workflows.
  • Ensure infrastructure security best practices, including IAM role design, secrets management, encryption standards, and access controls.
  • Manage and optimize system observability using monitoring and logging tools such as CloudWatch, Prometheus, Grafana, Datadog, or equivalent platforms.
  • Support and improve production system reliability, including proactive monitoring, alerting, and incident response.
  • Lead or participate in production incident management, including troubleshooting, root cause analysis, mitigation, and post-incident improvements.
  • Work closely with engineering teams to support application deployments and ensure infrastructure aligns with application needs.
  • Manage and optimize database systems (MySQL), including backups, migrations, replication strategies, and failover readiness.
  • Contribute to performance tuning and cost optimization (FinOps) across AWS infrastructure and services.
  • Maintain Linux-based systems and perform advanced debugging of infrastructure and runtime issues.
  • Support shared, multi-tenant infrastructure environments where changes may impact multiple services, ensuring safe and well-coordinated deployments.
  • Collaborate with developers to understand application behavior (including services built in Django and Go) to better support debugging and deployment reliability.
  • Continuously improve infrastructure standards, documentation, automation, and operational processes.

Must have skills

  • Strong production Terraform and AWS experience (especially networking, RDS, Redis, ECR, ECS/EKS)
  • CI/CD ownership (ideally GitHub Actions)
  • Linux and systems debugging skills
  • MySQL operational knowledge (migrations, backups, failover)
  • IAM and secrets management expertise
  • Experience handling production incidents calmly and methodically

Nice to have

  • Working familiarity with Django and Go: enough to read service code, debug a deploy, and reason about runtime behavior
  • Observability tooling (CloudWatch + Prometheus/Grafana/Datadog or similar)
  • FinOps / AWS cost-optimization experience
  • Experience operating shared multi-tenant environments where one team's change can affect another's

Independent Contractor Perks

  • Permanent Work from Home
  • Immediate Hiring
  • Health Insurance Coverage for eligible locations

Note

  • Please click the "Apply" button to complete your application, including the assessment questions, technical check, and voice recording. Your hourly pay rate will be established based on your performance in the application process; submissions with all requirements fulfilled will receive priority review.

Job Category

Engineering

Job Type

Full Time (35 hours or more per week)

Work Schedule and Timezone

Sydney

Published on

May 27, 2026