Senior Site Reliability Engineer

1 week ago


WorkFromHome, South Africa Sana Commerce Full time

Company Description

What started in 2007 with a pizza and a plan has grown into a fast-moving SaaS company empowering manufacturers, distributors, and wholesalers to thrive in complex B2B commerce. Our mission is simple: help businesses build stronger relationships through seamless digital commerce. At Sana Commerce, you’ll join a team that’s bold, growth-oriented, and customer-obsessed, where every engineer has real ownership and impact.

About the role

We’re looking for a Senior Site Reliability Engineer (SRE) to enhance the reliability, performance, and scalability of our global e-commerce platform. You’ll play a critical role in building resilient systems, automating infrastructure, and driving observability across Azure and Kubernetes environments. This is a hands‑on engineering role where you’ll blend reliability strategy with real‑world execution, keeping today’s systems healthy while shaping the ones we’ll depend on tomorrow.

Job Description

What you’ll do

  • Lead incident response and postmortems: drive investigations, document learnings, and implement permanent fixes to prevent recurrence.
  • Manage and optimize Azure Kubernetes environments: own cluster configurations, performance, cost control, and security best practices.
  • Build observability systems: develop dashboards, alerts, and metrics using Dynatrace, Honeycomb, Elasticsearch, Grafana/Kibana, and Azure Monitor (KQL).
  • Automate for resilience: write reliable scripts in PowerShell, Bash, Python, or C#, embedding logging, rollback, and version control.
  • Implement Infrastructure-as-Code: design and maintain Terraform, Bicep, or ARM templates to standardize and automate deployments.
  • Optimize system performance: identify bottlenecks through deep monitoring, dump analysis, and right-sizing of cloud resources.
  • Collaborate across engineering teams: integrate reliability principles into CI/CD pipelines and the broader SDLC.
  • Participate in on‑call rotations: lead during critical incidents, ensuring lasting fixes and operational excellence.

Qualifications

What you’ll bring

  • 5+ years in SRE, DevOps, or Cloud Infrastructure roles with experience in large-scale, distributed systems.
  • Strong Azure and Kubernetes expertise (production‑level).
  • Proven ability in observability engineering using Dynatrace, Honeycomb, Elastic, Grafana/Kibana, or Azure Monitor.
  • Skilled in PowerShell, Bash, Python, or C#, with an automation-first mindset.
  • Proficient in Infrastructure-as-Code (Terraform, Bicep, ARM).
  • Solid grasp of TCP/IP, networking fundamentals, and performance tuning.
  • Strong communicator able to translate complex technical findings into clear, actionable insights.

Certifications preferred:

  • Microsoft Certified: Azure Administrator Associate
  • Certified Kubernetes Administrator (CKA)

Why you’ll love working here

  • Impact from day one – Join a scale‑up where your ideas shape how global businesses operate online.
  • Continuous learning – Access structured onboarding (rated 9.1/10 by previous hires), mentorship, and a feedback culture.
  • Hybrid flexibility – Work from our Cape Town office 3 days per week and from home 2 days.
  • Career growth – Expand your technical and leadership scope in a company built for long‑term success.

Our values

At Sana Commerce, our values drive everything we do:

  • Champions of Our League – We deliver lasting success, balancing quick wins and long‑term value.
  • Supercharge Our Customers – We help our customers lead and succeed.
  • Determined to Grow – We embrace feedback and challenges to raise the bar.
  • Bold Together – We take risks, collaborate deeply, and support each other.

Ready to build reliability that scales? Apply now and help shape the foundation of our next‑generation SaaS platform.



  • WorkFromHome, South Africa DuckDuckGo Full time

    1 week ago. Who We Are: Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our browser on Mac, Windows, iOS, and Android, our search engine,...


  • WorkFromHome, South Africa DuckDuckGo Full time

    Who We Are: DuckDuckGo is an online protection company and remote‑first team dedicated to raising the standard of trust on the web. Your Team & Role: As part of the Site Reliability Team, you will build and maintain world‑class infrastructure that serves millions of users. Your work will involve high‑level languages such as Perl, Go, and Python, and...


  • WorkFromHome, South Africa Sana Commerce Full time

    6 days ago. Company Description: What started in 2007 with a pizza and a plan has grown into a fast-moving SaaS company empowering manufacturers, distributors, and wholesalers to thrive in complex B2B commerce. Our mission is simple: help businesses build stronger...


  • WorkFromHome, South Africa Robin AI Full time

    Site Reliability Engineer at Robin AI. City of Cape Town, Western Cape, South Africa. About Robin: Robin is on a mission to rebuild the legal industry, starting with...


  • WorkFromHome, South Africa Risingsun Softsol Full time

    Risingsun is hiring an SRE (Site Reliability Engineer). Work model: hybrid – 2/3 days at the office per week. Open to visa holders: No – only SA citizens or SA ID holders. Employment type: 12-month contract, renewable. Location: JHB. Client type: Banking. Candidate: a skilled and proactive Site Reliability Engineer (SRE) with 5+ years' experience. The SRE will...


  • WorkFromHome, South Africa Canonical Full time

    Overview: Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. The company is a pioneer of global distributed collaboration, with 1200+ colleagues in...


  • WorkFromHome, South Africa Canonical Full time

    Overview: Site Reliability Engineer role at Canonical. Global remote location. Canonical is a leading provider of open source software and operating systems to the enterprise and technology markets, known for Ubuntu and open source infrastructure platforms. We deploy and run OpenStack, Kubernetes, storage solutions, and open source applications, applying...


  • WorkFromHome, South Africa Canonical Full time

    Canonical is a leading provider of open‑source software and operating systems for global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers include the world's leading public cloud and silicon providers, and...


  • WorkFromHome, South Africa k0deHut Full time

    Site Reliability Engineer (SRE II) (Kubernetes/Python). About the job: Intermediate Site Reliability Engineer (SRE II). Our client is offering the right candidate a great opportunity to join a fast-growing South African fintech that enables...


  • WorkFromHome, South Africa GitLab Full time

    Intermediate Site Reliability Engineer, Environment Automation. GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000...