Site Reliability Engineer

2 weeks ago

Cape Town, Western Cape, South Africa SprintHive - Intelligent Customer Onboarding Full time R250 000 - R600 000 per year

Junior Site Reliability Engineer - SprintHive

About SprintHive

SprintHive is a South African fintech enabling seamless end-to-end customer onboarding that drives conversion rates, prevents fraud, and reduces risk. Our automated solutions fully onboard new customers in under two minutes.

The Role

We're seeking a Junior Site Reliability Engineer eager to launch their career in infrastructure and platform engineering. You'll work directly with our CTO, receiving hands-on mentorship while contributing to production systems from day one. This fully remote position offers exceptional learning opportunities in a modern cloud-native environment.

What You'll Do

Learn & Build

Gain hands-on experience with infrastructure-as-code using Terraform
Work with Kubernetes and containerised applications
Learn cloud platforms (Google Cloud and AWS) through real projects
Develop automation scripts in Golang, Bash, or Python
Participate in code reviews to improve your skills

Operational Support

Assist in maintaining and improving our monitoring stack
Help debug production issues alongside senior team members
Document procedures and infrastructure components
Support change control processes for production deployments
Participate in on-call rotation (with full support and gradual responsibility increase)

Growth Opportunities

Own small to medium infrastructure projects with guidance
Contribute to security initiatives and incident response
Collaborate with development team on infrastructure needs
Progress toward independent project ownership

Requirements

Strong problem-solving mindset and eagerness to learn
Basic programming ability in any language
Fundamental understanding of Linux/Unix systems
Interest in cloud infrastructure and DevOps practices
Ability to work independently in a remote environment

Preferred Qualifications

Computer Science degree or relevant coursework (bootcamps, self-study welcome)
Personal projects demonstrating infrastructure or automation work
Familiarity with Git, containers, and cloud services
Contributions to open source projects

Our Tech Stack

Infrastructure: Kubernetes (GKE), Terraform, Kong API Gateway
Monitoring: Prometheus, Grafana, Elastic, Kibana, Mezmo, Falco
Languages: Kotlin, Python, JavaScript, Golang
Architecture: Microservices with Event Sourcing and CQRS
Database: MongoDB Atlas
CI/CD: Jenkins

Compensation & Benefits

Salary: Competitive market rate
21 days paid leave
Direct mentorship from experienced CTO
Flexible working hours
Fully remote position
High-quality hardware (MacBook Pro, 34" Dell monitor)
AI assistant subscriptions
Clear growth path to Senior SRE role

Intermediate Site Reliability Engineer - SprintHive

The Role

We're seeking an Intermediate Site Reliability Engineer ready to take ownership of significant infrastructure projects. Working directly with our CTO, you'll have the autonomy to drive improvements while continuing to grow your expertise. This fully remote position offers the perfect balance of independence and support.

What You'll Do

Infrastructure Ownership

Maintain and improve infrastructure using Terraform across GCP and AWS
Deploy and optimise applications on Kubernetes (GKE)
Ensure infrastructure automation and reproducibility
Own medium to large infrastructure projects independently
Debug and resolve complex production issues

Operational Excellence

Build automation tooling to eliminate manual processes
Improve monitoring and alerting systems
Write and execute production change plans
Maintain security best practices in all deployments
Participate confidently in on-call rotation

Collaboration & Growth

Contribute to infrastructure architecture discussions
Document systems and share knowledge with team
Partner with development team on infrastructure requirements
Participate in security incident response
Begin mentoring junior team members as we grow

Requirements

Must-Haves

2-4 years of infrastructure/DevOps/SRE experience
Hands-on experience with cloud platforms (GCP, AWS, or Azure)
Working knowledge of infrastructure-as-code tools
Container and orchestration experience (Docker, Kubernetes)
Proven ability to complete projects independently
Solid programming skills in at least one language (Python, Go, Bash)

Preferred Qualifications

Terraform experience
GCP/AWS certification
Experience with monitoring tools (Prometheus, Grafana, ELK)
Exposure to microservices architectures
Computer Science degree or equivalent

Our Tech Stack

Infrastructure: Kubernetes (GKE), Terraform, Kong API Gateway
Monitoring: Prometheus, Grafana, Elastic, Kibana, Mezmo, Falco
Languages: Kotlin, Python, JavaScript, Golang
Architecture: Microservices with Event Sourcing and CQRS
Database: MongoDB Atlas
CI/CD: Jenkins

Compensation & Benefits

Salary: Competitive market rate
21 days paid leave
Flexible working hours
Fully remote position
High-quality hardware (MacBook Pro, 34" Dell monitor)
AI assistant subscriptions
High project autonomy
Clear growth path to Senior role

Why This Role?

Perfect for engineers ready to move beyond junior tasks but not yet requiring senior-level compensation. You'll own real projects, make meaningful decisions, and grow rapidly in a small team environment.

Senior Site Reliability Engineer - SprintHive

The Role

We're seeking a Senior Site Reliability Engineer to co-architect our infrastructure future. As the second senior SRE working directly with our CTO, you'll have exceptional influence over platform strategy and technical direction. This fully remote position is ideal for an expert seeking impact without bureaucracy.

What You'll Do

Strategic Leadership

Co-design long-term infrastructure vision and roadmap
Lead complex, multi-quarter infrastructure transformations
Define SRE standards, practices, and tooling strategies
Make architectural decisions balancing scale, cost, and complexity
Evaluate and introduce new technologies

Technical Excellence

Architect sophisticated infrastructure solutions using Terraform
Design zero-downtime deployment strategies and disaster recovery plans
Lead Kubernetes platform optimisation for scale and efficiency
Drive security architecture including identity management (OAuth2)
Build advanced automation and self-healing systems

Team & Culture Building

Mentor team members and establish engineering excellence standards
Lead incident response and drive blameless post-mortem culture
Interface with executive team on infrastructure strategy
Own vendor relationships and technology evaluations
Help build and scale the SRE team as we grow

Requirements

Must-Haves

5+ years of SRE/DevOps experience in production environments
Expert-level knowledge of at least one major cloud provider
Advanced Terraform skills with large-scale infrastructure experience
Deep Kubernetes expertise including performance tuning and troubleshooting
Proven track record of leading infrastructure initiatives
Strong programming skills (Go, Python) for tooling development
Experience mentoring engineers and driving technical standards

Preferred Qualifications

Fintech or high-compliance environment experience
Multi-cloud architecture experience
Security certifications or demonstrated security expertise
Open source contributions to infrastructure tools
Experience scaling startups through rapid growth

Our Tech Stack

Infrastructure: Kubernetes (GKE), Terraform, Kong API Gateway
Monitoring: Prometheus, Grafana, Elastic, Kibana, Mezmo, Falco
Languages: Kotlin, Python, JavaScript, Golang
Architecture: Microservices with Event Sourcing and CQRS
Database: MongoDB Atlas
CI/CD: Jenkins

Compensation & Benefits

Salary: Competitive market rate
21 days paid leave
Flexible working hours
Fully remote position
Premium hardware setup (MacBook Pro, 34" Dell monitor)
AI assistant subscriptions
Strategic influence at executive level
Budget authority for infrastructure decisions
Opportunity to build and lead growing SRE team

Why This Role?

Rare opportunity to join as a founding SRE member with CTO-level partnership. You'll have the authority of a Head of Infrastructure without the bureaucracy, the technical challenges of a scale-up without legacy constraints, and the influence to build the team and culture from scratch.

Senior Site Reliability Engineer

2 weeks ago

Cape Town, Western Cape, South Africa Sana Commerce Full time R120 000 - R180 000 per year

Company Description: What started in 2007 with a pizza and a plan has grown into a fast-moving SaaS company empowering manufacturers, distributors, and wholesalers to thrive in complex B2B commerce. Our mission is simple: help businesses build stronger relationships through seamless digital commerce. At Sana Commerce, you'll join a team that's bold,...
Senior Site Reliability Engineer

7 days ago

Cape Town, Western Cape, South Africa Sana Commerce Full time R1 500 000 - R2 500 000 per year

Company Description: What started in 2007 with a pizza and a plan has grown into a fast-moving SaaS company empowering manufacturers, distributors, and wholesalers to thrive in complex B2B commerce. Our mission is simple: help businesses build stronger relationships through seamless digital commerce. At Sana Commerce, you'll join a team that's bold,...
Site Reliability

4 days ago

Cape Town, Western Cape, South Africa Canonical - Jobs Full time R80 000 - R120 000 per year

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers,...
Site Reliability Engineer

4 days ago

Cape Town, Western Cape, South Africa Canonical - Jobs Full time R600 000 - R1 200 000 per year

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers, and...
Reliability Engineer

2 weeks ago

Cape Town, Western Cape, South Africa 1a018d35-7946-4b80-8f4b-8f4ab3a9b78a Full time R120 000 - R180 000 per year

RELIABILITY ENGINEER: We are looking for a BSc or BEng (Mechanical or Electrical) Engineer, GCC (Factories) with Pr.Eng (ECSA) with 5+ years experience in any physical asset intensive industry (such as oil & gas, chemical, utilities, power generation, manufacturing, facilities, nuclear etc.). The role will support our Client's business a Reliability Engineer on...
Site Reliability Engineer

4 days ago

Cape Town, Western Cape, South Africa LexisNexis Full time R1 000 000 - R2 500 000 per year

About Our Team: LexisNexis Legal & Professional, which serves customers in more than 150 countries with 11,800 employees worldwide, is part of RELX, a global provider of information based analytics and decision tools for professional and business customers. Our company has been a long-time leader in deploying AI and advanced technologies to the legal market to...
Senior Site Reliability Engineer

4 days ago

Cape Town, Western Cape, South Africa Canonical - Jobs Full time US$120 000 - US$240 000 per year

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers include the world's leading public cloud and silicon providers,...
Site Reliability Engineering Manager

4 days ago

Cape Town, Western Cape, South Africa Canonical - Jobs Full time R600 000 - R1 200 000 per year

Canonical is a leading provider of open-source software and operating systems for global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers include the world's leading public cloud and silicon providers, and...
Senior Site Reliability

4 days ago

Cape Town, Western Cape, South Africa Canonical - Jobs Full time R120 000 - R180 000 per year

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers,...
Site Engineer

2 days ago

Cape Town, Western Cape, South Africa Hire Resolve Full time R250 000 - R500 000 per year

We are looking for a skilled Site Engineer to join our team in Cape Town. The Site Engineer will be responsible for overseeing and managing all construction activities on site, ensuring that the project is completed on time, within budget, and to the highest standard of quality. Key Responsibilities: Oversee and manage all construction activities on site,...
