Site Reliability Engineer
1 month ago
Electrum is the next-generation payments technology company that provides cloud-native software to optimise the processing of financial transactions. Since 2012, we have established ourselves as a respected payments technology partner through our deep expertise and track record in delivering trusted enterprise-grade payments solutions.
We’ve built a reputation for providing solutions for high-volume, low-value payment schemes and services that enable our clients to deliver to their customers at scale. We love that the projects we work on touch the lives of millions of South Africans daily, making a real difference.
We hire the best of the best and we offer great opportunities for personal growth and career progression.
Site Reliability Engineers (SREs) are responsible for monitoring, automating, and improving the reliability, scalability, performance and availability of our services. SREs work on tasks such as preventing incidents, managing infrastructure reliability, building effective monitoring systems and ensuring smooth operations of cloud production systems.
Responsibilities
Service Reliability and Availability
- Collaborate with teams to develop reliable, available, and scalable applications.
- Work closely with the development team to understand, address, and prevent technical issues.
- Participate in on-call rotations and manage critical incidents.
- Develop and maintain incident response processes and alerting mechanisms.
- Develop and maintain tools to monitor application and service SLIs and SLOs.
System Troubleshooting and Problem Resolution
- Diagnose and resolve infrastructure and system-level issues, ensuring minimal downtime and swift problem resolution.
- Respond to and investigate incidents related to infrastructure and applications, utilising diagnostic tools to track down and remediate issues.
- Participate in on-call rotations to provide 24/7 operational support as necessary.
Observability and Automation
- Utilise technologies to develop and maintain effective log management and monitoring solutions for internal and external customers.
- Evaluate system health, identify performance bottlenecks and proactively optimise performance and cost-effectiveness.
- Implement automation tools and frameworks for deployment, configuration, and monitoring processes.
- Manage and plan system capacity to ensure continued reliability.
Process Improvements
- Offer recommendations and improvements to enhance performance, security, and scalability.
- Evaluate and integrate emerging technologies, cloud services and automation tools to improve operational efficiency.
- Drive cost-optimisation initiatives by identifying opportunities for resource right-sizing, efficiency and other cost-saving measures.
Disaster Recovery
- Design and implement disaster recovery strategies, including backup and restoration processes, to ensure business continuity.
- Develop and update incident management procedures, ensuring effective incident response by providing technical solutions and implementing preventative measures.
- Regularly assess system performance, identify irregularities, troubleshoot issues, and ensure high system availability. This includes performing or facilitating Disaster Recovery tests.
Requirements
- Bachelor's degree in Computer Science, Information Technology, or related field preferred.
- 3+ years of experience in an SRE or similar role.
- Familiarity with AWS services like EC2, S3, RDS, Lambda, EKS and CloudWatch.
- Demonstrable experience with observability tools like Elastic and Grafana.
- Development skills advantageous.
- Strong troubleshooting and problem-solving skills.
- Excellent collaboration, communication, and time management skills.
- Attention to detail and ability to work effectively in a team environment.
Benefits
A good work-life balance is very important at Electrum. To help you manage your own time and energy, Electrum offers benefits such as:
- Flexibility around core working hours (nature of flexibility is negotiated per role based on business need)
- Daily cooked lunches and a stocked kitchen for the mid-day nibbles
- Team socialising, getaways, and social outings
We have created a safe, transparent environment where we know mistakes happen, and that’s okay. We even have a 3-step approach to dealing with them:
1. Tell everyone about it
2. Fix the mistake
3. Tell everyone about it
You are responsible for your actions – both the successes and the failures.