Senior Platform Engineer
4 weeks ago
Responsibilities

Infrastructure Design and Management
Design, deploy, and maintain cloud infrastructure using platforms like Microsoft Azure, AWS, or Google Cloud. Ensure high availability, scalability, and fault tolerance of applications and services, including managing containerized environments. Use Rancher to manage Kubernetes clusters, ensuring efficient deployment and orchestration of containerized applications across various environments.

Automation and CI/CD Pipeline Development
Build and maintain continuous integration/continuous deployment (CI/CD) pipelines for automated testing, building, and deployment of software. Automate infrastructure provisioning using tools like Terraform, Ansible, or ARM templates. Leverage Rancher's CI/CD capabilities and integrations to streamline the deployment process for containerized applications in Kubernetes environments.

Monitoring and Performance Optimization
Implement and maintain monitoring, logging, and alerting solutions to track application and infrastructure performance, using tools such as Azure Monitor, Prometheus, or CloudWatch. Use Rancher's built-in monitoring tools to observe Kubernetes clusters and containers, ensuring that applications are performing optimally. Optimize infrastructure for cost-efficiency and performance, configuring autoscaling and resource management within Rancher-managed Kubernetes clusters.

Security and Compliance
Ensure the security of cloud infrastructure by configuring firewalls, access controls, and encryption for sensitive data. Implement security best practices to maintain compliance with industry regulations and standards, including role-based access control (RBAC) within Rancher for managing Kubernetes security. Monitor for security vulnerabilities, manage container security, and perform regular security audits of Kubernetes clusters and cloud resources.

Collaboration with Development and Operations Teams
Work closely with software development teams to understand application requirements and provide the necessary infrastructure support, particularly for containerized workloads. Collaborate with operations teams to ensure the smooth operation of deployed services, particularly within containerized environments managed through Rancher.

Incident Management and Troubleshooting
Investigate and resolve platform-related issues, including application outages, network failures, and security incidents. Use Rancher's centralized logging and monitoring to quickly identify and troubleshoot issues within Kubernetes clusters. Provide on-call support and contribute to incident response strategies, ensuring minimal downtime and fast recovery of services.

System Upgrades and Patching
Manage platform updates, patches, and upgrades to ensure systems remain secure and up to date. Plan and execute Kubernetes cluster upgrades and Rancher version updates to stay current with new features and security patches. Ensure that containerized applications remain compatible and functional after updates.

Documentation and Knowledge Sharing
Maintain clear, comprehensive documentation of infrastructure configurations, deployment processes, and troubleshooting procedures. Share knowledge of Rancher, Kubernetes, and cloud infrastructure best practices with team members to improve platform operations and efficiency.

Capacity Planning and Scaling
Monitor resource usage and plan for capacity scaling to meet changing business and application demands. Implement scaling strategies for Kubernetes clusters in Rancher, including auto-scaling of pods, nodes, and applications to accommodate varying workloads.

Cost Management and Optimization
Track and analyze cloud resource usage and costs to ensure efficient resource allocation. Optimize cloud spending by implementing best practices like reserved instances, spot instances, and resource rightsizing.
Use Rancher to monitor the resource consumption of containerized applications and optimize the deployment of Kubernetes clusters to reduce infrastructure costs.

Disaster Recovery and Backup Planning
Implement disaster recovery strategies and data backup solutions to minimize downtime and data loss. Regularly test backup systems and recovery procedures to ensure reliability in case of failure, including implementing backup solutions for Kubernetes environments managed through Rancher.

Essential skills

Several years (typically 3-5 years) of experience in a related field (e.g., systems engineering, DevOps, infrastructure engineering).
Bachelor's degree in Computer Science or a related field, and/or certifications such as Microsoft Certified: Azure Solutions Architect Expert, AWS Certified Solutions Architect, Certified Kubernetes Administrator (CKA), or Red Hat Certified Engineer (RHCE).
Strong verbal and written communication skills, with the ability to convey complex ideas clearly and effectively.
Experience working collaboratively in cross-functional teams, with a focus on achieving shared goals.
Expertise in managing multiple projects simultaneously, with a track record of delivering on time and within scope.
Exceptional attention to detail, ensuring high standards of quality in all outputs.
Ability to adapt quickly to changing environments and priorities, maintaining effectiveness in dynamic situations.
Skills in designing highly available and fault-tolerant systems, ensuring platforms are resilient under various conditions.
Proven working experience with tools like Prometheus, Grafana, Datadog, New Relic, or the ELK stack to monitor the health of infrastructure, applications, and services.
Excellent skills in identifying, diagnosing, and resolving infrastructure issues quickly, especially when systems fail or behave unexpectedly.
Knowledge of securing infrastructure and applications, including role-based access control (RBAC), encryption, and network security.
A solid understanding of Git for source code management, collaboration, and version control is essential.
A strong understanding of container orchestration with Kubernetes, particularly in deploying, managing, and scaling containerized applications across multi-cluster environments.
Proficiency in configuring and maintaining Rancher for cluster management, along with expertise in implementing security policies, monitoring, and logging within Kubernetes clusters, is essential for optimizing containerized workloads and ensuring high availability, security, and performance.
A deep understanding of cloud infrastructure management, such as provisioning and configuring virtual machines, networking, and storage solutions, and implementing security best practices with tools such as Azure Active Directory and network security groups.
Proficiency in automation and CI/CD pipelines using Azure DevOps, along with expertise in Azure monitoring tools like Azure Monitor and Application Insights, is crucial for ensuring high availability, security, cost optimization, and efficient deployment of applications in the cloud.
Strong focus on automating manual processes and optimizing workflows for more efficient system management.
Experience with designing distributed systems and microservices, and understanding the trade-offs between performance, consistency, and scalability.

Please call us on
-
Senior Structural Engineer
3 weeks ago
Cape Town, South Africa Gig Engineer Full time
About the Role: We are looking to recruit a Senior Professional Structural Engineer to fulfil the role of Technical Lead in MCF & Industry in South Africa teams, which are looking to recruit to expand and develop their South Africa‑based Structural Engineering team to assist with our growing workload. Requirements: Desired Qualifications and Professional...
-
Senior Platform Engineer
2 weeks ago
Cape Town, South Africa Mind Detect Full time
Overview: Unique opportunity for a Senior Platform Engineer in South Africa to build and enhance developer tooling, deployment processes and pipelines in order to ensure our elite
-
Platform Engineer
3 weeks ago
Cape Town, South Africa Eqplus Technologies Pty Ltd Full time
Overview: An opportunity exists for a Platform Engineer to contribute to the development, integration, and operation of shared platform services supporting large-scale scientific computing and complex software systems. Working within the Site Reliability Engineering (SRE) team, this role will focus on automation, observability, and operational readiness as...
-
Senior Platform Engineer
1 day ago
Cape Town, Western Cape, South Africa Tillo Full time R80 000 - R150 000 per year
Who we're looking for: A Senior Platform Engineer with AWS & Kubernetes experience, and a strong curiosity to learn new tools. The challenge: You will make significant contributions to the Tillo platform - designing and building infrastructure, systems and processes that empower our development teams to deploy and manage their applications with efficiency and...
-
Senior Staff Engineer
3 weeks ago
Cape Town, South Africa Coachhub - The Digital Coaching Platform Full time
About the Role: CoachHub is on a mission to democratise coaching for all career levels worldwide. We need an amazing team to achieve this, so we’re bringing together kind, smart and highly-skilled people from all corners of the globe. If you’d like to shape the success story of a fast-growing, award-winning company and the leading global digital coaching...
-
Cape Town, South Africa Gig Engineer Full time
A leading engineering consultancy in Cape Town seeks a Senior Professional Structural Engineer to provide technical leadership in the Structural Engineering team. The candidate must have a minimum of 15 years of experience and be registered with ECSA as a PrEng. Responsibilities include managing projects, mentoring junior engineers, and ensuring compliance...
-
Senior Platform Engineer
7 days ago
Cape Town, Western Cape, South Africa Tillo Full time R900 000 - R1 200 000 per year
Who we're looking for: A Senior Platform Engineer with AWS & Kubernetes experience, and a strong curiosity to learn new tools. The challenge: You will make significant contributions to the Tillo platform - designing and building infrastructure, systems and processes that empower our development teams to deploy and manage their applications with efficiency...
-
Software Engineering Team Lead
2 weeks ago
Cape Town, South Africa Root Platform Full time
Root is a fast-growing tech startup and we’re on a mission to build the future of insurance. We're looking for a Senior Back-End Software Engineer with Team Lead experience to join our Engineering team. In this role as a Team Lead, you would manage both your personal contribution and that of your team. You would be leading one of Root’s engineering pods,...
-
Senior Platform And Compute Engineer
2 weeks ago
Cape Town, South Africa Boardroom Appointments Full time
Senior Platform and Compute Engineer - 12 Month Contract Role. Responsibilities: Maintain service availability for customers by performing routine maintenance. Conduct failover testing. Implement best practices. Proactively monitor alerts and status changes. Perform project‑related duties for new and changing infrastructure deployments as required. Regularly...
-
Senior Platform Engineer
3 weeks ago
Cape Town, South Africa DigiOutsource Full time
Kick-start your career in the online gaming world and experience the very latest in technology and innovation. Who We Are: We’re part of Super Group, the NYSE-listed digital gaming company behind some of the world’s leading Sports and iGaming brands. At DigiOutsource, we bring passionate...