SRE Manager

4 weeks ago


Sandton, South Africa Dimension Data Full time
Job description

A bit about the team: you would join Building 20, which is responsible for building, deploying and running all service platforms and services. Optimising system reliability, performance and security is our responsibility, and as such we focus heavily on observability, automation, scalable architecture and tooling.

The primary responsibility of the DevOps Manager is to collaborate with a variety of stakeholders to ensure available, scalable and reliable business experiences through the daily management of the DevOps area of the business. The role should drive increased capacity to innovate and expedite the delivery of value to our business. This includes management of all design, build, release and support activities associated with the relevant portfolio. The role takes decisive management accountability for, and ownership of, the entire scope of work in the DevOps portfolio. The manager is responsible for ensuring that the team delivers and executes in alignment with solid DevOps principles and practices.

Provides leadership for those involved in the development, design and optimization of one or more information technology and systems functions supporting company business processes and technical information systems platforms. Responsibilities include, but are not limited to, analysis, selection and modification of enterprise systems, application software, installation of network hardware/software and database management. Provides direction for the effort required to protect the company’s data, tools and information systems. Ensures infrastructure architecture standards maximize efficiency and support platform compatibility. Usually requires subject matter knowledge of user group for practical application of system characteristics. Coordinates delivery of services to user groups and ensures IT service is uninterrupted. Selects, develops, and evaluates personnel to ensure the efficient operation of the function.

Job Description

 

With an infrastructure as code (IaC) approach you will architect, design and help implement a highly automated cloud-based deployment and operational model.

Be a strong advocate for an SRE culture with our Development, Operations and Engineering teams.

Have an automation-first approach to application and infrastructure deployment, setup and configuration.

Continue to deepen your scripting, coding and system administration skills and apply them to reduce time to market for services.

Assist with the identification and investigation of emerging technologies and associated migration paths to provide cost effective and scalable solutions.

Assess and document the impacts, threats and opportunities to the organisation.

Create reports and technology roadmaps and share knowledge and insights with others.

Provide advice, guidance and expertise to promote adoption of methods and tools and adherence to policies and standards.

Evaluate and select appropriate methods and tools in line with agreed policies and standards.

Manage reviews of the benefits and value of methods and tools, and identify and recommend improvements.

Contribute to organisational policies, standards, and guidelines for methods and tools.

Deal with problems and issues, managing resolutions, corrective actions, lessons learned and the collection and dissemination of relevant information.

Manage operations for the DevOps Team and coordinate the efforts of product design and development with the more business-oriented operations and production to achieve successful new product launches.

Work closely with a variety of internal stakeholders to ensure the execution of the DevOps strategy.

Design, develop and implement operational capabilities by overseeing the design, development and implementation of processes, capabilities and tools.

Oversee the product/portfolio lifecycle and ensure the development and operation of an agile software development process all the way through to production.

Ensure collaborative development and continuous integration, leading to the early discovery and exposure of integration risks, reduced costs and shortened testing cycles.

Ensure that test and deployment processes are iterative, frequent, repeatable and reliable.

Monitor and validate operational quality and ensure early monitoring of the functional and non-functional characteristics by enabling automated testing.

Manage the outcomes of the tickets opened to ensure the uptime and quality of the production system are always maintained.

Provide metrics that allow the organisation to be more agile and responsive and to take appropriate action to improve and enhance the customer experience.

Manages individuals and groups. Allocates responsibilities and/or packages of work, including supervisory responsibilities.

Plans and leads the identification and assessment of new and emerging technologies and the evaluation of the potential impacts, threats and opportunities.

Selects, adopts and adapts appropriate software design methods, tools and techniques, selecting appropriately from predictive (plan-driven) approaches or adaptive (iterative/agile) approaches.

Coordinates and manages planning of the system and/or acceptance tests, including software security testing, within a development or integration project or program.

Ensures that all requests for support are dealt with according to set standards and procedures.

Requirements

Experience

Substantial years of team management experience

Substantial years of experience deploying software solutions to clients in an outsourced or similar IT environment.

Substantial experience working in a multi-team environment across multiple geographies.

Skills and Attributes: 

Strong working knowledge of Amazon Web Services / Microsoft Azure

Related experience working with core AWS services such as EC2, S3, RDS, VPC, DynamoDB, Systems Manager, Lambda, IAM, CloudFormation, API Gateway, CloudWatch.

Good working knowledge of infrastructure design, including network, storage and compute layers.

Knowledge and experience across the Software Development Lifecycle (SDLC).

In-depth experience with at least one automated deployment tool; Chef, Salt and Jenkins experience is highly regarded.

Strong working knowledge of infrastructure as code (IaC) technologies such as CloudFormation / Terraform

Expert level PowerShell / Python / Bash scripting experience.

Extensive experience with Microsoft Server and Linux platforms with a focus on automation and orchestration.

Working knowledge of relational databases and NoSQL databases

Experience with source control and build systems; GitLab CI/CD and Azure DevOps experience is highly regarded.

Experience evaluating and integrating new technologies and products.

Proven experience with hyperscaler cloud IaaS and PaaS; Azure and AWS experience is highly regarded.

Strong people management capability

Deep technical understanding of development and systems / platform engineering

Customer centricity

Strong collaboration capabilities

Highly focused on outcomes

Highly organized, with strong planning skills

Self-starter and self-managed

Strategic ability to define technical direction across the portfolio of products, including technical strategies and execution plans

Strong stakeholder and relationship management

Excellent communication skills


