Data Engineer

3 days ago


Johannesburg, South Africa Recru-it Full time
Data Architecture and Design
- Data Modeling:
o Create normalized and denormalized schemas (3NF, star, snowflake).
o Design data lakes, warehouses, and marts optimized for analytical or transactional workloads.
o Incorporate modern paradigms like data mesh, lakehouse, and delta architecture.
- ETL/ELT Pipelines:
o Develop end-to-end pipelines for extracting, transforming, and loading data.
o Optimize pipelines for real-time and batch processing.
- Metadata Management:
o Implement data lineage, cataloging, and tagging for better discoverability and governance.

Distributed Computing and Big Data Technologies
- Proficiency with big data platforms:
o Apache Spark (PySpark, sparklyr).
o Hadoop ecosystem (HDFS, Hive, MapReduce).
o Apache Iceberg or Delta Lake for versioned data lake storage.
- Manage large-scale, distributed datasets efficiently.
- Utilize query engines like Presto, Trino, or Dremio for federated data access.

Data Storage Systems
- Expertise in working with different types of storage systems:
o Relational Databases (RDBMS): SQL Server, PostgreSQL, MySQL, etc.
o NoSQL Databases: MongoDB, Cassandra, DynamoDB.
o Cloud Data Warehouses: Snowflake, Google BigQuery, Azure Synapse, Amazon Redshift.
o Object Storage: Amazon S3, Azure Blob Storage, Google Cloud Storage.
- Optimize storage strategies for cost and performance:
o Partitioning, bucketing, indexing, and compaction.

Programming and Scripting
- Advanced knowledge of programming languages:
o Python (pandas, PySpark, SQLAlchemy).
o SQL (window functions, CTEs, query optimization).
o R (data wrangling, sparklyr for data processing).
o Java or Scala (for Spark and Hadoop customizations).
- Proficiency in scripting for automation (e.g., Bash, PowerShell).
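As an illustration of the SQL skills listed above (CTEs and window functions), here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the function name are hypothetical, and the example assumes the bundled SQLite is version 3.25 or later, which is when window functions were added.

```python
import sqlite3


def latest_order_per_customer(rows):
    """Return each customer's most recent order using a CTE and ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table used only for this illustration.
    conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    query = """
        WITH ranked AS (
            SELECT customer, order_date, amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer ORDER BY order_date DESC
                   ) AS rn
            FROM orders
        )
        SELECT customer, order_date, amount
        FROM ranked
        WHERE rn = 1
        ORDER BY customer
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample_orders = [
    ("alice", "2024-01-05", 120.0),
    ("alice", "2024-02-10", 80.0),
    ("bob", "2024-01-20", 45.5),
]
latest = latest_order_per_customer(sample_orders)
```

The CTE ranks each customer's orders from newest to oldest, and the outer query keeps only the top-ranked row per customer.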

Real-Time and Streaming Data
- Expertise in real-time data processing:
o Apache Kafka, Amazon Kinesis, Azure Event Hubs for event streaming.
o Apache Flink or Spark Streaming for real-time ETL.
o Implement event-driven architectures using message queues.
- Handle time-series data and process live feeds for real-time analytics.
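To make the time-series requirement concrete, here is a rough sketch of a tumbling (fixed, non-overlapping) window count in plain Python. The event shape `(timestamp_seconds, payload)` and the function name are assumptions for illustration; in practice a streaming engine such as Flink or Spark would handle this, including late and out-of-order events.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window, keyed by window start."""
    counts = defaultdict(int)
    for timestamp, _payload in events:
        # Align each timestamp to the start of the window it falls in.
        window_start = timestamp - (timestamp % window_seconds)
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical events: (seconds since start, payload).
events = [(0, "a"), (15, "b"), (59, "c"), (60, "d"), (125, "e")]
per_minute = tumbling_window_counts(events, 60)
```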

Cloud Platforms and Services
- Experience with cloud environments:
o AWS: Lambda, Glue, EMR, Redshift, S3, Athena.
o Azure: Data Factory, Synapse, Data Lake, Databricks.
o GCP: BigQuery, Dataflow, Dataproc.
- Manage infrastructure-as-code (IaC) using tools like Terraform or CloudFormation.
- Leverage cloud-native features like auto-scaling, serverless compute, and managed services.

DevOps and Automation
- Implement CI/CD pipelines for data workflows:
o Tools: Jenkins, GitHub Actions, GitLab CI, Azure DevOps.
- Monitor and automate tasks using orchestration tools:
o Apache Airflow, Prefect, Dagster.
o Managed services like AWS Step Functions or Azure Data Factory.
- Automate deployment and scaling of data workloads using containers (Docker) and Kubernetes.

Data Governance, Security, and Compliance
- Data Governance:
o Implement role-based access control (RBAC) and attribute-based access control (ABAC).
o Maintain master data and metadata consistency.
- Security:
o Apply encryption at rest and in transit.
o Secure data pipelines with IAM roles, OAuth, or API keys.
o Implement network security (e.g., firewalls, VPCs).
- Compliance:
o Ensure adherence to regulations and standards such as GDPR, CCPA, HIPAA, and SOC 2.
o Track and document audit trails for data usage.

Performance Optimization
- Optimize query and pipeline performance:
o Query tuning (partition pruning, caching, broadcast joins).
o Reduce IO costs and bottlenecks with columnar formats like Parquet or ORC.
o Use distributed computing patterns to parallelize workloads.
- Implement incremental data processing to avoid full dataset reprocessing.
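One common way to do incremental processing is a high-watermark pattern: each run picks up only the records changed since the last processed timestamp. The sketch below assumes a hypothetical `updated_at` field and keeps the watermark in memory; a real pipeline would persist it between runs.

```python
from datetime import datetime


def incremental_batch(records, watermark):
    """Return records newer than the watermark, plus the advanced watermark."""
    new_records = [r for r in records if r["updated_at"] > watermark]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in new_records), default=watermark)
    return new_records, new_watermark


# Hypothetical source rows for illustration.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
    {"id": 3, "updated_at": datetime(2024, 1, 3)},
]

first_batch, wm = incremental_batch(source, datetime(2024, 1, 1))
second_batch, wm2 = incremental_batch(source, wm)
```

The second call returns nothing because the watermark has already moved past every record, so no rows are reprocessed.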

Advanced Data Integration
- Work with API-driven data integration:
o Consume and build REST/GraphQL APIs.
o Implement integrations with SaaS platforms (e.g., Salesforce, Twilio, Google Ads).
- Integrate disparate systems using ETL/ELT tools like:
o Informatica, Talend, dbt (data build tool), or Azure Data Factory.

Data Analytics and Machine Learning Integration
- Enable data science workflows by preparing data for ML:
o Feature engineering, data cleaning, and transformations.
- Integrate machine learning pipelines:
o Use Spark MLlib, TensorFlow, or scikit-learn in ETL pipelines.
- Automate scoring and prediction serving using ML models.

Monitoring and Observability
- Set up monitoring for data pipelines:
o Tools: Prometheus, Grafana, or ELK stack.
o Create alerts for SLA breaches or job failures.
- Track pipeline and job health with detailed logs and metrics.

Business and Communication Skills
- Translate complex technical concepts into business terms.
- Collaborate with stakeholders to define data requirements and SLAs.
- Design data systems that align with business goals and use cases.

Continuous Learning and Adaptability
- Stay updated with the latest trends and tools in data engineering:
o For example, data mesh architecture, Microsoft Fabric, and AI-integrated data workflows.
- Actively engage in learning through online courses, certifications, and community contributions:
o Certifications like Databricks Certified Data Engineer, AWS Data Analytics Specialty, or Google Professional Data Engineer.

  • Johannesburg, South Africa Dimension Data Full time

    **Data Governance Manager** The primary responsibility of the Data Governance Manager is to coordinate, design, develop, direct, and oversee the implementation of the Data Governance Framework, and to analyse and present on various areas to ensure that governance is adhered to and data quality improves as a result. This position will manage the...

  • Data Analyst

    3 weeks ago


    Johannesburg, South Africa Data Centrix Full time

    Qualifications / Knowledge / Experience: Bachelor's degree in computer science, Information Systems and/or; Advanced computer literacy. Financial services knowledge. SDLC methodology knowledge. Minimum 5 years experience in financial services sector. Minimum 5 years experience in IT. Key Responsibilities: Collecting, cleaning, and organizing data from various...

  • Data Scientist

    3 months ago


    Johannesburg, South Africa NTT DATA Full time

    **Make an impact with NTT DATA** Join a company that is pushing the boundaries of what is possible. We are renowned for our technical excellence and leading innovations, and for making a difference to our clients and society. Our workplace embraces diversity and inclusion - it’s a place where you can grow, belong and thrive. **Your day at NTT DATA** The...


  • Johannesburg, South Africa Axis Data Full time

    Axis Data is a US-based advanced data engineering and analytics consulting firm, and a certified partner of Palantir. We help clients implement and support the most advanced data operations platform: Palantir Foundry. Role Overview: As a Senior Business Data Analyst at Axis Data, you will lead the development and deployment of BI and data solutions,...


  • Johannesburg, Gauteng, South Africa Data Centrix Full time

    About Data Centrix: Data Centrix is a leading provider of data-driven solutions, dedicated to empowering businesses with actionable insights. Job Summary: We are seeking an experienced Data Insights Strategist to join our team. As a key member of our analytics department, you will be responsible for gathering, structuring, and analyzing complex data from various...


  • Johannesburg, Gauteng, South Africa Data Centrix Full time

    At Data Centrix, we are seeking a highly skilled Data Solutions Architect to join our team. This is an exciting opportunity to work with cutting-edge technologies and contribute to the design of innovative data solutions. About the Role: We are looking for a seasoned professional with expertise in data architecture, business analysis, and IT project management....


  • Johannesburg, South Africa Dimension Data Full time

    **Requirements**: - The Data Center Technician (DCT) is responsible for the maintenance of all Data Centre White Space, ensuring a 24/7 service offering within Internet Solutions Data Centers. The DCT provides a service to clients to ensure that their IT infrastructure and systems remain operational through proactively identifying, investigating and resolving...

  • Business Analyst

    3 weeks ago


    Johannesburg, South Africa Data Centrix Full time

    Qualifications / Knowledge / Experience: Bachelor's degree in computer science, Information Systems and/or; Diploma in Business Analysis from an IIBA endorsed education provider and/or; Valid CBAP or CCBA certification. Advanced computer literacy. Financial services knowledge. Advanced knowledge of the BABOK. SDLC methodology knowledge. Minimum 5 years experience in...

  • Data Engineer

    1 month ago


    Johannesburg, South Africa Jobted ZA C2 Full time

    A reputable company in the Financial Services sector is seeking a Data Engineer to work with cloud data systems. Proficiency in the Microsoft BI Stack, T-SQL development, and cloud services is essential. Join a dynamic team and enhance your expertise while working on cutting-edge systems and data solutions. As a Data Engineer, you will play a pivotal role in...

  • Data Engineer

    2 months ago


    Johannesburg, South Africa Jobted ZA C2 Full time

    As a Data Engineer, you will: - Design, develop, and implement robust data pipelines and ETL processes to meet client needs, focusing on data cleansing, aggregation, and enrichment. - Collaborate with clients to understand their requirements and translate them into scalable data solutions. - Aggregate, store, and manage data from various sources, ensuring...

  • Data Engineer

    1 month ago


    Johannesburg, South Africa Network Recruitment Full time

    A reputable company in the Financial Services sector is seeking a Data Engineer to work with cloud data systems. Proficiency in the Microsoft BI Stack, T-SQL development, and cloud services is essential. Join a dynamic team and enhance your expertise while working on cutting-edge systems and data solutions. As a Data Engineer, you will play a pivotal role in...


  • Johannesburg, Gauteng, South Africa Data Centrix Full time

    Job Title: Lead Software Engineer. We are seeking an experienced and skilled Lead Software Engineer to join our team at Data Centrix. About the Role: As a Lead Software Engineer, you will be responsible for researching, designing, implementing, and maintaining software programs and services. You will create and maintain database tables, functions, and stored...


  • Johannesburg, South Africa Dimension Data Full time

    As a **Networking Data Centre Engineer** you will be responsible for the management of all aspects of the company’s IT infrastructure and its users. You'll ensure that our systems are running smoothly, that our network is secure and reliable, and that we're always up to date with software patches. You'll also ensure your team has enough technical expertise...

  • Data Engineer

    2 months ago


    Johannesburg, South Africa Network Recruitment Full time

    As a Data Engineer, you will: Design, develop, and implement robust data pipelines and ETL processes to meet client needs, focusing on data cleansing, aggregation, and enrichment. Collaborate with clients to understand their requirements and translate them into scalable data solutions. Aggregate, store, and manage data from various sources, ensuring...

  • Data Engineer

    2 months ago


    Johannesburg, South Africa Jobted ZA C2 Full time

    What you will be doing - The ideal candidate will have extensive experience in data engineering, particularly with Ab Initio, and will be responsible for designing, developing, and maintaining the data infrastructure. - Design, build, and maintain scalable data pipelines using Ab Initio. - Develop ETL processes to extract, transform, and load data from...


  • Johannesburg, South Africa Data Centrix Full time

    Qualifications / Knowledge / Experience: Matric plus Bachelor's degree in computer science, Information Systems and/or; Diploma in Business Analysis from an IIBA endorsed education provider and/or; Valid CBAP or CCBA certification. Advanced computer literacy. Financial services knowledge. Advanced knowledge of the BABOK. SDLC methodology knowledge. Minimum 5 years...

