We are seeking an experienced Databricks Platform Engineer with a strong background in data engineering. This individual should have a comprehensive understanding of both data platforms and software engineering, enabling them to integrate the platform effectively within an IT ecosystem.
Responsibilities
Manage and optimize the Databricks data platform.
Ensure high availability, security, and performance of data systems.
Provide insights into data platform usage.
Optimize computing and storage for large-scale data processing.
Design and maintain system libraries (Python) used in ETL pipelines and platform governance.
Optimize ETL processes: enhance and tune existing ETL processes for better performance, scalability, and reliability.
Skills
Must have
Minimum 10 years of experience in IT/Data.
Minimum 3 years of experience as a Databricks Data Platform Engineer.
Bachelor's degree in IT or a related field.
Infrastructure & Cloud: Azure, AWS (expertise in storage, networking, compute).
Programming: Proficiency in PySpark for distributed computing and in Python for ETL development.
SQL: Expertise in writing and optimizing SQL queries, preferably with experience in databases such as PostgreSQL, MySQL, Oracle, or Snowflake.
Data Warehousing: Experience working with data warehousing concepts and the Databricks platform.
ETL Tools: Familiarity with ETL tools and processes.
Data Modelling: Experience with dimensional modelling, normalization/denormalization, and schema design.
Version Control: Proficiency with version control tools like Git to manage codebases and collaborate on development.
Data Pipeline Monitoring: Familiarity with monitoring tools (e.g., Prometheus, Grafana, or custom monitoring scripts) to track pipeline performance.
Data Quality Tools: Experience implementing data validation, cleaning, and quality frameworks, ideally with Monte Carlo.