Data Engineer (PySpark)

Dubai, United Arab Emirates

https://www.mncjobsgulf.com/company/virtusa

Job Description

Job Title: Data Engineer (PySpark)
Responsibilities
· Data Pipeline Development: Design, develop, and maintain highly scalable and optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity and accuracy.
· Data Ingestion: Implement and manage data ingestion processes from a variety of sources (e.g., relational databases, APIs, file systems) to the data lake or data warehouse on CDP.
· Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical needs and business requirements.
· Performance Optimization: Conduct performance tuning of PySpark code and Cloudera components, optimizing resource utilization and reducing runtime of ETL processes.
· Data Quality and Validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability throughout the pipeline.
· Automation and Orchestration: Automate data workflows using tools like Apache Oozie, Airflow, or similar orchestration tools within the Cloudera ecosystem.
· Monitoring and Maintenance: Monitor pipeline performance, troubleshoot issues, and perform routine maintenance on the Cloudera Data Platform and associated data processes.
· Collaboration: Work closely with other data engineers, analysts, product managers, and other stakeholders to understand data requirements and support various data-driven initiatives.
· Documentation: Maintain thorough documentation of data engineering processes, code, and pipeline configurations.
Qualifications
Education and Experience
· Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field.
· 3+ years of experience as a Data Engineer, with a strong focus on PySpark and the Cloudera Data Platform.
Technical Skills
· PySpark: Advanced proficiency in PySpark, including working with RDDs, DataFrames, and optimization techniques.
· Cloudera Data Platform: Strong experience with Cloudera Data Platform (CDP) components, including Cloudera Manager, Hive, Impala, HDFS, and HBase.
· Data Warehousing: Knowledge of data warehousing concepts, ETL best practices, and experience with SQL-based tools (e.g., Hive, Impala).
· Big Data Technologies: Familiarity with Hadoop, Kafka, and other distributed computing tools.
· Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar orchestration frameworks.
· Scripting and Automation: Strong scripting skills in Linux.
Soft Skills
· Strong analytical and problem-solving skills.
· Excellent verbal and written communication abilities.
· Ability to work independently and collaboratively in a team environment.
· Attention to detail and commitment to data quality.


Job Detail

Job Id: JD1843281
Industry: Not mentioned
Total Positions: 1
Job Type: Full Time
Salary: Not mentioned
Employment Status: Permanent
Job Location: Dubai, United Arab Emirates
Education: Not mentioned
