Senior Data Engineer, Data Lake, Big Data, Apache NiFi – Dubai, 6-month contract
You must be a Senior Data Engineer, currently on a Freelance Visa in Dubai. You will be joining a growing security company heavily involved in AI and Big Data products sold to public- and private-sector clients.
This is a 6-month contract ONLY, with the possibility of extension at a later date.
Working hours for this role are:
Monday to Thursday, 7:30am – 3:30pm
Friday, 7am \xe2\x80\x93 1pm
Alternate Saturdays, 7:30am – 12:30pm
They work on a "one Saturday on, two Saturdays off" model, so any candidate would have to work at most two Saturdays a month.
Responsibilities:
Apply a solid software development background and strong Python coding skills to solve challenging problems.
Develop data pipelines spanning cloud services and on-premises data centres.
Perform web crawling, data cleaning, data annotation, data ingestion, and data processing.
Read and collate complex data sets.
Create and maintain data pipelines.
Maintain a continual focus on process improvement to drive efficiency and productivity within the team.
Use Python, SQL, Elasticsearch, shell scripting, etc. to build the infrastructure required for optimal extraction, transformation, and loading of data.
Provide insights into key business performance metrics by building analytical tools that utilise the data pipeline.
Support the wider business with its data needs on an ad hoc basis.
Be open to extensive international business travel as and when required, and for extended periods.
Qualifications:
6+ years of programming experience, solid coding skills in Python, Shell, and Java.
Bachelor's degree in Computer Engineering, Computer Science, or Electrical Engineering and Computer Sciences.
Strong practical knowledge in data processing and migration tools, such as Apache NiFi, Kafka, and Spark.
Experience designing, building, and maintaining data processing on CDP (Cloudera Data Platform) Private Cloud.
Experience developing and maintaining data workflows with Apache Airflow.
Experience with HDFS or similar distributed storage.
Strong understanding of distributed computing and distributed systems.
Experience with web crawling and data cleaning.
Experience with solution architecture, data ingestion, query optimization, data segregation, ETL, ELT, AWS (EC2, S3, SQS, Lambda), Elasticsearch, Redshift, and CI/CD frameworks and workflows.
Working knowledge of data platform concepts – data lake, data warehouse, ETL, big data processing (designing and supporting for variety/velocity/volume), real-time processing architectures for data platforms, and scheduling and monitoring of ETL/ELT jobs.
PostgreSQL and programming experience (preferably Java, Python), with proficiency in understanding data, entity relationships, structured and unstructured data, and SQL and NoSQL databases.
Knowledge of best practices in optimizing columnar and distributed data processing systems and infrastructure.
Experience designing and implementing dimensional models.
Knowledge of machine learning and data mining techniques in one or more areas of statistical modelling, text mining, and information retrieval.