The ideal candidate will be responsible for designing, implementing, and maintaining scalable data workflows to support multiple data sources, while ensuring seamless integration with Power BI for reconciliation, reporting, and advanced analytics.
Key Responsibilities
Data Pipeline Development
o Design, develop, and maintain scalable ETL/ELT pipelines using PySpark (a minimal sketch follows this list).
o Implement data ingestion from multiple data sources (Finance, CRM, Payment Gateways, etc.) into the AWS Data Lake/Warehouse.
o Ensure efficient data partitioning, schema management, and optimization for large datasets.
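For context, a minimal PySpark sketch of the kind of pipeline described above. All names here (bucket paths, column names, application name) are illustrative assumptions, not specifics of this role:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("finance-ingest").getOrCreate()

    # Ingest a raw CSV export (path and columns are hypothetical placeholders)
    raw = spark.read.option("header", True).csv("s3://example-raw/finance/transactions/")

    # Basic cleansing: cast to typed columns and derive a partition key
    curated = (
        raw
        .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
        .withColumn("txn_date", F.to_date("txn_timestamp"))
    )

    # Partitioned Parquet keeps large datasets cheap to scan downstream
    (curated.write
        .mode("overwrite")
        .partitionBy("txn_date")
        .parquet("s3://example-lake/curated/transactions/"))

Partitioning by a date column is one common way to meet the "efficient data partitioning" requirement: downstream reads that filter on txn_date only touch the matching S3 prefixes instead of scanning the full dataset.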
Cloud & Infrastructure (AWS)
o Leverage AWS services (S3, Glue, EMR, Redshift, Lambda, Step Functions) for data engineering workflows (see the sketch after this list).
o Implement data security, access control, and compliance in line with organizational standards.
o Automate deployments using CI/CD pipelines.
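As one illustration of wiring these services together, a sketch of an AWS Lambda handler that starts a Glue job when a new object lands in S3 via an EventBridge notification. The Glue job name, argument key, and event shape are assumptions made for the example:

    import boto3

    glue = boto3.client("glue")

    def handler(event, context):
        # Triggered by an EventBridge "Object Created" event from S3;
        # the event shape and job name below are illustrative assumptions.
        new_key = event["detail"]["object"]["key"]
        run = glue.start_job_run(
            JobName="curate-transactions",        # hypothetical Glue job
            Arguments={"--source_key": new_key},  # passed through to the job script
        )
        return {"JobRunId": run["JobRunId"]}

The same orchestration is often expressed instead as a Step Functions state machine, which adds built-in retries and an audit trail; either variant would be deployed through the CI/CD pipelines noted above.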
Data Integration & Reporting
o Prepare curated datasets for Power BI dashboards and reports.
o Enable reconciliation processes by providing granular, auditable datasets (a sketch of such a dataset follows this list).
o Collaborate with BI developers to ensure data accuracy, integrity, and performance in reporting.
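To make "reconciliation-ready" concrete, a minimal PySpark sketch that joins daily totals from a payment-gateway feed against the finance ledger so Power BI can surface variances. Paths, table names, and columns are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily-recon").getOrCreate()

    # Hypothetical curated inputs produced by the ingestion pipelines
    payments = spark.read.parquet("s3://example-lake/curated/payments/")
    ledger = spark.read.parquet("s3://example-lake/curated/ledger/")

    # Daily totals from each system; a full outer join keeps days that
    # appear on only one side, which is exactly what reconciliation needs
    recon = (
        payments.groupBy("txn_date").agg(F.sum("amount").alias("gateway_total"))
        .join(
            ledger.groupBy("txn_date").agg(F.sum("amount").alias("ledger_total")),
            on="txn_date",
            how="full_outer",
        )
        .withColumn(
            "variance",
            F.coalesce("gateway_total", F.lit(0)) - F.coalesce("ledger_total", F.lit(0)),
        )
    )

    # Power BI reads this table; non-zero variance rows drive the reconciliation report
    recon.write.mode("overwrite").parquet("s3://example-lake/reporting/daily_recon/")

Keeping the variance calculation in the curated layer, rather than in the dashboard, is one way to keep the dataset granular and auditable: every reported mismatch can be traced back to the underlying rows.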
Collaboration & Process
o Work closely with Business Analysts, Data Scientists, and ERP functional teams to understand requirements.
o Participate in data governance initiatives, ensuring metadata, lineage, and quality controls are implemented.