Job Description
• AWS Data Engineer with a minimum of 5 to 7 years of experience.
• Collaborate with business analysts to understand and gather requirements for existing and new ETL pipelines.
• Connect with stakeholders daily to discuss project progress and updates.
• Work within an Agile process to deliver projects in a timely and efficient manner.
• Design and develop Airflow DAGs to schedule and manage ETL workflows (see the Airflow sketch after this list).
• Translate SQL queries into Spark SQL code for ETL pipelines.
• Develop custom Python functions to handle data quality and validation (see the validation sketch after this list).
• Write PySpark scripts to process data and perform transformations (see the PySpark sketch after this list).
• Ensure data accuracy and completeness by creating automated tests and implementing data validation checks.
• Run Spark jobs on an AWS EMR cluster using Airflow DAGs, as in the Airflow sketch below.
• Monitor and troubleshoot ETL pipelines to ensure smooth operation.
• Implement data engineering best practices, including data modeling, data warehousing, and data pipeline architecture.
• Collaborate with other members of the data engineering team to improve processes and implement new technologies.
• Stay up to date with emerging trends and technologies in data engineering, and suggest ways to improve the team's efficiency and effectiveness.
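
As a concrete illustration of the Airflow and EMR responsibilities above, here is a minimal sketch of a DAG that submits a PySpark step to an existing EMR cluster and waits for it to finish. The cluster ID, S3 script path, DAG name, and schedule are placeholder assumptions, not details from this posting.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

# Placeholder cluster ID and script location -- replace with real values.
CLUSTER_ID = "j-XXXXXXXXXXXXX"
SPARK_STEPS = [
    {
        "Name": "daily_etl",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                "s3://my-bucket/scripts/etl_job.py",
            ],
        },
    }
]

with DAG(
    dag_id="emr_etl_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submit the Spark step to the running cluster; the operator
    # pushes the new step IDs to XCom.
    submit_job = EmrAddStepsOperator(
        task_id="submit_spark_job",
        job_flow_id=CLUSTER_ID,
        steps=SPARK_STEPS,
        aws_conn_id="aws_default",
    )

    # Poll the submitted step until it succeeds or fails.
    watch_job = EmrStepSensor(
        task_id="watch_spark_job",
        job_flow_id=CLUSTER_ID,
        step_id="{{ task_instance.xcom_pull(task_ids='submit_spark_job')[0] }}",
        aws_conn_id="aws_default",
    )

    submit_job >> watch_job
```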
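
For the Spark SQL and PySpark bullets, here is a sketch of how a SQL aggregation might be expressed both as Spark SQL and as the equivalent DataFrame transformation. The table name, columns, and S3 paths are illustrative assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Hypothetical source data -- not from this posting.
orders = spark.read.parquet("s3://my-bucket/raw/orders/")
orders.createOrReplaceTempView("orders")

# An existing SQL query carried over as Spark SQL...
daily_revenue = spark.sql("""
    SELECT order_date,
           SUM(amount) AS total_revenue,
           COUNT(*)    AS order_count
    FROM orders
    WHERE status = 'COMPLETED'
    GROUP BY order_date
""")

# ...or the same logic rewritten with the DataFrame API.
daily_revenue_df = (
    orders.filter(F.col("status") == "COMPLETED")
          .groupBy("order_date")
          .agg(
              F.sum("amount").alias("total_revenue"),
              F.count("*").alias("order_count"),
          )
)

# Write the curated output back to the lake.
daily_revenue.write.mode("overwrite").parquet(
    "s3://my-bucket/curated/daily_revenue/"
)
```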
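
For the data quality and validation bullets, a sketch of reusable Python checks over PySpark DataFrames; the function names, columns, and thresholds are hypothetical. Checks like these can run at the end of each pipeline stage or be wrapped in automated tests.

```python
from typing import Dict, List

from pyspark.sql import DataFrame, functions as F


def count_nulls(df: DataFrame, columns: List[str]) -> Dict[str, int]:
    """Return the number of NULL values in each required column."""
    row = df.select(
        [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in columns]
    ).first()
    # Sums over an empty DataFrame come back as None; treat that as zero.
    return {c: int(row[c] or 0) for c in columns}


def assert_min_row_count(df: DataFrame, minimum: int) -> None:
    """Fail fast if the DataFrame has fewer rows than expected."""
    actual = df.count()
    if actual < minimum:
        raise ValueError(f"expected at least {minimum} rows, got {actual}")


def assert_no_nulls(df: DataFrame, columns: List[str]) -> None:
    """Raise if any required column contains NULL values."""
    bad = {c: n for c, n in count_nulls(df, columns).items() if n > 0}
    if bad:
        raise ValueError(f"null values found: {bad}")
```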