Technologies / Skills:
Advanced SQL, Python and associated libraries such as pandas and NumPy, PySpark, shell scripting, data modelling, big data, Hadoop, Hive, ETL pipelines, and IaC tools such as Terraform.
Responsibilities:
• Strong communication skills to coordinate with users, technical teams, and data solution architects.
• Write technical design documents for given requirements or Jira stories.
• Communicate the results and business impact of insight initiatives to key stakeholders to collaboratively solve business problems.
• Work closely with the overall enterprise Data & Analytics architect and engineering practice leads to ensure adherence to best practices and design principles.
• Ensure that quality, security, and compliance requirements are met for the supported area.
• Develop fault-tolerant data pipelines running on a cluster.
• Design scalable and modular solutions.
Required Qualifications:
• 1-8 years of hands-on experience developing data pipelines for data ingestion or transformation using Python (PySpark) / Spark SQL in the AWS cloud.
• Experience developing data pipelines and processing data at scale using technologies such as EMR, Lambda, Glue, Athena, Redshift, and Step Functions.
• Advanced experience writing and optimizing efficient SQL queries with Python and Hive, handling large data sets in big-data environments.
• Experience debugging, tuning, and optimizing PySpark data pipelines.
• Hands-on implementation experience and good knowledge of PySpark DataFrames, joins, partitioning, and parallelism (see the sketch after this list).
• Understanding of the Spark UI, event timelines, the DAG, and Spark config parameters in order to tune long-running data pipelines.
• Experience working in Agile implementations.
• Experience with Git and CI/CD pipelines to deploy cloud applications.
• Good knowledge of designing Hive tables with partitioning for performance.
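For a sense of the expected depth, below is a minimal, illustrative PySpark sketch of the concepts named above: a DataFrame join, repartitioning, a couple of common tuning configs, and a Hive-style partitioned write. All paths, table and column names (orders, customers, customer_id, order_date, region, amount), and config values are hypothetical examples, not part of any actual codebase for this role.

from pyspark.sql import SparkSession, functions as F

# Example tuning knobs; real values depend on cluster size and data volume.
spark = (
    SparkSession.builder
    .appName("pipeline-sketch")
    .config("spark.sql.shuffle.partitions", "200")
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)

# Hypothetical source paths.
orders = spark.read.parquet("s3://example-bucket/orders/")
customers = spark.read.parquet("s3://example-bucket/customers/")

# Join and aggregate, then repartition by the write key to control
# shuffle layout and output file counts.
daily_revenue = (
    orders.join(customers, on="customer_id", how="inner")
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("revenue"))
    .repartition("order_date")
)

# Hive-style partitioned output enables partition pruning on reads.
(
    daily_revenue.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/marts/daily_revenue/")
)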
Thanks and regards,
HR Team