We are looking for a Machine Learning (ML) Data Engineer who will partner with application teams to assist with data analysis and research, conduct tactical data extracts, build balanced ML and data pipelines, deploy AI/ML models, and build reports to measure the efficiency of deployed models.
Key Responsibilities:
• Understanding business objectives and developing models that help to achieve them, along with metrics to track their progress
• Analyzing the ML algorithms that could be used to solve a given problem and ranking them by their success probability
• Exploring and visualizing data to gain an understanding of it, then identifying differences in data distribution that could affect performance when deploying the model in the real world
• Verifying data quality, and/or ensuring it via data cleaning
• Supervising the data acquisition process if more data is needed
• Finding available datasets online that could be used for training
• Defining validation strategies
• Defining the preprocessing or feature engineering to be done on a given dataset
• Defining data augmentation pipelines
• Training models and tuning their hyperparameters
• Analyzing the errors of the model and designing strategies to overcome them
• Deploying models to production
Required:
• Hands-on experience in data warehousing, ETL, data modeling, and reporting
• 7+ years of hands-on experience productizing and deploying Big Data platforms and applications. Hands-on experience working with relational/SQL databases, distributed columnar data stores/NoSQL databases, time-series databases, Spark Streaming, Kafka, Hive, Redshift, and more
• Familiarity with data pipelines and ML pipelines, from data extraction through insight generation
• Highly skilled in SQL, Python, Spark, AWS S3, Hive Data Catalog, Parquet, Redshift, Airflow, and Tableau or similar tools
• Proven experience building a custom enterprise data warehouse or implementing tools such as Data Catalogs, Spark, Tableau, Kubernetes, and Docker
• D