Alcitius Consulting Careers

Alcitius Consulting

The ML Data Engineer is responsible for designing, implementing, and maintaining a centralized feature repository for scalable machine learning development. This includes building PySpark pipelines, maintaining feature lineage and metadata, ensuring governance and consistency across training and inference, and aligning MLOps architecture via Cloudera and Hopsworks integration.

Key Responsibilities

Overall Responsibilities:
• Design, build, and maintain robust data pipelines and centralized feature stores.
• Enable consistent, reusable, and governed features for ML development and inference.
• Collaborate with data scientists to transform raw data into model-ready features.
• Ensure data validation, versioning, and lineage to support explainability and trust.
• Streamline data workflows to reduce model development cycle time.
• Contribute to feature documentation, reusability frameworks, and metadata tracking.
• Support experimentation through scalable access to pre-processed and curated features.

Technical Responsibilities:
• Develop and orchestrate batch and streaming pipelines using Cloudera, Hadoop, Hive, and Spark.
• Build and manage centralized feature stores to ensure training-serving consistency.
• Implement data validation checks using tools like Great Expectations or custom scripts.
• Maintain feature lineage, version control, and data governance protocols.
• Integrate feature engineering processes with MLflow and experiment tracking tools.
• Optimize feature pipelines for low latency and high throughput in real-time applications.
• Work with data scientists to improve data quality, resolve inconsistencies, and enable faster experimentation.
• Monitor feature drift, feature availability, and quality over time.

Tools & Technologies:
• Big Data & Storage: Cloudera, Hadoop, Hive, Spark, HDFS, Azure Data Lake
• Feature Store: Feast, Hopsworks, or custom implementations
• ETL Pipelines: PySpark, SQL, Airflow, Azure Pipelines
• Validation & Quality: Great Expectations, PyDeequ
• Versioning: DVC, Delta Lake
• Experiment Tracking: MLflow
• Programming Languages: Python, SQL, PySpark
• Governance & Compliance: Audit Logs, Access Control, Metadata Tracking

Preferred Experience:
• 7-8+ years of experience as a Data Engineer or ML Data Engineer.
• Experience building and managing large-scale ETL workflows for ML use cases.
• Hands-on exposure to building and using feature stores in production.
• Strong knowledge of feature governance, versioning, and schema management.

Education & Certifications:
• Bachelor’s or Master’s degree in Data Engineering, Computer Science, or related discipline.

Certifications preferred:
• Microsoft Azure Data Engineer Associate
• Cloudera Data Engineer Certification
• Databricks Data Engineer Associate

Posted 8 months ago

The MLOps Engineer is responsible for automating, operationalizing, and managing the machine learning lifecycle across all phases: training, evaluation, deployment, and monitoring. The role includes building CI/CD pipelines for ML workloads, enabling continuous training and deployment via Azure DevOps, maintaining feature and model registries, and enforcing ML governance.

Key Responsibilities

Overall Responsibilities:
• Design and implement end-to-end MLOps pipelines for ML model lifecycle management.
• Collaborate with data scientists to streamline model experimentation and deployment workflows.
• Ensure reproducibility, scalability, and automation of ML systems.
• Maintain production-grade infrastructure with a focus on availability, monitoring, and fault tolerance.
• Establish model governance mechanisms including audit trails, access controls, and compliance frameworks.
• Enable secure and ethical AI practices aligned with FATE (Fairness, Accountability, Transparency, Ethics).
• Contribute to improving code quality, process automation, and DevOps culture in AI teams.

Technical Responsibilities:
• Develop and maintain CI/CD pipelines using Azure DevOps, Git, and Azure Pipelines.
• Implement model training, evaluation, and deployment workflows using MLflow, DVC, and Airflow.
• Manage model versioning and experiment tracking, enabling reproducibility and lineage.
• Automate testing using frameworks like pytest and behave, and integrate SonarQube for code quality.
• Design and maintain deployment strategies: Blue-Green, Canary, and Shadow deployments.
• Configure monitoring and alerting pipelines using Prometheus, Grafana, and email triggers.
• Enable feedback loops and retraining mechanisms triggered by concept or data drift.
• Ensure rollback and recovery strategies for deployed models.

Tools & Technologies:
• Version Control & CI/CD: Git, Azure DevOps, Azure Pipelines, DVC
• Experiment Tracking & Registry: MLflow, DVC, Azure ML
• Testing: pytest, behave, SonarQube
• Orchestration: Airflow, Azure Data Factory (optional)
• Monitoring & Alerting: Prometheus, Grafana, Cloudera tools, email notifications
• Deployment: Docker, Kubernetes (optional), Azure ML Endpoints
• Programming: Python, Bash, YAML, JSON
• Storage & Compute: Azure Blob, Cloudera, HDFS

Preferred Experience:
• 7-8+ years of hands-on experience in MLOps, DevOps, or ML Engineering roles.
• Proven experience deploying ML models at scale in production environments.
• Familiarity with monitoring model performance and automating drift detection and retraining workflows.
• Understanding of responsible AI concepts like fairness, transparency, and auditability.

Education & Certifications:
• Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.

Certifications preferred:
• Azure DevOps Engineer Expert
• Certified MLOps Professional (TWiML, Coursera, or similar)
• Azure AI Engineer Associate (optional)

Posted 8 months ago