Data Engineer
Fusemachines
Islamabad, Islāmābād, Pakistan
Full-time
Contract
💰 Compensation
Not specified
📋 Job Description
About Fusemachines
Fusemachines is a leading AI strategy, talent, and education services provider. Founded by Sameer Maskey, Ph.D., Adjunct Associate Professor at Columbia University, Fusemachines has a core mission of democratizing AI. With a presence in four countries (Nepal, the United States, Canada, and the Dominican Republic) and more than 450 full-time employees, Fusemachines seeks to bring its global expertise in AI to transform companies around the world.

Type: Full-time, Remote

About The Role
This is a remote, full-time position responsible for designing, building, testing, optimizing, and maintaining the infrastructure and code required for data integration, storage, processing, pipelines, and analytics (BI, visualization, and advanced analytics), from ingestion to consumption. It also involves implementing data flow controls and ensuring high data quality and accessibility for analytics and business intelligence purposes. This role requires a strong foundation in programming and a keen understanding of how to integrate and manage data effectively across various storage systems and technologies.

We are looking for a skilled Data Engineer with a strong background in Python, SQL, PySpark, and AWS cloud-based large-scale data solutions, and a passion for data quality, performance, and cost optimization.
The ideal candidate will develop in an Agile environment. This role is perfect for an individual passionate about leveraging data to drive insights, improve decision-making, and support the strategic goals of the organization through innovative data engineering solutions.

Qualifications & Experience
- A full-time Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field
- At least 2 years of experience as a data engineer with strong expertise in Python, SQL, PySpark, and AWS in an Agile environment, with a proven track record of building and optimizing data pipelines, architectures, and datasets, and proven experience in data storage, modeling, management, lakes, warehousing, processing/transformation, integration, cleansing, validation, and analytics
- 2 years of experience with DevOps tools and technologies: GitHub or AWS DevOps
- Proven experience delivering large-scale projects and products for Data and Analytics as a data engineer within AWS
- Preferred: previous experience working with retail or other similar data models
- The following certifications:
  - AWS Certified Cloud Practitioner
  - AWS Certified Data Engineer – Associate
- Nice to have:
  - Databricks Certified Associate Developer for Apache Spark
  - Databricks Certified Data Engineer Associate

Required Skills/Competencies
- Strong programming skills in one or more object-oriented languages such as Python (must have), Scala, or Java, and proficiency in writing high-quality, scalable, maintainable, efficient, and optimized code for data integration, storage, processing, manipulation, and analytics solutions
- Strong SQL skills and experience working with complex data sets, Enterprise Data Warehouses, and writing advanced SQL queries
- Proficiency with relational databases (RDS, MySQL, Postgres, or similar) and NoSQL databases (Cassandra, MongoDB, Neo4j, etc.)
- Strong analytic skills related to working with structured and unstructured datasets
- Thorough understanding of big data principles, techniques, and best practices
- Experience with scalable and distributed data processing technologies such as Spark/PySpark (must have, including Spark SQL) and Kafka, in order to handle large volumes of data
- Experience with stream-processing systems (Storm, Spark Streaming, etc.) is a plus
- Experience implementing data pipelines and efficient ELT/ETL processes, batch and real-time, in AWS and with open-source solutions, with the ability to develop custom integration solutions as needed, including data integration from different sources such as APIs (PoS integrations a plus), ERPs (Oracle and Allegra a plus), databases, flat files, Apache Parquet, and event streaming, including cleansing, transformation, and validation of the data
- Experience in data cleansing, transformation, and validation
- Understanding of data modeling and database design principles, with the ability to implement efficient database schemas that meet the requirements of data solutions, and a good understanding of dimensional data modeling
- Knowledge of cloud computing, specifically AWS services related to data and analytics, such as S3, EMR, Glue, SageMaker, RDS, Redshift, Lambda, Kinesis, Lake Formation, EC2, ECS/ECR, EKS, IAM, CloudWatch, etc., implementing data warehousing, data lake, and data lakehouse solutions in AWS
- Experience in orchestration using technologies like Azkaban, Luigi, Airflow, etc.
- Good understanding of BI solutions, including Looker and LookML (Looker Modeling Language)
- Familiarity with advanced analytics and AI/ML services and tools, and the ability to integrate advanced analytics, machine learning, and AI capabilities into data solutions (nice to have)
- Strong understanding of the software development lifecycle (SDLC), especially Agile methodologies
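To illustrate the kind of work the pipeline requirements above describe, here is a minimal, hypothetical sketch of a batch ELT step in Python: extract rows from a flat file, cleanse and validate them, and load the valid records into a warehouse table. All names and the sample data are invented for illustration, and the standard-library sqlite3 module stands in for a real warehouse such as Redshift or RDS.

```python
import csv
import io
import sqlite3

# Invented sample extract: a small flat file with one malformed record.
RAW_CSV = """order_id,store,amount
1,Lahore,120.50
2,Karachi,not_a_number
3,Islamabad,75.00
"""

def cleanse(row):
    """Return a validated (order_id, store, amount) tuple, or None to reject."""
    try:
        return int(row["order_id"]), row["store"].strip(), float(row["amount"])
    except (KeyError, ValueError):
        # Quarantine malformed records rather than failing the whole batch.
        return None

def run_pipeline(raw_csv):
    # sqlite3 in-memory database stands in for the target warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, store TEXT, amount REAL)")
    rows = [cleanse(r) for r in csv.DictReader(io.StringIO(raw_csv))]
    valid = [r for r in rows if r is not None]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", valid)
    conn.commit()
    total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    return len(valid), total

loaded, total = run_pipeline(RAW_CSV)
print(loaded, total)  # 2 valid rows loaded; total amount 195.5
```

In a production AWS setting the same cleanse/validate/load shape would typically run as a PySpark job on EMR or Glue reading from S3, with an orchestrator such as Airflow scheduling the batch.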