About Me

Highly skilled Big Data Engineer with 3 years of hands-on experience building, testing, and maintaining Spark, Databricks, AWS Glue, Hadoop, and SQL database workloads on AWS using Python, Scala, and SQL. Extensive expertise in leveraging AWS services to get the most out of the Hadoop ecosystem. Proficient developer in core Python, Scala, and SQL, with a strong background in Big Data Engineering and experience working with databases, AWS, ETL tools, data warehouses, data modeling, and data visualization.

  • Programming Languages
    Python, SQL, Scala.
  • Database Management
    MySQL, AWS Redshift, HDFS, Snowflake, Hive.
  • Big Data Technologies
    Hadoop Ecosystem (HDFS, MapReduce), Stream and Batch Processing, Apache Airflow, Apache Spark.
  • Development
    Git, GitHub, PyCharm, Visual Studio Code.
  • Cloud Platforms
    Amazon Web Services (S3, Redshift, Glue, Athena, EMR, EC2, Lambda), Microsoft Azure (Blob Storage, Data Factory, SQL Data Warehouse, Databricks).
  • Analytics
    Amazon QuickSight, Databricks Dashboards, Plotly, Seaborn, and Matplotlib.
  • Rhyme
    • Worked in an AWS environment, then migrated the pipeline into Databricks.
    • Stored data in an S3 bucket and read it with AWS Glue to transform it and register it in the Glue Data Catalog.
    • Transformed and mapped data using PySpark and Glue DynamicFrames to automatically infer JSON file schemas (see the Glue sketch after this list).
    • Used AWS Lambda and SNS to trigger the required Glue jobs and send notifications on errors (also sketched below).
    • Used Athena to query the data.
    • Migrated the pipeline into the Databricks ecosystem.
    • Worked with Auto Loader in Databricks to incrementally and efficiently process new data files arriving in the S3 bucket, ingesting them with Spark Structured Streaming (see the Auto Loader sketch after this list).
    • Applied a bronze/silver/gold (medallion) architecture to manage different data quality levels.
    • Built a robust monitoring notebook to detect bad records and schema mismatches.
    • Created data validity checks to ensure that all data files are ingested.
    • Used Databricks Workflows to orchestrate and automatically run the ETL pipeline, and monitored the workflows through notifications.
    • Optimized Delta tables in the Delta Lake with OPTIMIZE, VACUUM, and Z-ordering (sketched below).
    • Used Databricks dashboards to create data quality views.
    • Implemented and configured data pipelines and tuned processes for performance and scalability.
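
A minimal sketch of the Glue side of this pipeline, assuming a hypothetical landing path s3://rhyme-landing/events/, a catalog database named raw_db, and illustrative field names; DynamicFrames infer the JSON schema on read, and writing through the catalog keeps the Glue Data Catalog table current.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# DynamicFrames infer the JSON schema automatically on read.
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://rhyme-landing/events/"]},  # illustrative path
    format="json",
)

# Rename and cast fields before handing the data downstream.
mapped = raw.apply_mapping([
    ("event_id", "string", "event_id", "string"),
    ("ts", "string", "event_time", "timestamp"),
])

# Writing through the catalog updates the table registered in the Glue Data Catalog.
glue_context.write_dynamic_frame.from_catalog(
    frame=mapped,
    database="raw_db",     # hypothetical database
    table_name="events",   # hypothetical table
)
job.commit()
```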
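
And a sketch of the Lambda trigger with SNS error notifications; the job name and topic ARN are stand-ins, not the real resources.

```python
import boto3

glue = boto3.client("glue")
sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:etl-alerts"  # illustrative ARN

def handler(event, context):
    """Fires on S3 object-created events and kicks off the Glue job."""
    try:
        run = glue.start_job_run(JobName="rhyme-events-etl")  # hypothetical job name
        return {"JobRunId": run["JobRunId"]}
    except Exception as exc:
        # Notify subscribers instead of failing silently, then re-raise
        # so the failure still shows up in Lambda metrics.
        sns.publish(TopicArn=TOPIC_ARN, Subject="Glue trigger failed", Message=str(exc))
        raise
```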
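
On the Databricks side, a minimal Auto Loader sketch under assumed names (the landing path, schema and checkpoint locations, and bronze table are all hypothetical); spark is provided by the Databricks runtime.

```python
from pyspark.sql.functions import current_timestamp

# Auto Loader ("cloudFiles") discovers and incrementally processes new files.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://rhyme-landing/_schemas/events")
    .load("s3://rhyme-landing/events/")
    .withColumn("ingested_at", current_timestamp())
)

# Land the raw records in the bronze layer of the medallion architecture.
(
    stream.writeStream.format("delta")
    .option("checkpointLocation", "s3://rhyme-landing/_checkpoints/events")
    .trigger(availableNow=True)  # process everything new, then stop
    .toTable("bronze.events")
)
```

With schema inference enabled, Auto Loader routes fields that do not match the inferred schema into a _rescued_data column, which gives a monitoring notebook like the one above a concrete place to look for bad records.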
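
Finally, the Delta table maintenance described above boils down to two commands, shown here against a hypothetical silver table with event_date as the common filter column.

```python
# OPTIMIZE compacts small files; ZORDER co-locates rows by the filter column
# so queries that filter on it scan fewer files.
spark.sql("OPTIMIZE silver.events ZORDER BY (event_date)")

# VACUUM removes data files no longer referenced by the table and older
# than the retention window (168 hours is the 7-day default).
spark.sql("VACUUM silver.events RETAIN 168 HOURS")
```
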
  • Nike
    • Implemented data scraping and a data preprocessing layer as part of the data ingestion process on AWS Lambda (see the Lambda sketch after this list).
    • Worked with diverse data sets in AWS S3, identified and developed new sources of data, and collaborated with product teams to ensure successful integration.
    • Created an ETL process to consume, transform, and load data using AWS S3, Glue, and Redshift to prepare the data for analysis.
    • Worked with business units to capture requirements while creating the data model for the data warehouse and adding business logic to the data with AWS Glue.
    • Explored data using various visualization techniques in Python to surface trends hidden in the data.
    • Processed data with Python libraries such as pandas, PySpark, and boto3.
    • Tools used: Python, Spark, AWS Glue, AWS Athena, Hadoop, Redshift, SNS, AWS Step Functions, SQS, and AWS Lambda.
    • Developed ETL and data pipelines using AWS Glue and used Athena for data profiling and querying files in S3 (sketched below).
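
A minimal sketch of the ingestion Lambda described above; the endpoint URL, bucket, and field names are hypothetical stand-ins for the real sources.

```python
import json
import urllib.request
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "nike-raw-data"  # illustrative bucket

def handler(event, context):
    # Fetch the raw payload from the upstream source (hypothetical endpoint).
    with urllib.request.urlopen("https://example.com/api/products") as resp:
        records = json.load(resp)

    # Light preprocessing before landing: drop rows missing required fields.
    cleaned = [r for r in records if r.get("id") and r.get("price") is not None]

    # Partition the landing path by date so Glue and Athena can prune scans.
    key = f"raw/products/dt={datetime.now(timezone.utc):%Y-%m-%d}/batch.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(cleaned).encode())
    return {"written": len(cleaned), "key": key}
```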
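
And a sketch of Athena-based profiling from Python; the database, output bucket, and query are assumptions. Athena executes queries asynchronously, so the helper polls until the execution finishes.

```python
import time

import boto3

athena = boto3.client("athena")

def run_query(sql):
    """Submit a query, wait for it, and return the raw result rows."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "analytics"},  # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://nike-athena-results/"},
    )["QueryExecutionId"]

    # Poll until the execution reaches a terminal state.
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

# Example profiling query: row counts per ingestion date.
rows = run_query("SELECT dt, COUNT(*) AS n FROM products GROUP BY dt")
```
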
  • 2020
    Bachelor of Science in Mechanical Engineering
  • 2023 (Udemy)
    Azure Databricks & Spark For Data Engineers

My Work

In Production

Stay Tuned


Contact Me

husseinghosn3@gmail.com

404-259-9097

Download Resume