Senior Data Engineer (Oct 2021 – Present) at Factspan Analytics
Built data ingestion pipelines using AWS Glue, Snowpipe, and Apache NiFi.
Gained strong ETL expertise by building complex SCD flows using Snowflake streams, tasks, and SQL.
Automated multiple processes using Python, reducing manual intervention by 25% and helping complete sprints on time.
Developed a time series ML model to predict Snowflake warehouse size.
Built a Snowpipe and Snowflake tasks reporting framework that reduced monitoring effort by 90%.
Performed text preprocessing, including stop-word removal, n-gram tokenization, normalization, negation handling, emoji cleaning, and lemmatization.
Developed polarity-scoring logic for tweets and reviews using NLTK VADER, TextBlob, Pandas, and web scraping, converting the text into trigrams and applying custom mathematical formulae to improve accuracy on negative reviews and tweets.
Worked as an AWS data engineer to ingest data from a cross-account S3 location using AWS Lambda, AWS SQS, and AWS SNS. Built a CI/CD pipeline using Git, AWS CodeBuild, AWS ECR, and Docker images.
Built a generic real-time framework to migrate data from AWS S3/Snowflake to Google Cloud Storage, Globalscape, and Salesforce using AWS SQS, AWS Lambda, AWS Step Functions, and AWS Secrets Manager.
Built a customer segmentation model to score visitors based on their previous activity, using a decision tree in Google Vertex AI that reached 97% accuracy.
Worked as solution lead and senior data engineer to build data pipelines for 100+ tables using AWS API Gateway, AWS Lambda, AWS S3, RabbitMQ, Snowflake, and dbt.
Built data ingestion from Google Pub/Sub using an AWS Kinesis Firehose data stream and AWS MSK.
Built AWS DMS tasks to migrate data from RDBMS source systems.
Used dbt Core for ETL development, reducing testing effort by 20% and improving code quality by 50%.
Built an NLP-based resume sorting model that reduced the talent acquisition team's effort by 75% and eliminated spending on the ATS tool.