About
Hey there! I am a Staff Machine Learning Engineer @ Meta. I have 13 years of industry experience and have delivered more than 20 innovative data projects and products. I am also the founder of the open source project uniframe.io, a String-Matching-as-a-Service product.
I was a freelancer between November 2020 and July 2022, during which I delivered data consulting and training. I am not taking on consulting projects at the moment.
Summary
- 13 years of data science, data engineering and software engineering experience at multiple international companies
- Delivered multiple end-to-end innovative data products and projects with massive data in different industries
- Solid machine learning knowledge and skills; a Kaggle Expert with 1 gold and 1 silver medal
- 9x AWS certified, including Professional Solutions Architect, DevOps Engineer, Big Data Specialty, Machine Learning Specialty, Security Specialty, etc.; 2x Kubernetes certified, including CKA and CKAD
- Experience in data science and data engineering consultancy, business translation, Scrum and project management
Latest Professional Experience
- 2022.08 – present, Staff Machine Learning Engineer, Meta
- 2021.01 – present, Founder, uniframe.io
- 2020.11 – 2022.07, Lead Machine Learning Engineer (Contractor), Nike
- 2018.10 – 2020.10, Data Science and Data Engineering Consultant, Amazon Web Services
- 2015.11 – 2018.09, Senior Data Scientist, ING Wholesale Banking Advanced Analytics Team
Project Experience
- [Banking][Prod] Next Best Action Model Migration and Productization (ML Ops on AWS)
- [Banking][Prod] Personally Identifiable Information Detection (NLP, classification, Spark, Airflow)
- [Banking][Prod] Company Similarity Detection (graph network embedding, similarity computation, Airflow)
- [Banking][Prod] Large Scale Fuzzy Entity Matching (Spark, NLP, similarity computation, Airflow)
- [Banking][Prod] Customer Segment Leads Detection (classification, Spark, name matching)
- [Banking][Prod] Mortgage Arrears Repayment Classification (imbalanced learning, Random Forest)
- [Banking] Company Sales Prediction (time series forecasting, Seq2Seq)
- [Banking][Prod] Company Financial Insight Dashboard (data pipeline, Spark, front-end, back-end, name matching, dashboarding)
- [Banking] Data Lake Proof of Concept (AWS data lake, S3, Glue, Kafka, DynamoDB, ElasticSearch)
- [Manufacturing][Prod] Instagram Influencer Sales and Trend Prediction (classification, computer vision)
- [Manufacturing] Pill Image Classification on IoT Device (IoT, image classification)
- [Energy][Prod] Dirty Cars and Smoking Person Detection in Petrol Station (IoT, image classification)
- [Energy] Broken Device Detection on Electricity Transmission Network (object detection, multi-GPU)
- [Energy][Prod] Customer Churn Prediction Model Data Pipeline (Spark, S3, Glue, SageMaker, AWS security)
- [Retail][Prod] Advanced Analytics Platform: managed JupyterHub and Airflow on Kubernetes (AWS EKS, Helm, Spark, Airflow, DevOps)
- [Telecom] Real-time Streaming Data Ingestion and Personalized Recommendation PoC (Kafka, Glue, S3, DynamoDB, Lambda, Amazon Personalize)
- [Telecom][Prod] Data Science Model Platform (SageMaker, Glue, Step Functions, Lambda, API Gateway)
- [HR] Job Category Classification from 500K Job Descriptions (NLP classification, multi-class multi-label)
- [Environment] Predictive Maintenance on Dike Sensor Data (time series clustering, Dynamic Time Warping)
- [Lottery][Prod] Data Platform (AWS S3, Redshift, DMS, Glue, CloudWatch)
- [Media] Oracle Database Event-driven Automated Migration (AWS DMS, Lambda, CloudFormation)
Open Source Contributions
- Fast way to get top n results from sparse matrix multiplication, co-author (link), ~15,000 daily downloads, 285 stars
- Factorization Machine on Spark, co-author (link), 210 stars
- Time Series Generator, co-author (link), 66 stars
- Industry Code Embedding, co-author (link)
- Time Series Forecasting in PyTorch, author (link)
- Amazon SageMaker Examples, contributor (link)
Blogs
- Boosting the Selection of the Most Similar Entities in Large Scale Datasets (link)
- Industry2vec: A Novel Approach to Get Industry Code Embedding (link)
- Accurately Labeling Subjective Question-Answer Content Using BERT (link)
- Some AWS SageMaker related blogs: (link), (link)
Talks (selected)
- Machine Learning Industrialization, AWS User Group the Netherlands, Nov 2019 (link)
- ML Ops and ML Industrialization, GoDataFest 2019, Amsterdam, Oct 2019 (link)
- Peer Detection in Massive Payment Transaction Network, Open Data Science Conference, London, August 2018 (link)
- Large Scale Fuzzy Name Matching with a Custom ML Pipeline in Batch and Streaming, Spark+AI Summit 2018, San Francisco, June 2018 (link)
Kaggle Competition
- Gold medal (6th of 1571 teams): Google Quest Q&A Labelling, 2020 (link)
- Silver medal (43rd of 924 teams): Give Me Some Credit, 2010 (link)
Publication
- Classification System for Mortgage Arrear Management, Proceedings of IEEE Computational Intelligence for Financial Engineering and Economics 2014 (link)