Data Science Solution Architect | Lead Data Scientist
Developed over 50 complex data science projects in industries such as retail, banking, loyalty programs, geo-analytics, online services, quick service restaurant, pharmacy, gamedev and travel.
10 years of challenging, hands-on experience in custom data science development.
I'm not just a regular data scientist – my KEY SKILL is delivering significant business revenue through data science methods.
What sets me apart from regular in-house data scientists? Even if their solution fails to generate revenue, they'll still receive their salary. In custom data science development, a client simply won’t pay me unless my solution delivers a significant revenue increase. That’s why every project in my experience has been a fight for true economic impact.
Developed a data science system for strategic personalization of promo mechanics for a fast-food chain, generating 3% more revenue compared to the CRM department.
Developed a data science system for strategic personalization of bonus categories and cashback rates under a fixed total bonus budget for “Otkrytiye” bank (a top-10 Russian bank): customers who opted into the new loyalty program received 10% more cashback, while the total bonus budget was reduced by over 50%, with no negative impact on key customer activity metrics.
Developed a method for real-time statistical monitoring of customer behavior to detect early churn risks and assess growth potential.
Developed the RL-SQL method for reinforcement learning using only SQL.
Developed the "Factor Tree" approach for identifying and quantifying hidden factors that significantly impact key customer behavior metrics, such as purchase probability and spending amounts.
Lead methodologist for the open-source Python framework for A/B testing, Abacus (https://github.com/kolmogorov-lab/abacus).
Lead methodologist for the A/B platform powered by Kolmogorov.ai (https://Kolmogorov.ai).
Author of the "Data Science in Targeted Marketing" course (a2nced.ai/marketing) and co-author of the "Deep Dive into A/B Testing" course (a2nced.ai/abtest) on A2nced.ai.
Want to increase your business revenue with data science?
Need a detailed roadmap?
Your current data science models generate no revenue?
Not sure how to improve them?
2002–2008 | National Research Nuclear University MEPhI (Moscow Engineering Physics Institute) | Faculty of Cybernetics, Department of System Analysis | Specialization: Mathematician, System Programmer | Master's degree
Responsible for delivering significant business impact in every data science project.
Proficient in all major data science methods: ⏵predictive modeling (machine learning, deep learning, simulation modeling), ⏵descriptive modeling (clustering), ⏵prescriptive modeling (statistical modeling), ⏵“non-data” and “cold start” modeling (reinforcement learning, probabilistic modeling, expert mathematical modeling), ⏵optimization, ⏵statistical experiments (A/B testing).
Capable of analyzing current business processes and proposing data science-based improvements for greater efficiency.
Experienced in delivering end-to-end data science solutions:
⏵ Formulating initial business hypotheses
⏵ Designing model ensemble architectures
⏵ Preparing and evaluating statistical experiments to assess economic impact
⏵ Automating solution deployment
Able to explain even the most complex mathematical concepts clearly and accessibly, including to business stakeholders.
Proficient in preparing and delivering high-impact presales presentations.
Skilled at validating the potential effectiveness of any business hypothesis.
Expert in fully automating A/B testing of the economic effectiveness of marketing campaigns and business hypotheses.
Proficient in optimizing code by creating reusable libraries, significantly reducing time-to-market for new projects.
Experienced in agile product development. Agile certified professional.
Capable of managing multiple complex projects simultaneously.
Strong background in recruiting, mentoring, and rapidly upskilling junior team members.
Highly accurate in assessing project scope and estimating the cost and duration of work.
Proficient in breaking down complex projects into atomic tasks, determining task priority, and effectively distributing tasks among team members, considering their expertise, interests, and workloads.
ICAgile Certified Professional
Salesforce – Tableau CRM and Einstein Discovery Consultant
EXPF - "A/B Testing and Mathematical Statistics"
Luxoft Training – Data Analysis and Predictive Modeling
Luxoft Training – BigData SQL: Hive
October 2018 – Present
Customer clustering and identifying key attributes differentiating each cluster (1 project)
Predicting the probability of customers purchasing specific bank products (5 projects)
Developed reinforcement learning (RL) models for personalized product banner plans displayed in the bank's mobile app (1 project)
Designed RL (Reinforcement Learning) models for personalizing credit product offers on the bank's website for unauthenticated clients and those with no history (1 project)
Personalizing credit limits for customers (cash loan product) (1 project)
Predicting customer churn probability for specific bank products (5 projects) and overall bank churn (1 project)
Forecasting client lifetime value (CLTV) for specific products (2 projects) and incorporating probabilities of customers closing or acquiring new products (1 project)
Personalizing client interaction strategies (1 project):
⏵ Statistical monitoring of customer behavior dynamics (e.g., time since last purchase, spending levels across MCC categories)
⏵ Uplift modeling to predict the growth of POS turnover based on applied client strategies
Early churn prevention model (1 project):
⏵ Statistical monitoring of intervals between customer purchases to identify whether the current interval is typical or critical for a specific customer
⏵ Development of algorithms to define individual churn thresholds for each customer
⏵ Predictive modeling of the likelihood that a customer will reach their churn threshold
⏵ Personalization algorithms for retention offers aimed at maximizing revenue, considering customer churn probabilities
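The first two monitoring steps above can be illustrated with a minimal sketch. It assumes the individual churn threshold is a high quantile of the customer's own past gaps between purchases; the function names, the 0.9 quantile, and the minimum-history rule are illustrative assumptions, not the production method.

```python
from datetime import date
from statistics import quantiles


def churn_threshold_days(purchase_dates, q=0.9):
    """Hypothetical per-customer threshold: the q-quantile of past purchase gaps (in days)."""
    ordered = sorted(purchase_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None  # assumption: too little history to set an individual threshold
    # quantiles(n=100) returns 99 cut points; cut point k-1 is the k-th percentile.
    cuts = quantiles(gaps, n=100, method="inclusive")
    return cuts[round(q * 100) - 1]


def interval_status(purchase_dates, today, q=0.9):
    """Label the current gap since the last purchase as typical or critical."""
    threshold = churn_threshold_days(purchase_dates, q)
    if threshold is None:
        return "unknown"
    current_gap = (today - max(purchase_dates)).days
    return "critical" if current_gap > threshold else "typical"


# Usage: a customer who buys weekly and has been silent for 29 days.
history = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15), date(2024, 1, 22)]
print(interval_status(history, date(2024, 2, 20)))
```

A per-customer quantile keeps the rule individual: a weekly shopper and a quarterly shopper get different thresholds from the same code.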
Recommendation system (Market Basket Analysis) to identify product categories not yet purchased by a customer but frequently purchased by similar customers (1 project)
Upsell model for identifying products in categories where customers already shop, recommending higher-priced items frequently purchased by customers with higher spending patterns (1 project)
Strategic personalization of cashback categories and rates (3 projects):
⏵ Statistical monitoring of customer behavior dynamics (e.g., time since last purchase, spending levels across various MCC categories)
⏵ Estimating residual customer profitability (CLTV, Client Lifetime Value)
⏵ Personalizing customer interaction strategies by balancing encouragement, development, retention, and reactivation tactics
⏵ Determining individual bonus budgets based on personalized strategies
⏵ Predicting spending changes after receiving cashback in specific categories
⏵ Assessing the relevance of specific categories for customers
⏵ Estimating the likelihood of customer response to offers
⏵ Simulating total bonus budget impact
⏵ Result achieved at “Otkrytiye” Bank (a top-10 Russian bank): customers who opted into the new loyalty program received 10% more cashback, while the total bonus budget was reduced by over 50%, with no negative impact on key customer activity metrics
Personalizing privilege levels for customers (1 project): predicting customer sensitivity to privilege levels and forecasting spending changes based on privilege adjustments
Developed a recommendation system for loyalty program partner brands (1 project): ranking partner brands by relevance to specific customers
Weather-based purchase prediction model (1 project): forecasting the likelihood of customer purchases in specific MCC categories based on weather conditions
Cross-sell model (MBA – Market Basket Analysis) for partner brands (1 project): identifying partner brands not yet engaged by a customer but frequently purchased by similar customers across various MCC categories
Communication frequency personalization model (1 project): identifying customer segments likely to stay engaged at a higher communication frequency, with no drop in open rates, no rise in unsubscribe rates, and no deterioration in behavioral dynamics
Intelligent search for partner offers on loyalty program website and mobile app (1 project): NLP (Natural Language Processing)-based model evaluating semantic similarity between search queries and offer descriptions
Model for identifying reasons behind declining NPS (Net Promoter Score) (1 project)
Model for detecting negative themes in customer reviews (LLM – Large Language Model – based) (1 project)
Customer cluster migration monitoring model (2 projects):
⏵ Clustering customers based on key behavior metrics
⏵ Evaluating economic risks of customer migration between clusters
⏵ Ranking migration patterns by economic risk
Developed a methodology to evaluate the business impact of a federal loyalty program without control group availability (1 project): evaluating the business effect of the loyalty program based on mass promotion channels
Optimized and automated A/B testing processes for a personalized banner model in the Alfa-Bank mobile app (Top-3 bank in Russia) (1 project)
Developed a methodology to evaluate the business impact of pricing strategies (3 projects) at two levels:
⏵ Store-level analysis using control groups
⏵ Regional-level analysis without the possibility of control group formation
Created the open-source A/B-testing library Kolmogorov ABacus (Pipeline constructor for automated preparation and evaluation of statistical experiment results):
⏵ Developed a client stratification algorithm using preliminary data clustering (to form control and test groups comparable to each other and the total observation population)
⏵ Designed a bootstrap testing algorithm: a universal statistical criterion for evaluating any experiment, regardless of the target metric's value distribution
⏵ Built algorithms for test power calculation, determining the required sample size, and estimating MDE (Minimal Detectable Effect) using multiple test simulations
⏵ Enhanced test sensitivity to reduce experiment duration and evaluate weak marketing impacts
⏵ Implemented bucketization algorithm for faster calculations and data normalization
⏵ Created an evaluation algorithm for experiments with disrupted stratification (e.g., when control and test groups were initially unmatched or external communications were sent to some participants during the experiment)
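The bootstrap idea from the list above can be illustrated with a minimal sketch: a percentile-bootstrap confidence interval for the difference in means, which makes no assumption about the metric's distribution. The function names and defaults are hypothetical and this is not the ABacus API.

```python
import random
from statistics import mean


def bootstrap_diff_ci(control, test, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(test) - mean(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        control_sample = rng.choices(control, k=len(control))
        test_sample = rng.choices(test, k=len(test))
        diffs.append(mean(test_sample) - mean(control_sample))
    diffs.sort()
    lower = diffs[int(n_boot * alpha / 2)]
    upper = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


def is_significant(control, test, **kwargs):
    """The effect is flagged when the interval does not contain zero."""
    lower, upper = bootstrap_diff_ci(control, test, **kwargs)
    return not (lower <= 0.0 <= upper)


# Usage with synthetic data: the test group is shifted by +3.
control_group = [float(i % 5) for i in range(100)]
test_group = [x + 3.0 for x in control_group]
print(bootstrap_diff_ci(control_group, test_group))
```

The same resampling loop works for medians, ratios, or quantiles by swapping `mean` for another statistic, which is what makes the criterion universal.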
Developed the computational core for an A/B testing platform
Established business processes for experiment preparation, launch, monitoring, and result evaluation for an A/B testing platform, incorporating roles such as business requester, data analyst, data engineer, and campaign manager
Designed a data architecture for conducting overlapping tests with symmetrical impacts on control and test groups
Introduced a methodology for multiple hypothesis testing to compare various groups and interventions across different business metrics
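As an illustration of multiple hypothesis testing, here is a minimal sketch of the Holm-Bonferroni step-down correction. The choice of Holm is an assumption made for the example; the methodology used on the platform may rely on a different correction.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a reject flag per hypothesis using the Holm-Bonferroni procedure."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, index in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[index] <= alpha / (m - rank):
            reject[index] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are retained too
    return reject


# Usage: four group comparisons evaluated on the same experiment.
print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```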
Strategic coupon personalization model (2 projects):
⏵ Identifying and quantifying hidden factors influencing purchase probability and customer spending
⏵ Statistical monitoring of customer behavior dynamics (e.g., time since last purchase, spending levels across food categories)
⏵ Recommendation system for assessing the relevance of specific promotions for customers
⏵ Evaluating the appeal of promotion terms for individual customers
⏵ Personalizing customer interaction strategies by balancing encouragement, development, retention, and reactivation tactics
⏵ Uplift modeling to forecast changes in customer spending after receiving specific promotions
⏵ Uplift modeling to predict the likelihood of customers making purchases in response to promotional offers
Automating the preparation and evaluation processes for regular A/B testing (1 project)
Developed a geo-temporal customer profile (1 project): predicting the likelihood of a customer making a purchase in a specific MCC category within a certain map area and hour during the week
Developed a geo-filter for recommendation systems (1 project):
⏵ Excluding brands from recommendation feeds if their store locations are too far from the customer’s usual areas
⏵ Geo-temporal profile modeling to assess the "gravitational strength" of partner store locations
⏵ Predicting the likelihood of a customer visiting a specific store based on their geo-temporal profile
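The distance-based exclusion step above can be illustrated with a minimal sketch. It assumes great-circle (haversine) distance and a fixed radius; the function names, the 5 km default, and the data layout are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def filter_brands(brand_stores, usual_points, max_km=5.0):
    """Keep brands with at least one store within max_km of any usual customer location."""
    kept = []
    for brand, stores in brand_stores.items():
        if any(haversine_km(store, point) <= max_km
               for store in stores for point in usual_points):
            kept.append(brand)
    return kept


# Usage: one customer area in central Moscow, one nearby brand and one distant brand.
customer_areas = [(55.7558, 37.6173)]
stores = {"near_brand": [(55.76, 37.62)], "far_brand": [(59.9343, 30.3351)]}
print(filter_brands(stores, customer_areas))
```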
Audience targeting model for "small" partners of a "large" loyalty program (1 project):
⏵ Selecting audiences for small partners based on geo-filter parameters, such as the frequency of purchases within a specific radius around the partner’s store
⏵ Personalizing geo-filter parameters to maximize the audience size while maintaining a targeted conversion rate for campaign participants
Prize personalization for players (assessing prize relevance) (6 projects). Result: 10% of players started purchasing more attempts while the total bonus budget for the game remained unchanged, giving the loyalty program higher engagement at the same cost
Predictive assessment of the actual cost of prizes awarded to players (forecasting deferred costs for prizes with non-fixed values, e.g., temporary increased cashback in specific MCC categories) (5 projects)
Predictive assessment of task difficulty for players (forecasting win probability or prize amount based on game level parameters) (3 projects)
Personalizing player interaction strategies: development, encouragement, retention (1 project)
Personalizing game level difficulty for players (3 projects)
Simulating game economy using predictive models (4 projects)
Individual map generation for players in turn-based games (personalized placement of prizes and tasks on the map to ensure player development, encouragement, retention, and overall positive game economy) (2 projects)
Automating online management of game economy (1 project)
Statistical monitoring of customer activity for early churn detection (2 projects): analyzing whether the current values of activity metrics (e.g., time since the previous visit, session duration, number of items viewed, viewing depth and duration, cart contents, previous check amount) are typical or critical for a specific customer
Predicting the most likely time for a customer to use a specific online service (2 projects): enabling proactive delivery of partner service offers
Developed an ad broker for an online cinema platform (1 project):
⏵ Model to estimate the probability of a customer performing the desired action for a specific advertising campaign
⏵ Model to calculate the weight of advertising campaigns for specific customers
⏵ Model to select advertising campaigns for display to customers based on weighted prioritization
Developed a model to select advantageous flight ticket offers (1 project):
⏵ Predicting flight costs based on data such as brand, aircraft model and age, operator, departure time, route, layover location, and duration
⏵ Identifying offers where the actual cost is below the predicted cost for flights with the same parameters
Predicting the likelihood of customers purchasing specific travel packages (1 project):
⏵ Using travel data (e.g., destination, duration, price index, type of location such as historical, family-oriented, sports, beach, or mountain resorts with lifts, and predicted vs. actual costs)
⏵ Incorporating customer data (e.g., time since last trip, previous travel destinations)
Early churn prevention model (1 project):
⏵ Statistical monitoring of purchase intervals across categories addressing various customer needs (e.g., seasonal, situational, chronic conditions, organ-specific treatments, vitamins, cosmetics, everyday items) to assess whether current intervals are typical or critical for specific customers
⏵ Uplift modeling to predict customer spending growth based on offer parameters such as category, type, and reward amount
Development of the "Avoided Lost Revenue" metric to evaluate the effectiveness of churn prevention campaigns. The metric was aligned with the financial department head of a retail pharmacy chain, and an algorithm for calculating this metric for each campaign was developed (1 project)
Developed a reinforcement learning (RL) model for adaptive pricing at the product-region level (1 project)
Developed a methodology to evaluate business impact at the regional level without control group availability (1 project)
Developed ETL scripts for data processing in Kafka (KSQL)
Ensured data quality control through:
⏵ Writing SQL queries (Impala, Oracle) for data reconciliation
⏵ Creating Python scripts to automatically generate SQL code for data validation
⏵ Automating data correctness checks using Python, Impala-shell, and Bash
Developed the open-source Python framework for A/B testing, ABacus (www.github.com/kolmogorov-lab/abacus) (detailed above in the A/B Testing Domain section), which has automated A/B testing for multiple companies in retail, banking, and QSR
Built the A/B testing platform based on Kolmogorov.ai (www.Kolmogorov.ai) (detailed above in the A/B Testing Domain section)
Designed a method and an accompanying Python library for real-time statistical monitoring of customer behavior changes (statistical customer profile) for early churn detection and growth potential evaluation
Created a Python library for reinforcement learning (RL) to address various multi-armed bandit problems, allowing or restricting item repetition in recommendations for the same customer. The library supports multiple approaches, including Thompson Sampling, UCB1 and epsilon-greedy algorithms, with options for binary / numerical target metrics under constraints (e.g., minimum exposure for focus items) and different online / offline retraining and scoring modes. Using RL models, I have:
⏵ Automated the generation of personalized banner display plans in a mobile banking app for customers without transaction history (top 10 Russian bank)
⏵ Developed dynamic pricing in a high-end electronics retail chain (top 10 in Russia)
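One of the approaches named above, Thompson Sampling for a binary target, can be illustrated with a minimal sketch. The class and method names are hypothetical and do not reflect the library's interface; the `exclude` argument stands in for restricting item repetition for the same customer.

```python
import random


class BetaBernoulliThompson:
    """Thompson Sampling for binary rewards with a Beta(1, 1) prior per arm."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        self.params = {arm: [1.0, 1.0] for arm in arms}  # [alpha, beta]

    def select(self, exclude=()):
        """Sample a success rate per arm from its posterior and pick the highest."""
        candidates = [arm for arm in self.params if arm not in exclude]
        return max(candidates, key=lambda arm: self.rng.betavariate(*self.params[arm]))

    def update(self, arm, reward):
        """Update the posterior with a reward of 1 (success) or 0 (failure)."""
        self.params[arm][0] += reward
        self.params[arm][1] += 1 - reward


# Usage: simulate two banners with made-up click rates.
true_rates = {"banner_a": 0.1, "banner_b": 0.6}
environment = random.Random(1)
bandit = BetaBernoulliThompson(true_rates, seed=1)
shown = {arm: 0 for arm in true_rates}
for _ in range(1000):
    arm = bandit.select()
    shown[arm] += 1
    bandit.update(arm, 1 if environment.random() < true_rates[arm] else 0)
print(shown)
```

Because arms start from an uninformative prior, this setup needs no transaction history, which matches the cold-start use cases listed above.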
Developed RL-SQL: an RL (reinforcement learning) approach implemented entirely in SQL to solve multi-armed bandit problems using generalized Thompson Sampling for arbitrary distribution forms.
LUXOFT
March 2017 – September 2018
Senior Analyst
Designed analytical dashboards
Developed algorithms for distributed computing: normalization, transformation, and deduplication of big data (MDM – Master Data Management) using Hadoop
Created interface layouts and interaction logic, optimizing UX
Designed architecture for storing and analyzing user behavior data on the product website (CJM – Customer Journey Map)
Developed key statistical metrics for big data quality control
Designed metadata storage architecture (PostgreSQL) for use in big data processing (Hadoop).
PROGRESS SOFT
August 2015 – March 2017
Lead Analyst
Developed key performance metrics for the business system
Created monitoring mechanisms and tools for rapid localization of issues affecting key business metrics (Python)
Designed managerial dashboards
Developed layouts for user interfaces
Built scripts for collecting and analyzing user activity data (Python, PostgreSQL)
Created scripts for parsing unstructured raw data (Python)
Developed scripts (Python) for automatic XML schema generation for OLAP cubes based on metadata
Designed an adaptive metadata storage architecture (allowing database structure changes without modifying the schema when attributes of stored data are updated)
Managed the development team: estimated workload, formed the backlog (task definition, prioritization, and quality control).
OTHER PROJECTS
Developed a system for constructing optimal investment portfolios based on user-defined levels of return or risk. The system also monitors and automatically updates the stock price database.
Improved the performance of a corporate MDM module. This project involved a comprehensive analysis of the master data management department’s business processes, identifying performance bottlenecks, and developing a set of solutions to enhance subsystem performance. One key solution included designing an expert system to automate MDM expert decisions (e.g., for data quality assessment).
Designed a model for stress-resistant business processes capable of operating under conditions where the control parameters of the process change dynamically during execution.
Transitioned a small manufacturing facility to CAD. As part of this project, developed a set of scripts to automate tasks for 3D designers.
WhatsApp: www.wa.me/qr/CBNGYIFKMU5AJ1
Telegram: www.t.me/dmitry_zabavin
Email: dmitryzabavin@gmail.com
LinkedIn: www.linkedin.com/in/dmitry-zabavin
Merch: www.wa.me/c/77472538743