All works till now.
This commit is contained in:
360
README.md
360
README.md
@@ -1,2 +1,362 @@
|
|||||||
# birthdaymessaging
|
# birthdaymessaging
|
||||||
|
|
||||||
|
|
||||||
|
Overview
|
||||||
|
|
||||||
|
The Birthday Messaging Intelligence Platform is a data-driven engagement system designed to transform static birthday campaigns into an intelligent engagement engine.
|
||||||
|
|
||||||
|
The system integrates CRM data, behavioral interaction logs, and predictive analytics to automatically identify high-value engagement opportunities, detect churn risk, and orchestrate personalized birthday messaging campaigns.
|
||||||
|
|
||||||
|
The platform evolves through three stages:
|
||||||
|
|
||||||
|
Stage Description
|
||||||
|
Week 2 Data pipeline and engagement preparation
|
||||||
|
Week 3 Predictive intelligence and behavioral segmentation
|
||||||
|
Future Multi-channel automation and campaign optimization
|
||||||
|
|
||||||
|
The end result is a system that predicts disengagement before it happens and activates personalized engagement campaigns.
|
||||||
|
|
||||||
|
System Architecture
|
||||||
|
|
||||||
|
The project is organized into modular components following a data pipeline → intelligence → automation architecture.
|
||||||
|
|
||||||
|
CRM Data + Clickstream Logs
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
Unified Data Pipeline
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
Data Cleaning & Compliance
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
Engagement Intelligence
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
Churn Prediction
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
Predictive Segmentation
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
Automation Triggers
|
||||||
|
Project Structure
|
||||||
|
birthdaymessaging/
|
||||||
|
│
|
||||||
|
├── config.yaml
|
||||||
|
├── main.py
|
||||||
|
├── README.md
|
||||||
|
│
|
||||||
|
├── data/
|
||||||
|
│ ├── raw/
|
||||||
|
│ └── processed/
|
||||||
|
│
|
||||||
|
├── docs/
|
||||||
|
│ ├── ethics.md
|
||||||
|
│ └── model-audits/
|
||||||
|
│
|
||||||
|
├── src/
|
||||||
|
│ ├── pipeline/ # Week 2 Data Engineering
|
||||||
|
│ │ ├── unified_stream.py
|
||||||
|
│ │ ├── relative_date_trigger.py
|
||||||
|
│ │ ├── rf_scoring.py
|
||||||
|
│ │ ├── list_hygiene.py
|
||||||
|
│ │ ├── ethics_flags.py
|
||||||
|
│ │ ├── deliverability_check.py
|
||||||
|
│ │ └── config_loader.py
|
||||||
|
│ │
|
||||||
|
│ ├── models/ # Week 3 Intelligence Layer
|
||||||
|
│ │ ├── engagement_scoring.py
|
||||||
|
│ │ ├── churn_classifier.py
|
||||||
|
│ │ └── send_time_optimizer.py
|
||||||
|
│ │
|
||||||
|
│ ├── segmentation/
|
||||||
|
│ │ └── predictive_segments.py
|
||||||
|
│ │
|
||||||
|
│ └── automation/
|
||||||
|
│ ├── trigger_engine.py
|
||||||
|
│ └── reactivation_triggers.yaml
|
||||||
|
Week 2: Data Engineering Pipeline
|
||||||
|
|
||||||
|
Week 2 focuses on building a clean, compliant data foundation.
|
||||||
|
|
||||||
|
Unified Data Stream
|
||||||
|
|
||||||
|
File:
|
||||||
|
|
||||||
|
src/pipeline/unified_stream.py
|
||||||
|
|
||||||
|
Functions:
|
||||||
|
|
||||||
|
Load CRM dataset
|
||||||
|
|
||||||
|
Load clickstream activity logs
|
||||||
|
|
||||||
|
Pseudonymize user identifiers
|
||||||
|
|
||||||
|
Merge datasets
|
||||||
|
|
||||||
|
Normalize timestamps (UTC)
|
||||||
|
|
||||||
|
Output:
|
||||||
|
|
||||||
|
unified_dataset.csv
|
||||||
|
Birthday Recursion Trigger
|
||||||
|
|
||||||
|
File:
|
||||||
|
|
||||||
|
relative_date_trigger.py
|
||||||
|
|
||||||
|
Implements recurring campaign triggers such as:
|
||||||
|
|
||||||
|
Birthday approaching
|
||||||
|
|
||||||
|
Pre-birthday reminder
|
||||||
|
|
||||||
|
Post-birthday follow-up
|
||||||
|
|
||||||
|
Ensures contacts re-enter the workflow annually.
|
||||||
|
|
||||||
|
Recency-Frequency Scoring
|
||||||
|
|
||||||
|
File:
|
||||||
|
|
||||||
|
rf_scoring.py
|
||||||
|
|
||||||
|
Calculates engagement indicators:
|
||||||
|
|
||||||
|
Recency index
|
||||||
|
|
||||||
|
Frequency of sessions
|
||||||
|
|
||||||
|
Session change delta
|
||||||
|
|
||||||
|
Used as an early engagement signal.
|
||||||
|
|
||||||
|
List Hygiene Automation
|
||||||
|
|
||||||
|
File:
|
||||||
|
|
||||||
|
list_hygiene.py
|
||||||
|
|
||||||
|
Improves deliverability by removing:
|
||||||
|
|
||||||
|
Invalid email domains
|
||||||
|
|
||||||
|
Disposable email services
|
||||||
|
|
||||||
|
Role-based addresses
|
||||||
|
|
||||||
|
Ethics and Compliance Layer
|
||||||
|
|
||||||
|
File:
|
||||||
|
|
||||||
|
ethics_flags.py
|
||||||
|
|
||||||
|
Detects sensitive data usage:
|
||||||
|
|
||||||
|
medical_condition
|
||||||
|
|
||||||
|
political_opinion
|
||||||
|
|
||||||
|
dietary_restrictions
|
||||||
|
|
||||||
|
Ensures compliance with GDPR principles.
|
||||||
|
|
||||||
|
Deliverability Simulation
|
||||||
|
|
||||||
|
File:
|
||||||
|
|
||||||
|
deliverability_check.py
|
||||||
|
|
||||||
|
Simulates email infrastructure readiness:
|
||||||
|
|
||||||
|
SPF validation
|
||||||
|
|
||||||
|
DKIM compatibility
|
||||||
|
|
||||||
|
DMARC enforcement
|
||||||
|
|
||||||
|
inactive contact suppression
|
||||||
|
|
||||||
|
Week 3: Engagement Intelligence Engine
|
||||||
|
|
||||||
|
Week 3 converts the pipeline into a predictive analytics platform.
|
||||||
|
|
||||||
|
Engagement Scoring Engine
|
||||||
|
|
||||||
|
File:
|
||||||
|
|
||||||
|
src/models/engagement_scoring.py
|
||||||
|
|
||||||
|
Calculates a normalized engagement score (0–100) based on:
|
||||||
|
|
||||||
|
Recency of interaction
|
||||||
|
|
||||||
|
Session frequency
|
||||||
|
|
||||||
|
Session duration
|
||||||
|
|
||||||
|
Click engagement depth
|
||||||
|
|
||||||
|
Formula:
|
||||||
|
|
||||||
|
Engagement Score =
|
||||||
|
(Recency Weight × Normalized Recency)
|
||||||
|
+ (Frequency Weight × Session Count)
|
||||||
|
+ (Engagement Depth Weight × Interaction Index)
|
||||||
|
|
||||||
|
Output classification:
|
||||||
|
|
||||||
|
Score Tier
|
||||||
|
75–100 Highly Engaged
|
||||||
|
50–74 Stable
|
||||||
|
25–49 Declining
|
||||||
|
0–24 At Risk
|
||||||
|
Churn Risk Classification
|
||||||
|
|
||||||
|
File:
|
||||||
|
|
||||||
|
src/models/churn_classifier.py
|
||||||
|
|
||||||
|
Defines churn risk using behavioral signals.
|
||||||
|
|
||||||
|
Conditions:
|
||||||
|
|
||||||
|
No login ≥ X days
|
||||||
|
AND
|
||||||
|
Engagement score drop ≥ Y%
|
||||||
|
|
||||||
|
Risk categories:
|
||||||
|
|
||||||
|
Risk Level Description
|
||||||
|
Low Healthy engagement
|
||||||
|
Medium Engagement declining
|
||||||
|
High High probability of churn
|
||||||
|
|
||||||
|
Output fields:
|
||||||
|
|
||||||
|
churn_risk_flag
|
||||||
|
churn_probability_score
|
||||||
|
risk_class
|
||||||
|
Predictive Segmentation
|
||||||
|
|
||||||
|
File:
|
||||||
|
|
||||||
|
src/segmentation/predictive_segments.py
|
||||||
|
|
||||||
|
Creates behavioral marketing segments using:
|
||||||
|
|
||||||
|
Lifecycle Stage
|
||||||
|
Engagement Tier
|
||||||
|
Birthday Proximity
|
||||||
|
Churn Risk Overlay
|
||||||
|
|
||||||
|
Example segment:
|
||||||
|
|
||||||
|
At-Risk + Birthday Approaching
|
||||||
|
Send-Time Optimization
|
||||||
|
|
||||||
|
File:
|
||||||
|
|
||||||
|
src/models/send_time_optimizer.py
|
||||||
|
|
||||||
|
Analyzes historical open patterns and determines optimal send windows.
|
||||||
|
|
||||||
|
Outputs:
|
||||||
|
|
||||||
|
optimal_send_hour
|
||||||
|
optimal_send_day
|
||||||
|
Automation Triggers
|
||||||
|
|
||||||
|
File:
|
||||||
|
|
||||||
|
src/automation/reactivation_triggers.yaml
|
||||||
|
|
||||||
|
Defines campaign orchestration rules.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
|
||||||
|
IF churn_risk_flag = True
|
||||||
|
AND birthday_within_30_days = True
|
||||||
|
|
||||||
|
THEN send reminder email
|
||||||
|
WAIT 24h
|
||||||
|
IF unopened → send SMS
|
||||||
|
Model Governance & Ethics
|
||||||
|
|
||||||
|
Documentation:
|
||||||
|
|
||||||
|
docs/model-audits/week3-bias-review.md
|
||||||
|
docs/ethics.md
|
||||||
|
|
||||||
|
Audits include:
|
||||||
|
|
||||||
|
Geographic bias
|
||||||
|
|
||||||
|
Age-based performance variance
|
||||||
|
|
||||||
|
False-positive churn classification
|
||||||
|
|
||||||
|
Transparency measures include clear documentation of automated decision logic.
|
||||||
|
|
||||||
|
Running the Pipeline
|
||||||
|
|
||||||
|
Run the full pipeline:
|
||||||
|
|
||||||
|
python main.py
|
||||||
|
|
||||||
|
Pipeline order:
|
||||||
|
|
||||||
|
Week 2:
|
||||||
|
Data Merge
|
||||||
|
Birthday Triggers
|
||||||
|
RF Scoring
|
||||||
|
List Hygiene
|
||||||
|
Ethics Checks
|
||||||
|
Deliverability Simulation
|
||||||
|
|
||||||
|
Week 3:
|
||||||
|
Engagement Scoring
|
||||||
|
Churn Classification
|
||||||
|
Predictive Segmentation
|
||||||
|
Send-Time Optimization
|
||||||
|
|
||||||
|
Final dataset:
|
||||||
|
|
||||||
|
data/processed/final_intelligence_output.csv
|
||||||
|
Success Criteria
|
||||||
|
|
||||||
|
The system is considered operational when:
|
||||||
|
|
||||||
|
Engagement scores generated for all active users
|
||||||
|
|
||||||
|
Churn risk classification validated
|
||||||
|
|
||||||
|
Predictive segments documented
|
||||||
|
|
||||||
|
Reactivation triggers tested
|
||||||
|
|
||||||
|
Send-time optimization operational
|
||||||
|
|
||||||
|
Ethics documentation updated
|
||||||
|
|
||||||
|
Bias audit completed
|
||||||
|
|
||||||
|
Outcome
|
||||||
|
|
||||||
|
At completion, the platform functions as an Engagement Intelligence Engine.
|
||||||
|
|
||||||
|
Capabilities include:
|
||||||
|
|
||||||
|
Predicting disengagement
|
||||||
|
|
||||||
|
Prioritizing high-value users
|
||||||
|
|
||||||
|
Automating reactivation campaigns
|
||||||
|
|
||||||
|
Personalizing birthday marketing
|
||||||
|
|
||||||
|
Ensuring ethical data usage
|
||||||
|
|
||||||
|
The platform transforms traditional birthday messaging into a behaviorally intelligent engagement strategy.
|
||||||
|
|
||||||
|
|||||||
@@ -2,3 +2,12 @@ crm_file: "sample_data/crm.csv"
|
|||||||
clickstream_file: "sample_data/clickstream.csv"
|
clickstream_file: "sample_data/clickstream.csv"
|
||||||
output_dir: "output"
|
output_dir: "output"
|
||||||
log_dir: "logs"
|
log_dir: "logs"
|
||||||
|
reactivation_trigger:
|
||||||
|
conditions:
|
||||||
|
churn_risk_flag: true
|
||||||
|
birthday_within_30_days: true
|
||||||
|
actions:
|
||||||
|
- send_email: emotional_reminder
|
||||||
|
- wait: 24h
|
||||||
|
- if_unopened:
|
||||||
|
send_sms: follow_up
|
||||||
@@ -62,3 +62,59 @@ INFO:root:Loading CRM and clickstream data...
|
|||||||
INFO:root:Hashing user IDs for pseudonymization...
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
INFO:root:Merging datasets...
|
INFO:root:Merging datasets...
|
||||||
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
INFO:root:Loading CRM and clickstream data...
|
||||||
|
INFO:root:Hashing user IDs for pseudonymization...
|
||||||
|
INFO:root:Merging datasets...
|
||||||
|
INFO:root:Unified dataset saved to output\unified_dataset.csv
|
||||||
|
|||||||
17
main.py
17
main.py
@@ -6,9 +6,13 @@ from src.pipeline.list_hygiene import apply_list_hygiene
|
|||||||
from src.pipeline.ethics_flags import apply_ethics_checks
|
from src.pipeline.ethics_flags import apply_ethics_checks
|
||||||
from src.pipeline.deliverability_check import run_deliverability_check
|
from src.pipeline.deliverability_check import run_deliverability_check
|
||||||
|
|
||||||
print("=== WEEK 2 PIPELINE START ===")
|
from src.models.engagement_scoring import calculate_engagement_scores
|
||||||
|
from src.models.churn_classifier import classify_churn
|
||||||
|
from src.models.send_time_optimizer import optimize_send_time
|
||||||
|
from src.models.predictive_segments import build_predictive_segments
|
||||||
|
|
||||||
load_and_merge()
|
print("=== WEEK 2 PIPELINE START ===")
|
||||||
|
df= load_and_merge()
|
||||||
apply_birthday_recursion()
|
apply_birthday_recursion()
|
||||||
calculate_rf_score()
|
calculate_rf_score()
|
||||||
apply_list_hygiene()
|
apply_list_hygiene()
|
||||||
@@ -17,3 +21,12 @@ apply_ethics_checks()
|
|||||||
run_deliverability_check()
|
run_deliverability_check()
|
||||||
|
|
||||||
print("=== WEEK 2 PIPELINE COMPLETE ===")
|
print("=== WEEK 2 PIPELINE COMPLETE ===")
|
||||||
|
|
||||||
|
print("=== WEEK 3 START ===")
|
||||||
|
|
||||||
|
df = calculate_engagement_scores(df)
|
||||||
|
df = classify_churn(df)
|
||||||
|
df = optimize_send_time(df)
|
||||||
|
df=build_predictive_segments(df)
|
||||||
|
|
||||||
|
print("=== WEEK 3 COMPLETE ===")
|
||||||
|
|||||||
@@ -1,10 +1,10 @@
|
|||||||
user_id_x,email,birth_date,gender,consent_flag,user_id_hashed,user_id_y,timestamp,page_depth,dwell_time,session_id,birthday_trigger,recency_index,rf_score
|
user_id_x,email,birth_date,gender,consent_flag,user_id_hashed,user_id_y,timestamp,page_depth,dwell_time,session_id,birthday_trigger,recency_index,rf_score
|
||||||
4,dana@outlook.com,1992-05-30,F,True,4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a,4,2026-01-25 16:45:00+00:00,3,60,401,1992-04-30,28,0.034482758620689655
|
4,dana@outlook.com,1992-05-30,F,True,4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a,4,2026-01-25 16:45:00+00:00,3,60,401,1992-04-30,42,0.023255813953488372
|
||||||
3,charlie@tempmail.com,2000-11-21,M,False,4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce,3,2026-01-05 12:30:00+00:00,2,30,301,2000-10-21,48,0.02040816326530612
|
3,charlie@tempmail.com,2000-11-21,M,False,4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce,3,2026-01-05 12:30:00+00:00,2,30,301,2000-10-21,62,0.015873015873015872
|
||||||
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-01-15 09:15:00+00:00,5,120,101,1990-01-17,38,0.02564102564102564
|
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-01-15 09:15:00+00:00,5,120,101,1990-01-17,52,0.018867924528301886
|
||||||
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-02-01 10:20:00+00:00,3,90,102,1990-01-17,21,0.045454545454545456
|
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-02-01 10:20:00+00:00,3,90,102,1990-01-17,35,0.027777777777777776
|
||||||
7,grace@no-reply.com,1993-03-15,F,True,7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451,7,2025-09-15 17:10:00+00:00,2,45,701,1993-02-15,160,0.006211180124223602
|
7,grace@no-reply.com,1993-03-15,F,True,7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451,7,2025-09-15 17:10:00+00:00,2,45,701,1993-02-15,174,0.005714285714285714
|
||||||
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2025-12-20 14:10:00+00:00,4,60,201,1985-06-05,64,0.015384615384615385
|
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2025-12-20 14:10:00+00:00,4,60,201,1985-06-05,78,0.012658227848101266
|
||||||
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2026-02-10 08:50:00+00:00,6,180,202,1985-06-05,13,0.07142857142857142
|
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2026-02-10 08:50:00+00:00,6,180,202,1985-06-05,26,0.037037037037037035
|
||||||
6,frank@gmail.com,1995-12-01,M,True,e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683,6,2026-01-30 09:00:00+00:00,4,120,601,1995-11-01,24,0.04
|
6,frank@gmail.com,1995-12-01,M,True,e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683,6,2026-01-30 09:00:00+00:00,4,120,601,1995-11-01,37,0.02631578947368421
|
||||||
5,eric@mailinator.com,1988-09-10,M,True,ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d,5,2025-08-01 11:00:00+00:00,5,150,501,1988-08-10,205,0.0048543689320388345
|
5,eric@mailinator.com,1988-09-10,M,True,ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d,5,2025-08-01 11:00:00+00:00,5,150,501,1988-08-10,219,0.004545454545454545
|
||||||
|
|||||||
|
@@ -1,10 +1,10 @@
|
|||||||
user_id_x,email,birth_date,gender,consent_flag,user_id_hashed,user_id_y,timestamp,page_depth,dwell_time,session_id,birthday_trigger,recency_index,rf_score
|
user_id_x,email,birth_date,gender,consent_flag,user_id_hashed,user_id_y,timestamp,page_depth,dwell_time,session_id,birthday_trigger,recency_index,rf_score
|
||||||
4,dana@outlook.com,1992-05-30,F,True,4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a,4,2026-01-25 16:45:00+00:00,3,60,401,1992-04-30,28,0.0344827586206896
|
4,dana@outlook.com,1992-05-30,F,True,4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a,4,2026-01-25 16:45:00+00:00,3,60,401,1992-04-30,42,0.0232558139534883
|
||||||
3,charlie@tempmail.com,2000-11-21,M,False,4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce,3,2026-01-05 12:30:00+00:00,2,30,301,2000-10-21,48,0.0204081632653061
|
3,charlie@tempmail.com,2000-11-21,M,False,4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce,3,2026-01-05 12:30:00+00:00,2,30,301,2000-10-21,62,0.0158730158730158
|
||||||
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-01-15 09:15:00+00:00,5,120,101,1990-01-17,38,0.0256410256410256
|
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-01-15 09:15:00+00:00,5,120,101,1990-01-17,52,0.0188679245283018
|
||||||
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-02-01 10:20:00+00:00,3,90,102,1990-01-17,21,0.0454545454545454
|
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-02-01 10:20:00+00:00,3,90,102,1990-01-17,35,0.0277777777777777
|
||||||
7,grace@no-reply.com,1993-03-15,F,True,7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451,7,2025-09-15 17:10:00+00:00,2,45,701,1993-02-15,160,0.0062111801242236
|
7,grace@no-reply.com,1993-03-15,F,True,7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451,7,2025-09-15 17:10:00+00:00,2,45,701,1993-02-15,174,0.0057142857142857
|
||||||
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2025-12-20 14:10:00+00:00,4,60,201,1985-06-05,64,0.0153846153846153
|
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2025-12-20 14:10:00+00:00,4,60,201,1985-06-05,78,0.0126582278481012
|
||||||
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2026-02-10 08:50:00+00:00,6,180,202,1985-06-05,13,0.0714285714285714
|
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2026-02-10 08:50:00+00:00,6,180,202,1985-06-05,26,0.037037037037037
|
||||||
6,frank@gmail.com,1995-12-01,M,True,e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683,6,2026-01-30 09:00:00+00:00,4,120,601,1995-11-01,24,0.04
|
6,frank@gmail.com,1995-12-01,M,True,e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683,6,2026-01-30 09:00:00+00:00,4,120,601,1995-11-01,37,0.0263157894736842
|
||||||
5,eric@mailinator.com,1988-09-10,M,True,ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d,5,2025-08-01 11:00:00+00:00,5,150,501,1988-08-10,205,0.0048543689320388
|
5,eric@mailinator.com,1988-09-10,M,True,ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d,5,2025-08-01 11:00:00+00:00,5,150,501,1988-08-10,219,0.0045454545454545
|
||||||
|
|||||||
|
@@ -1,10 +1,10 @@
|
|||||||
user_id_x,email,birth_date,gender,consent_flag,user_id_hashed,user_id_y,timestamp,page_depth,dwell_time,session_id,birthday_trigger,recency_index,rf_score,deliverable
|
user_id_x,email,birth_date,gender,consent_flag,user_id_hashed,user_id_y,timestamp,page_depth,dwell_time,session_id,birthday_trigger,recency_index,rf_score,deliverable
|
||||||
4,dana@outlook.com,1992-05-30,F,True,4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a,4,2026-01-25 16:45:00+00:00,3,60,401,1992-04-30,28,0.0344827586206896,True
|
4,dana@outlook.com,1992-05-30,F,True,4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a,4,2026-01-25 16:45:00+00:00,3,60,401,1992-04-30,42,0.0232558139534883,True
|
||||||
3,charlie@tempmail.com,2000-11-21,M,False,4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce,3,2026-01-05 12:30:00+00:00,2,30,301,2000-10-21,48,0.0204081632653061,True
|
3,charlie@tempmail.com,2000-11-21,M,False,4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce,3,2026-01-05 12:30:00+00:00,2,30,301,2000-10-21,62,0.0158730158730158,True
|
||||||
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-01-15 09:15:00+00:00,5,120,101,1990-01-17,38,0.0256410256410256,True
|
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-01-15 09:15:00+00:00,5,120,101,1990-01-17,52,0.0188679245283018,True
|
||||||
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-02-01 10:20:00+00:00,3,90,102,1990-01-17,21,0.0454545454545454,True
|
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-02-01 10:20:00+00:00,3,90,102,1990-01-17,35,0.0277777777777777,True
|
||||||
7,grace@no-reply.com,1993-03-15,F,True,7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451,7,2025-09-15 17:10:00+00:00,2,45,701,1993-02-15,160,0.0062111801242236,True
|
7,grace@no-reply.com,1993-03-15,F,True,7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451,7,2025-09-15 17:10:00+00:00,2,45,701,1993-02-15,174,0.0057142857142857,True
|
||||||
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2025-12-20 14:10:00+00:00,4,60,201,1985-06-05,64,0.0153846153846153,True
|
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2025-12-20 14:10:00+00:00,4,60,201,1985-06-05,78,0.0126582278481012,True
|
||||||
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2026-02-10 08:50:00+00:00,6,180,202,1985-06-05,13,0.0714285714285714,True
|
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2026-02-10 08:50:00+00:00,6,180,202,1985-06-05,26,0.037037037037037,True
|
||||||
6,frank@gmail.com,1995-12-01,M,True,e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683,6,2026-01-30 09:00:00+00:00,4,120,601,1995-11-01,24,0.04,True
|
6,frank@gmail.com,1995-12-01,M,True,e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683,6,2026-01-30 09:00:00+00:00,4,120,601,1995-11-01,37,0.0263157894736842,True
|
||||||
5,eric@mailinator.com,1988-09-10,M,True,ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d,5,2025-08-01 11:00:00+00:00,5,150,501,1988-08-10,205,0.0048543689320388,True
|
5,eric@mailinator.com,1988-09-10,M,True,ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d,5,2025-08-01 11:00:00+00:00,5,150,501,1988-08-10,219,0.0045454545454545,True
|
||||||
|
|||||||
|
@@ -1,10 +1,10 @@
|
|||||||
user_id_x,email,birth_date,gender,consent_flag,user_id_hashed,user_id_y,timestamp,page_depth,dwell_time,session_id,birthday_trigger,recency_index,rf_score,sensitive_flag
|
user_id_x,email,birth_date,gender,consent_flag,user_id_hashed,user_id_y,timestamp,page_depth,dwell_time,session_id,birthday_trigger,recency_index,rf_score,sensitive_flag
|
||||||
4,dana@outlook.com,1992-05-30,F,True,4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a,4,2026-01-25 16:45:00+00:00,3,60,401,1992-04-30,28,0.0344827586206896,False
|
4,dana@outlook.com,1992-05-30,F,True,4b227777d4dd1fc61c6f884f48641d02b4d121d3fd328cb08b5531fcacdabf8a,4,2026-01-25 16:45:00+00:00,3,60,401,1992-04-30,42,0.0232558139534883,False
|
||||||
3,charlie@tempmail.com,2000-11-21,M,False,4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce,3,2026-01-05 12:30:00+00:00,2,30,301,2000-10-21,48,0.0204081632653061,False
|
3,charlie@tempmail.com,2000-11-21,M,False,4e07408562bedb8b60ce05c1decfe3ad16b72230967de01f640b7e4729b49fce,3,2026-01-05 12:30:00+00:00,2,30,301,2000-10-21,62,0.0158730158730158,False
|
||||||
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-01-15 09:15:00+00:00,5,120,101,1990-01-17,38,0.0256410256410256,False
|
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-01-15 09:15:00+00:00,5,120,101,1990-01-17,52,0.0188679245283018,False
|
||||||
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-02-01 10:20:00+00:00,3,90,102,1990-01-17,21,0.0454545454545454,False
|
1,alice@gmail.com,1990-02-17,F,True,6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b,1,2026-02-01 10:20:00+00:00,3,90,102,1990-01-17,35,0.0277777777777777,False
|
||||||
7,grace@no-reply.com,1993-03-15,F,True,7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451,7,2025-09-15 17:10:00+00:00,2,45,701,1993-02-15,160,0.0062111801242236,False
|
7,grace@no-reply.com,1993-03-15,F,True,7902699be42c8a8e46fbbb4501726517e86b22c56a189f7625a6da49081b2451,7,2025-09-15 17:10:00+00:00,2,45,701,1993-02-15,174,0.0057142857142857,False
|
||||||
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2025-12-20 14:10:00+00:00,4,60,201,1985-06-05,64,0.0153846153846153,False
|
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2025-12-20 14:10:00+00:00,4,60,201,1985-06-05,78,0.0126582278481012,False
|
||||||
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2026-02-10 08:50:00+00:00,6,180,202,1985-06-05,13,0.0714285714285714,False
|
2,bob@yahoo.com,1985-07-05,M,True,d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35,2,2026-02-10 08:50:00+00:00,6,180,202,1985-06-05,26,0.037037037037037,False
|
||||||
6,frank@gmail.com,1995-12-01,M,True,e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683,6,2026-01-30 09:00:00+00:00,4,120,601,1995-11-01,24,0.04,False
|
6,frank@gmail.com,1995-12-01,M,True,e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683,6,2026-01-30 09:00:00+00:00,4,120,601,1995-11-01,37,0.0263157894736842,False
|
||||||
5,eric@mailinator.com,1988-09-10,M,True,ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d,5,2025-08-01 11:00:00+00:00,5,150,501,1988-08-10,205,0.0048543689320388,False
|
5,eric@mailinator.com,1988-09-10,M,True,ef2d127de37b942baad06145e54b0c619a1f22327b2ebbcfbec78f5564afe39d,5,2025-08-01 11:00:00+00:00,5,150,501,1988-08-10,219,0.0045454545454545,False
|
||||||
|
|||||||
|
BIN
src/models/__pycache__/churn_classifier.cpython-312.pyc
Normal file
BIN
src/models/__pycache__/churn_classifier.cpython-312.pyc
Normal file
Binary file not shown.
BIN
src/models/__pycache__/engagement_scoring.cpython-312.pyc
Normal file
BIN
src/models/__pycache__/engagement_scoring.cpython-312.pyc
Normal file
Binary file not shown.
BIN
src/models/__pycache__/predictive_segments.cpython-312.pyc
Normal file
BIN
src/models/__pycache__/predictive_segments.cpython-312.pyc
Normal file
Binary file not shown.
BIN
src/models/__pycache__/send_time_optimizer.cpython-312.pyc
Normal file
BIN
src/models/__pycache__/send_time_optimizer.cpython-312.pyc
Normal file
Binary file not shown.
30
src/models/churn_classifier.py
Normal file
30
src/models/churn_classifier.py
Normal file
@@ -0,0 +1,30 @@
|
|||||||
|
import numpy as np
|
||||||
|
|
||||||
|
def classify_churn(df):
|
||||||
|
|
||||||
|
inactivity_threshold = 30
|
||||||
|
drop_threshold = 20
|
||||||
|
|
||||||
|
df["engagement_drop"] = df["engagement_score"].diff().fillna(0)
|
||||||
|
|
||||||
|
df["churn_risk_flag"] = (
|
||||||
|
(df["days_since_last_session"] >= inactivity_threshold) &
|
||||||
|
(df["engagement_drop"] <= -drop_threshold)
|
||||||
|
)
|
||||||
|
|
||||||
|
def risk_class(row):
|
||||||
|
if row["churn_risk_flag"]:
|
||||||
|
return "High Risk"
|
||||||
|
elif row["engagement_score"] < 50:
|
||||||
|
return "Medium Risk"
|
||||||
|
else:
|
||||||
|
return "Low Risk"
|
||||||
|
|
||||||
|
df["risk_class"] = df.apply(risk_class, axis=1)
|
||||||
|
|
||||||
|
df["churn_probability_score"] = np.where(
|
||||||
|
df["risk_class"] == "High Risk", 0.8,
|
||||||
|
np.where(df["risk_class"] == "Medium Risk", 0.5, 0.2)
|
||||||
|
)
|
||||||
|
|
||||||
|
return df
|
||||||
46
src/models/engagement_scoring.py
Normal file
46
src/models/engagement_scoring.py
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
|
||||||
|
import pandas as pd
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
def calculate_engagement_scores(df):
|
||||||
|
|
||||||
|
now = pd.Timestamp.now(tz="UTC")
|
||||||
|
|
||||||
|
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True, errors="coerce")
|
||||||
|
|
||||||
|
df["days_since_last_session"] = (now - df["timestamp"]).dt.days
|
||||||
|
df["session_count"] = df.groupby("user_id_hashed")["timestamp"].transform("count")
|
||||||
|
|
||||||
|
df["avg_session_duration"] = df.get("session_duration", 1)
|
||||||
|
df["click_engagement"] = df.get("click_through_rate", 0.5)
|
||||||
|
|
||||||
|
def normalize(series):
|
||||||
|
return (series - series.min()) / (series.max() - series.min() + 1e-9)
|
||||||
|
|
||||||
|
recency_norm = 1 - normalize(df["days_since_last_session"])
|
||||||
|
frequency_norm = normalize(df["session_count"])
|
||||||
|
duration_norm = normalize(df["avg_session_duration"])
|
||||||
|
click_norm = normalize(df["click_engagement"])
|
||||||
|
|
||||||
|
score = (
|
||||||
|
0.35 * recency_norm +
|
||||||
|
0.30 * frequency_norm +
|
||||||
|
0.20 * duration_norm +
|
||||||
|
0.15 * click_norm
|
||||||
|
)
|
||||||
|
|
||||||
|
df["engagement_score"] = (score * 100).round(2)
|
||||||
|
|
||||||
|
def classify(score):
|
||||||
|
if score >= 75:
|
||||||
|
return "Tier 1 - Highly Engaged"
|
||||||
|
elif score >= 50:
|
||||||
|
return "Tier 2 - Stable"
|
||||||
|
elif score >= 25:
|
||||||
|
return "Tier 3 - Declining"
|
||||||
|
else:
|
||||||
|
return "Tier 4 - At Risk"
|
||||||
|
|
||||||
|
df["engagement_tier"] = df["engagement_score"].apply(classify)
|
||||||
|
|
||||||
|
return df
|
||||||
21
src/models/predictive_segments.py
Normal file
21
src/models/predictive_segments.py
Normal file
@@ -0,0 +1,21 @@
|
|||||||
|
import pandas as pd
|
||||||
|
|
||||||
|
def build_predictive_segments(df):
|
||||||
|
|
||||||
|
# Default segment
|
||||||
|
df["predictive_segment"] = "General"
|
||||||
|
|
||||||
|
# At-risk + birthday approaching
|
||||||
|
df.loc[
|
||||||
|
(df.get("risk_class") == "High Risk") &
|
||||||
|
(df.get("birthday_within_30_days", False) == True),
|
||||||
|
"predictive_segment"
|
||||||
|
] = "At-Risk + Birthday Approaching"
|
||||||
|
|
||||||
|
# High-value users
|
||||||
|
df.loc[
|
||||||
|
(df.get("engagement_tier") == "Tier 1 - Highly Engaged"),
|
||||||
|
"predictive_segment"
|
||||||
|
] = "High Value - Amplify"
|
||||||
|
|
||||||
|
return df
|
||||||
21
src/models/send_time_optimizer.py
Normal file
21
src/models/send_time_optimizer.py
Normal file
@@ -0,0 +1,21 @@
|
|||||||
|
import pandas as pd
|
||||||
|
|
||||||
|
def optimize_send_time(df):
|
||||||
|
|
||||||
|
# Safety check
|
||||||
|
required_cols = ["timestamp", "click_engagement"]
|
||||||
|
for col in required_cols:
|
||||||
|
if col not in df.columns:
|
||||||
|
raise ValueError(f"{col} missing for send-time optimization")
|
||||||
|
|
||||||
|
df["hour"] = pd.to_datetime(df["timestamp"], utc=True).dt.hour
|
||||||
|
|
||||||
|
optimal_hours = (
|
||||||
|
df.groupby("hour")["click_engagement"]
|
||||||
|
.mean()
|
||||||
|
.idxmax()
|
||||||
|
)
|
||||||
|
|
||||||
|
df["optimal_send_hour"] = optimal_hours
|
||||||
|
|
||||||
|
return df
|
||||||
Reference in New Issue
Block a user