This article explains the features that the Engine generates after it processes the data for the subscription option of the customer churn prediction app.
Features are known attributes used as input by machine learning models to predict the unknown target.
For churn prediction, the Engine automatically generates a number of useful features from the transaction data and the customer information data provided by the user. These features summarize customer behavior as statistics computed over different periods of time.
Different types of aggregated features are generated over the selected time windows:
- Events-based features: minimum, maximum, standard deviation and total for the selected attributes of logged events
- Count-based features: counts for the event attribute
- Time-interval based features: minimum, maximum, average and standard deviation of the number of days between events
- Recency features: days since last event
- Additional features from the customer info data and subscription data
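The aggregations listed above can be sketched in plain Python for a single customer. This is only an illustration, not the Engine's actual implementation: the function name `aggregate_events`, the short dictionary keys, the sample events, the window boundary (the window ends on a reference date and excludes the start day) and the use of the population standard deviation are all assumptions.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def aggregate_events(events, reference_date, window_days):
    """Aggregate one customer's events over a trailing time window.

    events: list of (event_date, attribute_value) tuples.
    Assumption: the window covers (reference_date - window_days, reference_date].
    """
    start = reference_date - timedelta(days=window_days)
    in_window = sorted((d, v) for d, v in events if start < d <= reference_date)

    # Count-based feature: number of events in the window.
    features = {"count": len(in_window)}

    # Events-based features: statistics of the selected event attribute.
    values = [v for _, v in in_window]
    if values:
        features["min"] = min(values)
        features["max"] = max(values)
        features["total"] = sum(values)
        features["stddev"] = pstdev(values)  # assumption: population stddev

    # Time-interval based features: days between consecutive events.
    dates = [d for d, _ in in_window]
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    if gaps:
        features["min_days_btw_events"] = min(gaps)
        features["max_days_btw_events"] = max(gaps)
        features["avg_days_btw_events"] = mean(gaps)
        features["stddev_days_btw_events"] = pstdev(gaps)

    # Recency feature: days since the most recent event (not window-limited).
    past = [d for d, _ in events if d <= reference_date]
    if past:
        features["days_since_last_event"] = (reference_date - max(past)).days
    return features


# Hypothetical events for one customer: (date, session length in minutes).
events = [(date(2023, 5, 10), 20), (date(2023, 5, 20), 30), (date(2023, 5, 29), 40)]
features_15d = aggregate_events(events, date(2023, 5, 31), window_days=15)
features_30d = aggregate_events(events, date(2023, 5, 31), window_days=30)
```

Calling the function once per selected time window yields one set of features per window. The recency value is the same in both results because it does not depend on the window.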
The user selects the time windows used to generate these aggregated features (referred to here as “contributing factors”) when defining time-based factors from the events datasets.
Figure: Specifying time windows (bottom) for computing contributing factors (top) from a streaming session event log dataset
As an example, assume we have an imaginary streaming company called PetFlix.
The company logs the streaming sessions in which its customers watch pet videos. It also holds some information about its customers, all of whom are assumed to be active.
Subscription information:
User ID | sign_up_date | churn_date |
1 | May 1, 2023 | |
2 | May 1, 2023 | |
3 | May 1, 2023 | |
Streaming logs (this dataset is given the nickname "streaming_sessions"):
User ID | Date | session_length_minutes |
1 | May 29, 2023 | 20 |
2 | May 30, 2023 | 30 |
3 | May 30, 2023 | 40 |
4 | May 30, 2023 | 20 |
Customer information:
User ID | Gender | Date of Birth (DOB) |
1 | F | Oct 23, 1999 |
2 | F | Apr 17, 1983 |
3 | M | Feb 29, 1984 |
Given this data, the Engine generates the following features. For the event-activity-based features, suffixes such as “_last_30d” and “_last_15d” correspond to the rolling date ranges that the business user confirmed when selecting the time-based contributing factors.
Feature group | Description | Feature name |
Events-based features | Count based | count_of_streaming_sessions_last_15d, count_of_streaming_sessions_last_30d |
Events-based features | Stats from event attributes | min_session_length_minutes_in_streaming_sessions_last_30d, min_session_length_minutes_in_streaming_sessions_last_15d, max_session_length_minutes_in_streaming_sessions_last_30d, max_session_length_minutes_in_streaming_sessions_last_15d, total_session_length_minutes_in_streaming_sessions_last_30d, total_session_length_minutes_in_streaming_sessions_last_15d, stddev_session_length_minutes_in_streaming_sessions_last_30d, stddev_session_length_minutes_in_streaming_sessions_last_15d |
Events-based features | Interval between events | min_days_btw_events_in_streaming_sessions_last_30d, min_days_btw_events_in_streaming_sessions_last_15d, max_days_btw_events_in_streaming_sessions_last_30d, max_days_btw_events_in_streaming_sessions_last_15d, avg_days_btw_events_in_streaming_sessions_last_30d, avg_days_btw_events_in_streaming_sessions_last_15d, stddev_days_btw_events_in_streaming_sessions_last_30d, stddev_days_btw_events_in_streaming_sessions_last_15d |
Events-based features | Recency | days_since_last_event_in_streaming_sessions |
Features from subscription start and end dates dataset | Tenure (days since subscription start date) | |
Time based features from customer info data | | year_of_dob, month_of_dob, week_in_year_of_dob, weekday_of_dob, days_since_dob |
Other features from customer info data | | gender |
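The subscription and customer info features can be illustrated with a minimal Python sketch for PetFlix user 1. This is an assumption-laden example rather than the Engine's actual logic: the function `customer_features`, the key `tenure_days`, the reference date of May 31, 2023, the ISO week numbering and the Monday = 0 weekday convention are all hypothetical choices made for illustration.

```python
from datetime import date

# Hypothetical date on which features are computed; the article does not state one.
REFERENCE_DATE = date(2023, 5, 31)


def customer_features(sign_up_date, dob, gender, reference_date=REFERENCE_DATE):
    """Derive subscription and customer info features for one customer."""
    return {
        # Tenure: days since the subscription start date ("tenure_days" is a made-up key).
        "tenure_days": (reference_date - sign_up_date).days,
        # Time based features from the date of birth.
        "year_of_dob": dob.year,
        "month_of_dob": dob.month,
        "week_in_year_of_dob": dob.isocalendar()[1],  # assumption: ISO week number
        "weekday_of_dob": dob.weekday(),              # assumption: Monday = 0
        "days_since_dob": (reference_date - dob).days,
        # Other customer info attributes are passed through unchanged.
        "gender": gender,
    }


# PetFlix user 1: signed up May 1, 2023; born Oct 23, 1999; gender F.
user_1 = customer_features(date(2023, 5, 1), date(1999, 10, 23), "F")
```

With the assumed reference date, user 1 has a tenure of 30 days. A categorical attribute such as gender would typically still need to be encoded before it is passed to a model; that step is not shown here.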