Allstate Purchase Prediction Challenge

1. Problem Statement

Imagine you are a consultant hired by Allstate, one of the largest insurance providers in the United States, to help the company understand its customers' purchase behavior. The insurance market is increasingly competitive, and customers have more choices than ever before. Allstate wants to use data science to strengthen customer relationships, tailor product offerings, and ultimately increase sales. This is difficult because a customer typically reviews several quotes with different coverage combinations before buying, so the final choice is hard to anticipate.

You are tasked with analyzing customer data (demographics, past interactions, policy quotes, and purchase history) to build a model that predicts which coverage options each customer is most likely to select. Identifying the factors that drive customer choices lets you give Allstate actionable insights for personalizing marketing messages, streamlining the quoting process, and developing product bundles that better meet customer needs. A successful model could contribute to higher customer satisfaction, better retention, and stronger sales.

Goal: Build a model that predicts the purchased coverage options from a customer's shopping history.

2. Data Description

The dataset contains transaction history for customers who ended up purchasing a policy. It includes customer information, quoted policy information, and costs. The training set contains the entire quote history for each customer, with the last row containing the purchased coverage options. The test set contains a partial history of the quotes, and the task is to predict the purchased coverage options.
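The layout described above (full quote history, with the purchase as each customer's last row) can be separated into inputs and labels along these lines. This is a minimal sketch on toy data: the column names `customer_ID` and `shopping_pt` are assumptions about the file, and only two of the seven options are shown.

```python
import pandas as pd

# Toy stand-in for the training file; column names are assumptions.
quotes = pd.DataFrame({
    "customer_ID": [1, 1, 1, 2, 2],
    "shopping_pt": [1, 2, 3, 1, 2],   # position of the quote in the history
    "A": [0, 1, 1, 2, 2],
    "B": [0, 0, 1, 1, 1],
    "cost": [630, 645, 640, 700, 690],
}).sort_values(["customer_ID", "shopping_pt"])

# The last row per customer holds the purchased coverage options.
purchased = quotes.groupby("customer_ID").tail(1)

# Every earlier row is shopping history that the model may use as input.
history = quotes.drop(purchased.index)

# Labels indexed by customer, one column per coverage option.
labels = purchased.set_index("customer_ID")[["A", "B"]]
print(labels)
```

Keeping the purchase row out of `history` matters: if it stays in the inputs, the model sees the answer during training.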

Data Source: Kaggle Allstate Purchase Prediction Challenge

3. Your Task

Your task is to predict the seven coverage options (A, B, C, D, E, F, G) that each customer will end up purchasing.

  1. Data Exploration and Preprocessing:
    • Load the dataset using Pandas.
    • Explore the data to understand the distribution of features.
    • Handle missing values, if any.
    • Convert categorical variables to numerical format.
  2. Feature Engineering (Crucial):
    • Aggregate features from the shopping history.
    • Create lag features to capture the customer's decision-making process.
  3. Model Building:
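    • Split the data into training and validation sets. Split by customer (group-based splitting) so that all quotes from one customer fall on the same side of the split.
    • Choose a suitable multi-output classification approach, since each of the seven options is its own categorical target (e.g., Random Forest, or Gradient Boosting with one model per option).
    • Train the model on the training data.
  4. Model Evaluation:
    • Evaluate the model's performance on the validation data.
    • Use metrics suited to multi-output classification, such as per-option accuracy and exact-match accuracy (all seven options correct).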
  5. Prediction and Submission:
    • Make predictions on the test data.
    • Format the submission file according to the competition guidelines.
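
The feature engineering step above (aggregates plus lag features) could look roughly like the following. It is a sketch on toy data rather than a recommended feature set: the column names, the two-option subset, and the derived feature names such as `A_prev` and `A_n_changes` are all illustrative assumptions.

```python
import pandas as pd

OPTIONS = ["A", "B"]  # toy subset; the task has seven options, A through G

# Toy quote history, one row per quote (column names are assumptions).
history = pd.DataFrame({
    "customer_ID": [1, 1, 1, 2, 2],
    "shopping_pt": [1, 2, 3, 1, 2],
    "A": [0, 1, 1, 2, 2],
    "B": [0, 0, 1, 1, 1],
    "cost": [630, 645, 640, 700, 690],
}).sort_values(["customer_ID", "shopping_pt"]).reset_index(drop=True)

# Lag features: the value of each option at the customer's previous quote.
for opt in OPTIONS:
    prev = history.groupby("customer_ID")[opt].shift(1)
    history[f"{opt}_prev"] = prev
    # 1 when the option differs from the previous quote; a customer's first
    # quote has no predecessor, so it never counts as a change.
    history[f"{opt}_changed"] = ((history[opt] != prev) & prev.notna()).astype(int)

# Aggregate the history down to one feature row per customer.
spec = {
    "n_quotes": ("shopping_pt", "count"),
    "cost_mean": ("cost", "mean"),
    "cost_last": ("cost", "last"),
}
for opt in OPTIONS:
    spec[f"{opt}_last"] = (opt, "last")                    # most recent quote
    spec[f"{opt}_prev"] = (f"{opt}_prev", "last")          # quote before that
    spec[f"{opt}_n_changes"] = (f"{opt}_changed", "sum")   # how often it moved

features = history.groupby("customer_ID").agg(**spec)
print(features)
```

The most recently quoted value of each option is often a strong baseline for the purchased value, which is why the sketch keeps `*_last` alongside the lag and change counts.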

Python Libraries: Pandas, NumPy, scikit-learn, Matplotlib/Seaborn.
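
With those libraries, steps 3 to 5 might be sketched as below. The data is synthetic, so the printed scores say nothing about the real problem. The submission layout (a `customer_ID` column and a `plan` column holding the seven option values concatenated into one string) is an assumption here and should be checked against the competition guidelines.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupShuffleSplit

OPTIONS = list("ABCDEFG")
rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 hypothetical customers with 3 quotes each.
n_customers, n_quotes = 200, 3
customer_id = np.repeat(np.arange(n_customers), n_quotes)

# Purchased options per customer, repeated on every quote row as the target.
purchased = rng.integers(0, 3, size=(n_customers, len(OPTIONS)))
y = np.repeat(purchased, n_quotes, axis=0)

# Quoted options: the purchased value, with about 20% swapped for a random draw.
noise = rng.integers(0, 3, size=y.shape)
X = np.where(rng.random(y.shape) < 0.2, noise, y)

# Group-based split: every quote from a given customer stays on one side.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(X, y, groups=customer_id))

# RandomForestClassifier accepts a 2-D target and predicts all outputs at once.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X[train_idx], y[train_idx])
pred = model.predict(X[val_idx])

# Per-option accuracy, and the stricter all-seven-correct accuracy.
per_option_acc = (pred == y[val_idx]).mean(axis=0)
exact_match_acc = (pred == y[val_idx]).all(axis=1).mean()
print(dict(zip(OPTIONS, per_option_acc.round(3))))
print(f"exact-match accuracy: {exact_match_acc:.3f}")

# One plan string per customer, taken from that customer's last quote row.
plans = ["".join(str(int(v)) for v in row) for row in pred]
submission = (
    pd.DataFrame({"customer_ID": customer_id[val_idx], "plan": plans})
    .groupby("customer_ID", as_index=False)
    .last()
)
print(submission.head())
```

A random forest is used because it handles several outputs natively in scikit-learn. A gradient boosting model would instead need one estimator per option, for example through `sklearn.multioutput.MultiOutputClassifier`.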