Imagine that you are an important member of the actuarial staff at one of the country's largest insurance companies. Every year the company reevaluates its pricing to ensure that it can meet obligations, cover expenses, and generate a profit. You are tasked with exploring a dataset that includes everything from demographics and history to coverage options. Your goal is to create a model that accurately predicts insurance premiums.
You will need to carefully analyze all of the available features to build a model that is both as fair and as accurate as possible. If your predictions are accurate, the company can continue to offer competitive prices and ensure it has the resources needed to provide excellent service. Accurate predictions will also allow the company to price premiums more fairly, avoiding excessively high premiums for certain groups.
Goal: Build a regression model that accurately predicts insurance premiums.
The dataset for this competition (both train and test) was generated from a deep learning model trained on the Insurance Premium Prediction dataset.
id: Unique identifier
Age: Age
Gender: Gender
Annual Income: Annual income
Marital Status: Marital status
Number of Dependents: Number of dependents
Education Level: Education level
Occupation: Occupation
Health Score: Health score
Location: Location
Policy Type: Policy type
Previous Claims: Previous claims
Vehicle Age: Vehicle age
Credit Score: Credit score
Insurance Duration: Insurance duration
Policy Start Date: Policy start date
Customer Feedback: Customer feedback
Smoking Status: Smoking status
Exercise Frequency: Exercise frequency
Property Type: Property type
Premium Amount: Premium amount (target variable)
Data Source: Kaggle Playground Series S4E12
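The column list above can be turned into a small schema before any modeling. The following is a minimal sketch, not a definitive setup: the split into numeric and categorical columns is an assumption inferred from the column names alone, the helper `add_date_parts` and the constant names are hypothetical, and the two toy rows are invented purely to show the date handling.

```python
import pandas as pd

TARGET = "Premium Amount"
DATE_COL = "Policy Start Date"

# Assumed grouping, guessed from the column names; verify against the real data.
NUMERIC_COLS = [
    "Age", "Annual Income", "Number of Dependents", "Health Score",
    "Previous Claims", "Vehicle Age", "Credit Score", "Insurance Duration",
]
CATEGORICAL_COLS = [
    "Gender", "Marital Status", "Education Level", "Occupation", "Location",
    "Policy Type", "Customer Feedback", "Smoking Status",
    "Exercise Frequency", "Property Type",
]


def add_date_parts(df: pd.DataFrame) -> pd.DataFrame:
    """Replace the policy start date with year and month columns (hypothetical helper)."""
    out = df.copy()
    # Unparseable values become NaT instead of raising an error.
    dates = pd.to_datetime(out[DATE_COL], errors="coerce")
    out["policy_start_year"] = dates.dt.year
    out["policy_start_month"] = dates.dt.month
    return out.drop(columns=[DATE_COL])


# Invented toy rows, only to illustrate the behavior on good and bad dates.
toy = pd.DataFrame({"id": [0, 1], DATE_COL: ["2023-01-15", "not a date"]})
parsed = add_date_parts(toy)
print(parsed)
```

Keeping the column groups in named lists makes it easy to move a column from one group to the other once the actual dtypes and value counts have been inspected.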
Your task is to build a regression model to predict insurance premiums.
Python Libraries: Pandas, NumPy, scikit-learn, Matplotlib/Seaborn.
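As a starting point with the libraries listed above, a baseline could look roughly like the sketch below. It is one possible setup under stated assumptions, not the project's prescribed method: the choice of `Ridge`, the imputation strategies, the RMSE metric, the helper name `build_baseline`, and the synthetic stand-in data (including the category values) are all illustrative choices that do not come from the project description. With the real data, the synthetic frame would be replaced by the loaded training file.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_baseline(numeric_cols, categorical_cols):
    """Impute, scale, and one-hot encode, then fit a ridge regression (hypothetical helper)."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([("preprocess", preprocess), ("model", Ridge(alpha=1.0))])


# Synthetic stand-in data: invented values, used only so the sketch runs end to end.
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "Age": rng.integers(18, 65, n).astype(float),
    "Annual Income": rng.normal(50_000, 15_000, n),
    "Smoking Status": rng.choice(["Yes", "No"], n),
    "Policy Type": rng.choice(["Basic", "Comprehensive", "Premium"], n),
})
demo["Premium Amount"] = (
    500 + 10 * demo["Age"] + 300 * (demo["Smoking Status"] == "Yes")
    + rng.normal(0, 50, n)
)
demo.loc[::25, "Age"] = np.nan  # simulate some missing values

X = demo.drop(columns=["Premium Amount"])
y = demo["Premium Amount"]
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = build_baseline(["Age", "Annual Income"], ["Smoking Status", "Policy Type"])
model.fit(X_train, y_train)
preds = model.predict(X_valid)

# RMSE on the held-out split; the metric choice is an assumption.
rmse = float(np.sqrt(mean_squared_error(y_valid, preds)))
print(f"Validation RMSE: {rmse:.2f}")
```

Wrapping the preprocessing and the model in a single `Pipeline` keeps the imputation and encoding fitted on the training split only, which avoids leaking validation information into the baseline score.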