Imagine that you are a data scientist for a big marketing firm, hired to improve your client's sales by developing a powerful predictive model. One of your largest clients is the Google Merchandise Store, where they sell Google swag and various digital devices. The store wants to better understand where they are earning (and not earning) their revenue and what factors drive the highest revenue, but their current analytical methods are too high-level to provide actionable insights. The CEO of the store has high hopes for your abilities.
You are given a huge dataset derived from Google Analytics, filled with customer interactions and behavior. Your primary task is to build a model that can accurately predict revenue per customer, identify key drivers of revenue, and pinpoint customer segments with high potential. These efforts will allow you to provide actionable recommendations to the marketing and product teams. It is important for the company to understand their customer interactions and optimize their strategies.
Goal: The goal of this project is to build a model that can accurately predict revenue per customer.
The dataset contains customer interactions and transactions from the Google Merchandise Store. It includes information about visits, devices, geography, traffic sources, and more.
fullVisitorId: Unique identifier for each userchannelGrouping: Channel via which the user came to the Storedate: The date on which the user visited the Storedevice: Specifications for the device used to access the StoregeoNetwork: Information about the geography of the usersocialEngagementType: Engagement typetotals: Aggregate values across the sessiontrafficSource: Information about the Traffic Source from which the session originatedvisitId: Identifier for this sessionvisitNumber: The session number for this uservisitStartTime: The timestamphits: Row and nested fields for all types of hitscustomDimensions: User-level or session-level custom dimensionsData Source: Kaggle GA Customer Revenue Prediction
Download DataYour task is to predict the natural log of the sum of all transactions per user.
Python Libraries: Pandas, NumPy, scikit-learn, Matplotlib/Seaborn, json, XGBoost/LightGBM.