V2X in Urban Environments
This post covers one of the popular tasks solved at Deloitte and showcases a solution that uses the CatBoost library for a multiclass classification problem. The case is common among data-driven retail businesses forecasting possible points of sale.
In this version of the task, several datasets were acquired. The first one contains data about customers shopping at existing points of sale (PoS) in major airports across the EU. The data consists of >110k rows, each representing an interviewed person, their personal data, and information about the purchase made. For this post the data has been depersonalized, so it can be shown with most of the features available. The following features are included: <ul> <li></li> <li></li> <li></li> <li></li> <li></li> </ul> </div> <div align="center"> The second dataset consists of ~120k rows with almost the same interview questions, excluding the category of the purchase made. This is because these interviews were conducted in several airports of interest, where the client is considering opening a point of sale. <br /> In other words, this dataset matches the previous one but excludes the information about purchases. </div> <div align="center"> <h2>The Task</h2> The task is simple: help the client predict, based on the data, which of the airports under consideration outside the EU are suitable for opening a new PoS. </div> <div align="center"> <h2>The Approach</h2> At first sight, the task was treated as a multiclass classification problem. This means that every purchase category can be viewed as an average amount spent on a purchase of that kind. With this, the closest to an optimal solution is to train a model on the features in the first dataset, apply it to the second one, and predict the most profitable airport from the result. </div>
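The last step of this approach, turning predicted purchase categories into an airport ranking, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the category names, the average spend values, the airport codes, and the helper `rank_airports` are all hypothetical. The predictions are assumed to come from a multiclass model trained on the first dataset and applied to the second.

```python
from collections import defaultdict

# Hypothetical average spend per purchase category (illustrative values,
# not taken from the original data).
AVG_SPEND = {"perfume": 60.0, "alcohol": 35.0, "confectionery": 12.0}


def rank_airports(predictions, avg_spend):
    """Rank airports by mean expected spend per interviewed person.

    predictions: iterable of (airport, predicted_category) pairs,
    one pair per respondent in the second dataset.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for airport, category in predictions:
        totals[airport] += avg_spend[category]
        counts[airport] += 1
    means = {airport: totals[airport] / counts[airport] for airport in totals}
    # Highest mean expected spend first.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up predictions, standing in for the output of a multiclass model.
sample_predictions = [
    ("AAA", "perfume"),
    ("AAA", "confectionery"),
    ("BBB", "alcohol"),
    ("BBB", "alcohol"),
    ("CCC", "confectionery"),
]

ranking = rank_airports(sample_predictions, AVG_SPEND)
print(ranking)
```

Averaging per respondent rather than summing keeps airports with different numbers of interviews comparable. Whether that is the right normalization for the client's decision is an assumption of this sketch.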