Contributed by: Amit Agarwala
I have been working in the IT industry for 11 years. I started as a Functional Tester and moved into Analytics Solution Validation 3 years ago. My primary job is to validate data for accuracy and to validate Data Science models. Along with functional validation, I also do business-level validation to ensure that the model’s output resembles realistic data. I am an enthusiastic person who loves to explore new data points and play various sports.
I work with transactional data of users across the US. Our product helps users compare their spending against the median spend (monthly, quarterly, half-yearly or yearly) among peers within their geography.
The spends are classified into different spend categories, which are further characterised by user features such as Home Ownership, Income Range and so on.
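To make the peer comparison concrete, here is a minimal Python sketch. It assumes each record is a hypothetical (category, state, income bucket, spend) tuple; the product’s real schema and pipeline are not described in this story. The sketch groups spends by peer segment and compares one user’s spend against that segment’s median.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (spend category, state, income bucket, monthly spend in USD).
records = [
    ("Restaurants", "NY", "50-75k", 320.0),
    ("Restaurants", "NY", "50-75k", 280.0),
    ("Restaurants", "NY", "50-75k", 410.0),
    ("Restaurants", "TX", "50-75k", 210.0),
    ("Restaurants", "TX", "50-75k", 250.0),
]

def peer_medians(rows):
    """Group spends by (category, geography, income bucket) and return each group's median."""
    groups = defaultdict(list)
    for category, geo, income, spend in rows:
        groups[(category, geo, income)].append(spend)
    return {key: median(values) for key, values in groups.items()}

def compare_to_peers(user_spend, key, medians):
    """Return how far the user's spend sits above (+) or below (-) their peer median."""
    return user_spend - medians[key]
```

For example, a New York user in the 50-75k bucket who spends 350.0 on restaurants would be 30.0 above the median of that segment in this toy data.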
The business problem I faced was to validate whether the numbers given by the product were close to the real numbers and whether they made sense given the user’s features (mentioned above). Additionally, it was important to check whether the median spend for a category is a function of the Income Bucket, Geography, Employment status and so on. Here are a couple of examples to make this clearer:
Would an Unemployed user spend less on Restaurants than an Employed user? Is this behaviour consistent across months?
Is home rent higher in New York than in Texas?
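Questions like these can be turned into a simple month-by-month consistency check. This sketch assumes hypothetical per-month spend lists for two groups and flags every month where the group expected to spend less does not have the lower median. The same function would apply to rent, with Texas as the expected-lower group and New York as the other.

```python
from statistics import median

# Hypothetical monthly restaurant spends (USD) for two peer groups, keyed by month.
unemployed = {
    "2021-01": [40.0, 55.0, 30.0],
    "2021-02": [45.0, 50.0, 35.0],
}
employed = {
    "2021-01": [120.0, 90.0, 150.0],
    "2021-02": [110.0, 95.0, 140.0],
}

def months_violating(lower_group, higher_group):
    """Return months (present in both groups) where the expected-lower group's
    median spend is NOT below the other group's median spend."""
    violations = []
    for month in sorted(lower_group.keys() & higher_group.keys()):
        if median(lower_group[month]) >= median(higher_group[month]):
            violations.append(month)
    return violations
```

An empty result means the hypothesis held in every month; any listed month is a candidate for closer investigation.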
Because the number of users in each combination of spend category, geography and income bucket is not uniform, and weights are applied in certain places, it became important to ensure that the input data did not bias the calculated median spends.
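Because weights are involved, a plain median is not the whole story. One common definition of the weighted median is the smallest value at which the cumulative weight reaches half of the total weight. The sketch below assumes non-negative weights; the product’s actual weighting scheme is not described in this story, so treat it as an illustration of how a few heavily weighted records can shift the result.

```python
def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative weight
    reaches at least half of the total weight (weights must be non-negative)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    cumulative = 0.0
    # Walk the values in ascending order, accumulating their weights.
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total / 2:
            return value
```

With equal weights, spends of 10, 20 and 30 give 20, but giving the 30 a weight of 5 moves the weighted median to 30. Comparing weighted and unweighted medians per segment is one way to spot where weights dominate the outcome.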
As our team had decided to launch a new product with this feature, it was very important for the numbers to make sense. Hence, a detailed validation of the numbers was needed. It was also important to prove the results both statistically and visually so that the reports could be shown to leadership.
To tackle this problem, I started analyzing the data on the different parameters mentioned above. I then performed ANOVA and came up with very interesting insights, which I presented to management in the form of plots and graphs.
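For readers new to ANOVA: the one-way F statistic compares how far the group means sit from the overall mean with how much observations vary within their own groups. A large F suggests that, for example, mean restaurant spend differs across income buckets. The exact ANOVA setup used in this work is not stated, so this standard-library sketch only illustrates the calculation; in practice, `scipy.stats.f_oneway` returns the same statistic together with a p-value.

```python
from statistics import fmean

def one_way_anova_f(*groups):
    """Return (F statistic, between-group df, within-group df) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean(x for g in groups for x in g)
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    # Note: raises ZeroDivisionError if every group has zero internal spread.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Each argument would be one group’s spends, e.g. one list per income bucket for a single category and month. The F value is then compared against an F distribution with the returned degrees of freedom to judge significance.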
This data analysis validated all the hypotheses we had considered while building the product.
We were able to determine the median spend across categories for each geographical location and income group. The numbers were found to be in line with the actual numbers for the top cities and states.
In the end, the solution and recommendations I proposed gave the Product Owner high confidence before rolling the product out to production. To date, the product has been receiving good feedback from clients.
In short, this Data Science @ work exercise has helped me in two primary ways:
First, I learned that validating trends and data methodically makes the results more acceptable to all stakeholders.
Second, it has boosted my confidence in using statistical methods to validate data and visual methods to showcase insights from the product.
If you wish to upskill, join Great Learning’s PGP Data Science and Business Analytics Course. For more such success stories, watch this space.