Validating Requirements and Defining Working Models Using EDA and PCA Techniques

I’m Sriram, a Senior Software Engineer whose main responsibility is analysing the business with respect to its data. My day-to-day routine is to work with Business Analysts to validate requirements and convert them into a working model in the database, and I work closely with DevOps teams on deployment. I became curious when Business Analysts said that only a subset of features was enough, and that the features they mentioned had little or no impact on existing features in the hierarchy, so everything would still work properly. I also noticed the intuition they bring to building dashboards and representing data. This motivated me to learn Business Analytics.

When a Business Analyst asked me to extract the important features driving a particular piece of business logic, along with their weightage, I approached it from two perspectives:

  • 1. Data Science Approach: I used PCA and SelectKBest to find the best features algorithmically.
  • 2. Logical Approach: Since I know the business, I assigned feature weightage manually.

Earlier we had to assign weightage to every feature manually; now the effort takes seconds. I used Oracle as the data source and analysed the data with EDA, PCA and SelectKBest. The goal was to find the 10 features that most strongly influence the business logic, which is why I chose PCA and SelectKBest.
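The post doesn't show the actual code, so here is a minimal sketch of how the top-10 selection might look with scikit-learn's SelectKBest, plus a PCA fit to inspect explained variance. The data is a synthetic stand-in for the real Oracle extract, and the scoring function (ANOVA F-test) is an assumption.

```python
# Hypothetical sketch: pick the 10 most influential features with
# SelectKBest, and fit PCA to see how much variance the leading
# components explain. Synthetic data stands in for the Oracle extract.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 500 rows, 25 numeric features, a binary business outcome.
X, y = make_classification(n_samples=500, n_features=25,
                           n_informative=10, random_state=42)

# SelectKBest scores each feature against the target (ANOVA F-test here)
# and keeps the top 10.
selector = SelectKBest(score_func=f_classif, k=10)
X_top10 = selector.fit_transform(X, y)
top_idx = selector.get_support(indices=True)   # indices of the winning features

# PCA on standardized data: the explained-variance ratio shows how much
# of the signal the leading components carry.
pca = PCA(n_components=10)
pca.fit(StandardScaler().fit_transform(X))
```

SelectKBest keeps original, interpretable columns, while PCA produces new composite components, so the two answer slightly different questions about "importance".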

I queried the data from the database. Per the business rules, there should be no NULL values, and I validated that. I then converted every feature to numeric (or numeric categorical) form, both to make EDA easier and to pass the inputs to the machine learning algorithms. Several issues came up along the way:
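A hedged sketch of those preparation steps: validate that no NULLs survived the extract, then convert text categories into numeric codes. The DataFrame `df` stands in for the result of the Oracle query; the column names are purely illustrative.

```python
# Illustrative preparation step: NULL validation plus categorical encoding.
# `df` is a stand-in for the Oracle query result; columns are invented.
import pandas as pd

df = pd.DataFrame({
    "region":  ["APAC", "EMEA", "APAC", "AMER"],
    "revenue": [120.0, 90.5, 300.2, 150.0],
    "segment": ["retail", "wholesale", "retail", "retail"],
})

# Business rule: no NULLs should reach the model inputs.
assert df.isnull().sum().sum() == 0, "NULLs found - go back to the source query"

# Encode every text column as numeric category codes so EDA and the
# ML algorithms can consume it.
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].astype("category").cat.codes
```

After this loop every column is numeric, which is the precondition for the PCA and SelectKBest steps above.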

1. Without normalizing the data, I got completely misleading results, because each feature was on a different scale.
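A minimal sketch of the fix: standardize each column to zero mean and unit variance before PCA, so a feature measured in hundreds of thousands cannot drown out one measured on a small scale. The numbers below are invented to make the scale gap obvious.

```python
# Scale fix sketch: two features on wildly different scales are
# standardized column-wise before going into PCA.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[100000., 1.],
              [200000., 3.],
              [300000., 5.]])   # e.g. revenue vs. a small rating score

X_scaled = StandardScaler().fit_transform(X)
```

Without this step, PCA's variance-maximising components simply follow the largest-scaled column rather than the real structure of the data.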

2. While performing EDA, I discovered that most of the data was test data, i.e. rows created for random checks. I had run my EDA without realising that, so I had to touch base with the Business Analysts to get valid data.

3. Some features contained data that was irrelevant compared to the Low-Level Document, so properly understanding each feature proved very important.

4. Since the data was huge, each model took a very long time to run, so I had to learn some more advanced ML techniques.
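The post doesn't name the faster techniques it adopted, but one common answer to point 4 is incremental fitting: scikit-learn's IncrementalPCA processes mini-batches instead of loading every row at once, which suits data paged out of a database. This is an assumption offered as a sketch, not the author's actual approach.

```python
# Assumed technique for large extracts: IncrementalPCA fits on
# mini-batches, so the full dataset never has to sit in memory.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
ipca = IncrementalPCA(n_components=5)

# Stream the data in chunks, as you would when paging rows out of Oracle.
for _ in range(10):
    batch = rng.normal(size=(1000, 20))   # stand-in for one page of rows
    ipca.partial_fit(batch)

# Project new rows onto the fitted components.
reduced = ipca.transform(rng.normal(size=(100, 20)))
```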

In doing so, I found that around 50% of the features were highly correlated with other features. I showed this to the Business Analyst in a scatter plot and explained that I would remove one feature from each correlated pair to get an optimised result. The Business Analysts validated this and told me to proceed; with the help of this model, they are also planning to modify the existing database architecture into a more efficient one. I have lined up regression models for further implementation, and I also got the idea of quantifying the data through logical business grouping along with assigned scores.
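The correlation screen described above can be sketched as follows: compute the absolute correlation matrix, look only at its upper triangle (so each pair is counted once), and drop one feature from every pair above a threshold. The 0.9 threshold and the synthetic data are assumptions for illustration.

```python
# Correlation-filter sketch: flag feature pairs with |corr| > 0.9 and
# drop one of each pair. Data and threshold are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
base = rng.normal(size=200)
df = pd.DataFrame({
    "f1": base,
    "f2": base * 2 + rng.normal(scale=0.01, size=200),  # near-duplicate of f1
    "f3": rng.normal(size=200),                          # independent feature
})

corr = df.corr().abs()
# Keep only the upper triangle so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
df_reduced = df.drop(columns=to_drop)
```

Dropping one member of each highly correlated pair removes redundancy without losing information, which is exactly why the Business Analysts could sign off on the reduced feature set.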

This work reduced the effort from days to hours (from 4 days to 5 hours). I also passed on key insights so that a Business Analyst can carry out the analysis independently, just by supplying the required features and interpreting the results. The exercise gave me hands-on experience of applying data science by knowing the business, gave me a new perspective on dashboard data, and gradually increased my interaction with business people.

Great Learning
Great Learning's Blog covers the latest developments and innovations in technology that can be leveraged to build rewarding careers. You'll find career guides, tech tutorials and industry news to keep yourself updated with the fast-changing world of tech and business.
