
These solutions are time-saving – Abhishek Gupta, Data Science @ Work


Contributed by: Abhishek Gupta

My name is Abhishek Gupta. I work as a Senior Developer and Subject Matter Expert at one of the leading Consultancy Services Pvt Ltd. I have 7 years of experience as an Informatica (ETL) Developer and am pursuing a part-time PhD in Computer Science and Engineering, with data warehousing as my main research area; just two months ago I presented one of my research papers at an IEEE International Conference hosted by the University of California, Irvine. At work, I went looking for tasks that could be automated. After discussion with my team, we identified the following highly time-consuming manual processes, found three pain areas, and worked on them:

  1. DBAs have to check database schema consumption and then ask developers to delete tables, followed by multiple rounds of follow-ups — all of this happens manually.
  2. Manual generation of prototype queries from the STMs (requirements) sent by the client.
  3. Manual validation of report data before sending it to the Data Analysts.

To solve these problems, I used Python, Netezza DB, Unix, and the Informatica ETL tool. I performed statistical data analysis for all three points above and derived the following algorithms, which automate each of the problems:

  1. Gather statistics on database schema consumption. When the schema comes within 10% of its threshold value, trigger a mail warning the DBA that the threshold is about to be reached. When the threshold is actually reached, trigger a mail asking developers to select the tables they own that can be dropped; the developers simply reply to the mail, the reply is read automatically, and the listed tables are dropped from the schema. If consumption falls back below the threshold, no further action is taken; otherwise the steps repeat until it does.
  2. Convert cells of the STMs (which are in CSV form) into procedural SQL (such as CASE statements) that can be plugged into the prototype queries, so that prototype queries can be generated much faster.
  3. Validate the reports generated by the ETL objects and predict the accuracy of the data before sending it to the Data Analysts for their validation.
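Step 1 above can be sketched in Python. Everything here is illustrative: the threshold value, mail addresses, and SMTP host are placeholders, and in practice the used-space figure would come from a Netezza catalog query rather than being passed in by hand.

```python
import smtplib
from email.message import EmailMessage

# Placeholder values; in a real setup the used-space figure would come
# from a Netezza system view or an nz_db_size report.
THRESHOLD_GB = 500          # schema size limit agreed with the DBA team
WARN_MARGIN = 0.10          # warn when within 10% of the threshold

def check_schema(used_gb: float) -> str:
    """Classify current schema consumption against the threshold."""
    if used_gb >= THRESHOLD_GB:
        return "over"                      # ask developers to drop tables
    if used_gb >= THRESHOLD_GB * (1 - WARN_MARGIN):
        return "warn"                      # intimate the DBA in advance
    return "ok"

def notify(state: str, used_gb: float) -> None:
    """Send the appropriate mail (host and addresses are placeholders)."""
    if state == "ok":
        return
    msg = EmailMessage()
    msg["From"] = "etl-monitor@example.com"
    msg["To"] = ("dba-team@example.com" if state == "warn"
                 else "dev-team@example.com")
    msg["Subject"] = f"Schema at {used_gb:.0f} GB of {THRESHOLD_GB} GB ({state})"
    msg.set_content("Reply with the tables you own that can be dropped."
                    if state == "over" else
                    "Schema is within 10% of the threshold value.")
    with smtplib.SMTP("mailhost.example.com") as smtp:
        smtp.send_message(msg)
```

The reply-reading and table-dropping half of the loop is omitted; it would poll a mailbox, parse table names out of developer replies, and issue `DROP TABLE` statements before re-checking consumption.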
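Step 2, turning STM cells into CASE statements, might look like the following sketch. The STM column names (`source_value`, `target_value`) and the example mapping are assumptions, since the actual STM layout is not shown here.

```python
import csv
import io

def mapping_to_case(column: str, rules: list[tuple[str, str]]) -> str:
    """Turn (source_value, target_value) pairs into a SQL CASE expression."""
    lines = [f"CASE {column}"]
    for src, tgt in rules:
        lines.append(f"    WHEN '{src}' THEN '{tgt}'")
    lines.append("    ELSE NULL")
    lines.append("END")
    return "\n".join(lines)

# Hypothetical STM fragment: each row maps a source code to a target label.
stm_csv = """source_value,target_value
A,Active
I,Inactive
"""
rules = [(r["source_value"], r["target_value"])
         for r in csv.DictReader(io.StringIO(stm_csv))]
print(mapping_to_case("status_cd", rules))
```

The generated expression can then be pasted into a prototype query's SELECT list instead of being typed out by hand for every mapped column.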
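Step 3 could be approximated with a simple data-profile comparison before a report goes to the analysts. The checks shown (row count and per-column null rate) and the 5% tolerance are illustrative choices, not the actual validation logic used.

```python
def profile(rows: list[dict]) -> dict:
    """Compute row count plus the null/blank rate for each column."""
    count = len(rows)
    cols = rows[0].keys() if rows else []
    nulls = {c: sum(1 for r in rows if r[c] in (None, "")) / count
             for c in cols}
    return {"rows": count, "null_rate": nulls}

def accuracy_flags(report: list[dict], baseline: dict,
                   tol: float = 0.05) -> list[str]:
    """Return the checks that drift more than `tol` from the baseline."""
    p = profile(report)
    flags = []
    if abs(p["rows"] - baseline["rows"]) / baseline["rows"] > tol:
        flags.append("row_count")
    for col, rate in p["null_rate"].items():
        if abs(rate - baseline["null_rate"].get(col, 0.0)) > tol:
            flags.append(f"null_rate:{col}")
    return flags
```

An empty flag list means the report matches its expected profile and can go to the Data Analysts; any flags point at the columns (or the row count) worth checking first.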

IMPACT: These solutions have been implemented and are saving a large number of man-hours — and the saving is recurring, not one-time. Apart from that, I received an off-cycle salary increment (outside the normal increment cycle) of 70k+ for outstanding performance, having completed the work within one month.

Want to learn such concepts? Join Great Learning’s PGP Data Science and Business Analytics Course and upskill today!

Great Learning Team
Great Learning's Blog covers the latest developments and innovations in technology that can be leveraged to build rewarding careers. You'll find career guides, tech tutorials and industry news to keep yourself updated with the fast-changing world of tech and business.
