Identifying Customer Accounts and Their Classification Using Python and NLP

Hi, I’m Bhavin, Senior Director – Application Development and Operations for the global capability centre of a Fortune 250 company that provides retirement solutions in North America and the EMEA region. Knowledge of data science and business analytics tools helps me bring more value to what I do for my firm. As part of our endeavour to provide a better client and advisor experience, we needed to add “nicknames” for clients’ accounts on our correspondence so that clients can easily identify their accounts, since holding several accounts with the same financial institution is a very common scenario. Clients can nickname their accounts to identify them on online portals more easily than by account number, and correspondence for a nicknamed account displays that nickname. Because the nickname is a free-text field where clients can use their imagination, we run the risk of profane or otherwise abusive and inappropriate nicknames.

I partnered with multiple teams within my firm and brought in a data scientist experienced in building models and performing data analytics. Together we applied Natural Language Processing techniques. Since nothing ready-made was available, we researched across communities to see whether similar models had been developed. We came across an open-source list of profane words, which we fed into a model as input so that nicknames containing these words could be tagged. We also found an open-source model, checked with architects within the firm that we were eligible to use it, and built a second model around it. By combining the results from both models and fine-tuning the accuracy of the matched words (and sub-words), we were able to flag almost 1,000 potential violators among 600,000 nicknames.
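To make the word-list step concrete, here is a minimal Python sketch of how a nickname can be checked against a profane-word list, including matches on sub-words. The file name and helper names are illustrative, not the firm’s actual implementation.

```python
# A minimal sketch of the word-list check, assuming the open-source list of
# profane words has been saved to a plain text file (one word per line).
# Substring matching is used so profane sub-words hidden inside a nickname
# are caught as well.

def load_word_list(path):
    """Load profane words from a file, one lower-cased word per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def word_list_hits(nickname, profane_words):
    """Return the profane words (or sub-words) found in a nickname."""
    text = nickname.lower()
    return [word for word in profane_words if word in text]

profane_words = load_word_list("profane_words.txt")    # hypothetical file name
print(word_list_hits("Rainy day fund", profane_words)) # [] -> nothing flagged
```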

Python was the tool we used to execute the models. Our group of data scientists, working with firm-approved development tools, built them. This is how we worked towards resolving the problem. We started by locating the full extract of nicknames from its repository and researched external community portals for clues on how similar problems had been solved. We then identified an open-source list of profane words available on the internet, identified an open-source model, and secured approval through the firm’s information security protocols to use that model in a live production environment. Using these two distinct approaches, we created a joint model that takes the results from both and flags potential violators. This eliminated the need to review 600,000 names manually and shrank the list to only about 1,000 potential violators. We then reviewed the flagged list and used it to fine-tune further. For example, Dick is a very common nickname for men in the US, so Dick’s IRA account was flagged as a violator; we fine-tuned the model to exclude “dick” as a profane word.
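Below is a rough sketch of how such a joint check, including the allowlist fine-tuning, could be wired up in Python. The open-source profanity-check package is used purely as a stand-in for the second model (the model we actually used is not named here), and the allowlist, threshold, and function names are illustrative.

```python
# A sketch of the joint check: word-list matching plus an ML-based score,
# using profanity-check (pip install profanity-check) as a stand-in model.
from profanity_check import predict_prob

ALLOWLIST = {"dick"}   # common US given name, excluded after reviewing false positives

def flag_nickname(nickname, profane_words, threshold=0.8):
    """Flag a nickname if either the word list or the model considers it profane."""
    text = nickname.lower()
    hits = [w for w in profane_words if w in text and w not in ALLOWLIST]
    score = predict_prob([nickname])[0]   # model's probability that the text is offensive
    return bool(hits) or score >= threshold

def flag_extract(nicknames, profane_words):
    """Run the joint check over the full nickname extract and keep potential violators."""
    return [n for n in nicknames if flag_nickname(n, profane_words)]
```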

We derived many useful insights from this process: two models were created, their effectiveness at flagging partial or full words was established, and 600,000 nicknames were passed through the combined model. This produced about 1,000 potential violators, a list small enough to review very easily. These violators will be entered through our maintenance process to stop them from appearing on correspondence to our clients, keeping that correspondence clean. The next steps were to identify the violating accounts and work with each client’s advisor to reach out to the client and have the nickname rectified. The firm’s compliance and supervision group would then review the accounts and the clients’ actions, flagging those that warrant anti-fraud review.

The key recommendations were, first, that these models be implemented in production as a batch process to identify violators and flag accounts with profane nicknames, and second, that the models be integrated into the front-end portal through a service interface to stop clients from entering profane names in the first place (a sketch of such a service interface follows the list below). We expected the impact to take the following forms:

  • 1. Displaying nicknames in correspondence is a critical client-experience initiative.
  • 2. Without cleaner names, the compliance and legal groups would not have approved the initiative, and we would have deprived clients of something that enriches their experience on a day-to-day basis.
  • 3. Building a maintenance process that will flag incoming violators.
  • 4. Reducing the firm’s exposure to regulatory bodies from profane names appearing on its regulatory correspondence.
  • 5. Protecting the firm’s brand, which is considered the most valuable asset in the financial industry.
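As an illustration of the second recommendation, here is a minimal sketch of what the validation service could look like, using Flask only as a placeholder framework (the post does not name one). The module, route, and payload shapes are illustrative; flag_nickname() and profane_words refer to the joint check sketched earlier.

```python
# A minimal sketch of the proposed service interface for the front-end portal.
from flask import Flask, request, jsonify

# Hypothetical module holding the joint check from the earlier sketch.
from profanity_flagging import flag_nickname, profane_words

app = Flask(__name__)

@app.route("/nicknames/validate", methods=["POST"])
def validate_nickname():
    payload = request.get_json(silent=True) or {}
    nickname = payload.get("nickname", "")
    if flag_nickname(nickname, profane_words):
        # Reject before the nickname ever reaches the book-of-record system.
        return jsonify({"accepted": False, "reason": "nickname not allowed"}), 422
    return jsonify({"accepted": True}), 200
```

The front-end portal would call this endpoint before saving a nickname, so profane names are blocked at entry rather than cleaned up after the fact.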

The exercise exposed me to peer groups within the firm who work on data science and have the desired skill set. It also opens up opportunities to work on complex data science models in the future and to partner on more such use cases and solutions.
