Customer Support Chatbot for Electronic Components
Customer satisfaction is a key metric for any company providing products or services, and much of it depends directly on the quality of the support the company provides to its customers. This paper describes an automated approach to achieving good customer support, focusing on an e-commerce business selling electronic components such as sensors, micro-controllers, and actuators. A key challenge is the large variation in the types of queries and possible solutions. Several NLP-based data exploration techniques are applied before different model architectures are explored. The solution described in this paper is based on a Seq2Seq LSTM model with attention. Trained on data from a private company, it achieves a validation accuracy of about 0.90; however, this figure could be misleading since the dataset is very small, and additional evaluation metrics need to be measured and compared.
I. INTRODUCTION
Customer satisfaction is a key metric for any organization providing products or services, as it is directly tied to retaining existing customers and acquiring new ones. A great deal of customer satisfaction depends on the kind of support the organization provides, both pre-sale and post-sale. Excellent support translates directly into happy, satisfied customers who stay loyal to the organization's products and services and, at the same time, recommend the organization to their peers, aiding its growth. Much of customer support happens through service tickets and live chat, where customers expect quick answers irrespective of the time of day. Attempting to achieve this manually has several drawbacks: the team required to handle all service tickets within a satisfactory timeline would have to grow linearly with the organization's customer base; multiple teams would need to be hired to cover rotating shifts; and the work involved is quite menial, so agents easily grow bored over time and struggle to perform at their best while facing customers in all kinds of moods. In addition, a manual team would need knowledge of all the organization's products and services, or resolution times would suffer as customers are redirected between departments, testing their patience and satisfaction.
To this end, a lot of work has been researched and done in the general domain of building both rule-based and automated chatbots to alleviate and handle the issue more efficiently. In this paper, we attempt to build an automated chatbot for an electronics-based e-commerce store based on past email conversations between agents and customers. Some of the supporting work on which this paper builds is described below.
In the paper “Building a Question and Answer System for News Domain”, Sandipan Basu et al. [1] predict answers to questions asked against passages drawn from news articles. The SQuAD 2.0 (Stanford Question Answering Dataset) is used for training different model architectures. At a high level, the model consists of three layers: an embedding layer, an RNN layer, and an attention layer. Various combinations of embedding, LSTM, and attention layers are compared and analyzed. The best performing model uses GloVe embeddings combined with a Bi-LSTM and Context-to-Question attention, achieving an F1 score of 33.095 and an EM of 33.094; a transformer-based model using BERT achieves an F1 score of 57.513 and an EM of 57.513. The model focuses on factoid questions about names, dates, and locations, and assumes the answer always exists in the passage as a contiguous text string, returning the start and end tokens of the answer as the result.
In the paper “Conversational Machine Comprehension: a Literature Review”, Somil Gupta et al. [2] study and compare different CMC models, emphasizing recently published models and specifically their approaches to tackling conversational history. The paper synthesizes a generic framework for CMC models and highlights the differences in recent approaches, serving as a compendium for future researchers. It describes the study of Conversational AI (ConvAI) systems as lying at the confluence of Natural Language Processing, Information Retrieval, and Machine Learning, and explains that ConvAI consists of three major research problems: Question Answering (providing answers to queries through text snippets, web documents, or knowledge bases), Task Completion (accomplishing tasks through information acquired from conversation), and Social Chat (emulating humans and conversing seamlessly and appropriately with users, as in the Turing test).
In the paper “Survey on Chatbot Design Techniques in Speech Conversation Systems”, Sameera A. Abdul-Kader and John Woods [3] compare design techniques from nine carefully selected papers on designing chatbots. The paper describes a chatbot as divided into three parts: the responder (the interface between the bot's main routines and the user), the classifier (the coordinator between the responder and the graphmaster, handling pre- and post-processing), and the graphmaster (organizing content and storage, and holding the pattern-matching algorithms). It lists the fundamental design techniques and approaches as parsing, pattern matching, AIML, ChatScript, SQL and relational databases, Markov chains, language tricks, and ontologies.
In the paper “Lingke: A Fine-grained Multi-turn Chatbot for Customer Service”, Pengfei Zhu et al. [4] present Lingke, an information-retrieval-augmented chatbot that can answer questions based on a given product introduction document and handle multi-turn conversations. It introduces fine-grained pipeline processing to distill responses from unstructured documents, and attentive sequential context-response matching for multi-turn conversations.
In the paper “Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network”, Zhou et al. [5] investigate matching a response with its multi-turn context using dependency information based entirely on attention. Their solution extends the attention mechanism in two ways: first, it constructs representations of text segments at different granularities solely with stacked self-attention; second, it extracts the truly matched segment pairs with attention across the context and response.
In the paper “Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots”, Yu Wu et al. [6] focus on response selection for multi-turn conversation in retrieval-based chatbots. They propose a sequential matching network to overcome the limitations of existing multi-turn methods, which either concatenate the utterances in the context or match responses against a highly abstract context vector, potentially losing relationships among utterances or important contextual information.
II. OVERVIEW AND ANALYSIS OF THE DATA-SET
A. Overview
Our goal in this paper is to predict the agent response for a given customer query in the electronics e-commerce industry. We used a dataset of English customer-agent conversation threads, dimensioned by date and time, based on support tickets created at Robocraze. We paired the conversations into query-response pairs, starting with the customer query and concatenating consecutive customer queries or agent responses. We applied a simple seq2seq LSTM model, then added a bi-directional layer, and subsequently an attention layer, to predict the agent response for a given customer query. We achieved 90% validation accuracy on a small dataset of 296 tickets involving 218 customers and 6 agents.
B. Detailed look at data characteristics
The dataset features information on 296 tickets. About 26% of them had no response from agents, due either to account creation emails from different platforms, drive share links, or closing notes from customers. As noted in Fig. 1, 78 tickets involve a single user (the customer), that is, cases with no response from agents. The most common resolution pattern (~160 tickets) involves a single agent, while there is a single case in which three agents interacted with a customer. Typically it takes a single turn to resolve a ticket (~61% of tickets with at least one agent response), while the maximum number of turns needed to resolve a ticket is 11, albeit very rare (Fig. 2).
We also looked at the time taken to close a ticket. As noted in Fig. 3, most tickets close within 5-10 days of opening; the distribution of durations is right skewed, with a mean of about 7 days and a median of roughly 4.5-6 days. There are very few outlier cases in which a ticket took more than 100 days to close.
To analyze the text in the tickets, we looked at the distributions of word counts for customer queries and agent responses. Both distributions are right skewed (Fig. 4 & 5). Customer queries, with a median word count of 61, are typically longer than agent responses, with a median word count of 41. In addition, we looked at the distributions of character counts for customer queries and agent responses; both look close to normal, more so for customer queries (Fig. 6 & 7).
Next, we looked at the most commonly used words among customers and agents, as depicted in Fig. 8 & 9. We excluded stopwords from the chat texts as defined in Python's Natural Language ToolKit (NLTK) package, along with a few custom stopwords such as 'hi', 'hello', 'please', 'gt', and 'lt'. The word 'ticket' is the second most commonly used among customers and the most common among agents. In addition, we looked at the most common bi-grams for customers and agents, as shown in Fig. 10 & 11. The words 'robocraze' and 'support' appear together most often in customer queries, while 'new' and 'ticket' appear together most often in agent responses.
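The uni- and bi-gram counting described above can be sketched as follows. This is an illustrative, framework-free version: the stopword set here is a small stand-in for NLTK's English stopwords plus the custom additions mentioned above, and the sample queries are invented.

```python
import re
from collections import Counter

# Stand-in stopword list (a subset of NLTK's English stopwords plus the
# custom additions named in the text), not the full list used in the study.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "for", "and", "of", "my",
             "hi", "hello", "please", "gt", "lt"}

def tokens(text):
    """Lowercase, keep alphanumeric words, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]

def top_ngrams(texts, n=1, k=5):
    """Return the k most common n-grams across a list of messages."""
    counts = Counter()
    for text in texts:
        words = tokens(text)
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(k)

queries = ["Hi, my robocraze support ticket is still open",
           "Please check robocraze support ticket 42"]
print(top_ngrams(queries, n=1))  # most common single words
print(top_ngrams(queries, n=2))  # most common bi-grams
```

The same counts feed directly into the bar charts and word clouds discussed in this section.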
Finally, we looked at word cloud representations of the texts from customer queries and agent responses, depicted in Fig. 12 & 13. As expected, the word clouds match the common uni- and bi-gram results above.
C. Data Preparation
Several steps were performed to prepare the data for the model. The fields ticket_id, thread_type, created_by, and message were taken into further consideration, while all other fields were set aside. Regular expressions were used for standard data cleaning, with a focus on removing HTML-based content.
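A minimal sketch of this regex-based cleaning pass is shown below. The patterns are illustrative assumptions about typical HTML-laden email bodies; the exact expressions used on the Robocraze data are not reproduced here.

```python
import re

def clean_message(raw):
    """Strip HTML tags and entities, then normalize whitespace (sketch)."""
    text = re.sub(r"<[^>]+>", " ", raw)                   # drop HTML tags
    text = re.sub(r"&(nbsp|amp|gt|lt|quot);", " ", text)  # common entities
    text = re.sub(r"\s+", " ", text).strip()              # collapse spaces
    return text

print(clean_message("<div>Hello,&nbsp;is the <b>Pi 4</b> in stock?</div>"))
# → "Hello, is the Pi 4 in stock?"
```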
After cleaning the data, it was divided into query-response pairs. Historical conversation turns were appended to the query itself, because much of the conversation depends heavily on past queries and responses. <start> and <end> identifiers were added to all of the responses.
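The pairing step can be sketched as follows, under simplifying assumptions: each thread is a chronological list of (sender, message) tuples, and consecutive messages from the same sender have already been merged.

```python
def build_pairs(thread):
    """Build (query+history, tagged response) pairs from one ticket thread."""
    pairs, history = [], ""
    for sender, message in thread:
        if sender == "customer":
            history = (history + " " + message).strip()
        else:
            # An agent message closes out a pair; the query side carries the
            # full history so far, the response side gets <start>/<end> tags.
            pairs.append((history, "<start> " + message + " <end>"))
            history = (history + " " + message).strip()
    return pairs

thread = [("customer", "Is the Pi 4 in stock?"),
          ("agent", "Yes, it is in stock.")]
print(build_pairs(thread))
```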
The Keras tokenizer was used to convert all text sequences to tokens, and both queries and responses were then padded to the maximum query or response length, respectively. The responses were converted to one-hot encodings so that they could be used as modelling targets.
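The tokenize, pad, and one-hot steps can be mirrored in a framework-free sketch like the one below (the paper itself uses the Keras Tokenizer, pad_sequences, and to_categorical; this only illustrates the equivalent behaviour on toy data).

```python
def fit_vocab(texts):
    """Assign an integer id to each word; 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode_and_pad(texts, vocab, max_len):
    """Map words to ids and post-pad every sequence with 0s to max_len."""
    seqs = [[vocab[w] for w in t.split()] for t in texts]
    return [s + [0] * (max_len - len(s)) for s in seqs]

def one_hot(seq, vocab_size):
    """One-hot encode a padded id sequence (decoder training targets)."""
    return [[1 if i == tok else 0 for i in range(vocab_size)] for tok in seq]

replies = ["<start> yes in stock <end>", "<start> no <end>"]
vocab = fit_vocab(replies)
max_len = max(len(r.split()) for r in replies)
padded = encode_and_pad(replies, vocab, max_len)
print(padded)                           # token ids, padded to max_len
print(one_hot(padded[1], len(vocab)))   # one-hot decoder targets
```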
III. MODEL BUILDING
The model is based on an encoder-decoder structure. The input layer for the encoder consists of 2116 units (the maximum sequence length of the queries plus history), while the input layer for the decoder consists of 292 units (the maximum sequence length of the replies); the decoder input is much smaller than the encoder's because it contains only the direct replies and no historical data. Both input layers feed into two separate embedding layers of size 200 each. The query embeddings are fed into an LSTM, and the output of this LSTM layer is fed into another LSTM layer along with the decoder embeddings. Dot-product-based attention is used, and its output is concatenated with the decoder output and passed to a dense layer to generate the final model predictions.
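The model itself was built in Keras; the NumPy sketch below isolates only the dot-product attention step described above, showing how context vectors are computed from the two LSTM outputs and concatenated before the final dense layer. The shapes and sizes here are toy values, not the ones used in the paper.

```python
import numpy as np

def dot_attention(decoder_out, encoder_out):
    """Dot-product attention over encoder steps (sketch).

    decoder_out: (T_dec, d) decoder LSTM outputs
    encoder_out: (T_enc, d) encoder LSTM outputs
    Returns one context vector per decoder step.
    """
    scores = decoder_out @ encoder_out.T            # (T_dec, T_enc)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over enc steps
    return weights @ encoder_out                    # (T_dec, d)

rng = np.random.default_rng(0)
enc = rng.normal(size=(6, 8))   # toy: 6 encoder steps, hidden size 8
dec = rng.normal(size=(3, 8))   # toy: 3 decoder steps
context = dot_attention(dec, enc)
combined = np.concatenate([dec, context], axis=-1)  # input to dense layer
print(combined.shape)  # (3, 16)
```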
IV. EVALUATION AND MODEL RESULTS
The model is clearly overfitting, as seen from the training accuracy of 0.99 versus the validation accuracy of 0.90 after training for 500 epochs. Another reason for such high accuracy levels could be the very limited size of the dataset. Below are some sample predictions of the model.
Example 1:
Input: I have 2 questions, do you have raspberry pi 4 in stock (4gb) and do you provide input tax credit (GST), pls let me know, I can make payment and book dunzo for delivery(local blore delivery).
Prediction: hi,pi4 4gb is in stock and we can provide you with gst if you share the gst number. also you can book dunzo for your order pick up, but please confirm with us on dunzo pickup timings
Example 2:
Input: I didn't like the product want to return my product
Prediction: hi,we have refunded for your order and hence we are closing this ticket currently, if any queries please feel free to raise a new ticket.thank you
Example 3:
Input: How to change brightness on raspberry pi official 7inch touchscreen ?
Prediction: hi,to modify the brightness you can try using the command given below:echo n /sys/class/backlight/rpi_backlight/brightnesswherenis some value between 0 and 255.for more information, you can check the thread in the link below:https://raspberrypi.stackexchange.com/questions/46225/adjusting-the-brightness-of-the-official-touchscreen-display
V. CONCLUSION & FUTURE SCOPE
As the above study was done within a very short span, it is by no means a comprehensive study of all the possibilities, and there is still room for improvement. Several directions can be explored to improve the overall performance of the model. One of the most straightforward options would be to expand the dataset to include many more conversations (tickets), together with a more suitable method of imputing or otherwise handling missing data.
On the model front, several possibilities can be explored, such as pre-trained word embeddings, transformer-based models, self-attention, or pre-training on other standard datasets and then fine-tuning the model for this particular problem.
These are some of the possible directions that can be further explored along with tuning of hyper-parameters and regularization to reduce overfitting and improve the overall performance of the model.
As an additional data cleaning step, given that the content is mostly email, greetings (Hello <name>, Dear <name>, etc.) and closings (Regards, Thanks, etc.) can be removed from the data and handled separately from the model.
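One possible sketch of this greeting/closing stripping is shown below. The patterns are hypothetical examples; real mail data would need a much broader list of salutations and sign-offs.

```python
import re

# Hypothetical salutation/sign-off patterns for illustration only.
GREETING = re.compile(r"^\s*(hello|hi|dear)\b[^\n]*\n", re.IGNORECASE)
CLOSING = re.compile(r"\n\s*(regards|thanks|best)\b[\s\S]*$", re.IGNORECASE)

def strip_salutations(body):
    """Remove a leading greeting line and a trailing closing block."""
    body = GREETING.sub("", body)
    body = CLOSING.sub("", body)
    return body.strip()

mail = "Dear Team,\nIs the Pi 4 in stock?\nRegards,\nA Customer"
print(strip_salutations(mail))  # → "Is the Pi 4 in stock?"
```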
REFERENCES
- Sandipan Basu, Aravind Gaddala, Pooja Chetan, Garima Tiwari, Narayana Darapaneni, Sadwik Parvathaneni, and Anwesh Reddy Paduri, “Building a Question and Answer System for News Domain”, arXiv:2105.05744, May 2021.
- Somil Gupta, Bhanu Pratap Singh Rawat, and Hong Yu, “Conversational Machine Comprehension: a Literature Review”, arXiv:2006.00671, Jun 2020.
- Sameera A. Abdul-Kader and Dr. John Woods, “Survey on Chatbot Design Techniques in Speech Conversation Systems”, (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 6, No. 7, 2015
- Pengfei Zhu, Zhuosheng Zhang, Jiangtong Li, Yafang Huang, Hai Zhao, “Lingke: A Fine-grained Multi-turn Chatbot for Customer Service”, arXiv:1808.03430, Aug 2018
- Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Zhao, Dianhai Yu, and Hua Wu, “Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network”, Proceedings of ACL 2018, pp. 1118-1127. DOI: 10.18653/v1/P18-1103.
- Yu Wu, Wei Wu, Chen Xing, Ming Zhou, Zhoujun Li, “Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots”, arXiv:1612.01627, Dec 2016
- J. Kim, H. G. Lee, H. Kim, Y. Lee, and Y. G. Kim, “Two-Step Training and Mixed Encoding-Decoding for Implementing a Generative Chatbot with a Small Dialogue Corpus”, Proceedings of the Workshop on Intelligent Interactive Systems, 2018.
- J. Kapočiūtė-Dzikienė, “A Domain-Specific Generative Chatbot Trained from Little Data”, Applied Sciences, vol. 10, no. 7, p. 2221, Mar. 2020.
- Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov; “Enriching Word Vectors with Subword Information”, Transactions of the Association for Computational Linguistics 2017; 5 135–146.
- Ilya Sutskever, Oriol Vinyals, Quoc V. Le, “Sequence to Sequence Learning with Neural Networks”, arXiv:1409.3215v3, Dec 2014
- Chayan Chakrabarti, George F. Luger, “Artificial Conversations for Customer Service Chatter Bots: Architecture, Algorithms, and Evaluation Metrics”, Expert Systems with Applications, Volume 42, Issue 20, 2015, Pages 6878-6897, ISSN 0957-4174. DOI:https://doi.org/10.1016/j.eswa.2015.04.067
- Che Liu, Junfeng Jiang, Chao Xiong, Yi Yang, Jieping Ye. “Towards Building an Intelligent Chatbot for Customer Service: Learning to Respond at the Appropriate Time”, Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Association for Computing Machinery, New York, NY, USA, 3377–3385. DOI:https://doi.org/10.1145/3394486.3403390
- Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao and Dan Jurafsky. “Deep Reinforcement Learning for Dialogue Generation”, arXiv:1606.01541v4, Sep 2016
- Umang Gupta, Ankush Chatterjee, Radhakrishnan Srikanth, and Puneet Agrawal, “A Sentiment-and-Semantics-Based Approach for Emotion Detection in Textual Conversations”, arXiv:1707.06996, 2017.
- N. Darapaneni et al., “Movie success prediction using ML,” in 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2020, pp. 0869–0874.