Here is a report on a cloud computing project on sentiment analysis and an invoice management system by Sunil Vadapalli, Virender Yadav, and Rambabu Donthi Boyina.
For every company, whether a start-up or a large enterprise, social media is a premier platform for promoting the business. Marketing new products and gaining customers through social media has grown over the last decade and has now become indispensable. Businesses also receive product, service, and user reviews through social platforms. LinkedIn, Facebook, Instagram, and Twitter have been the leading platforms for building brand awareness. Capturing data from these platforms helps businesses run real-time analysis and understand customer sentiment.
The Capstone Project
The client's invoice management system works by sending invoices by email to the accounts team, or by uploading them through a legacy application. This application is not very robust and has many gaps that create major problems for the teams during year-end analysis. After reviewing all the data from the legacy application, the accounts teams end up entering every client invoice manually into other systems. Nearly 100 invoice formats need to be loaded every month. The accounts team spends much of its time analyzing these different formats and working through the different requirements for budget planning or reporting. All this leads to data inconsistency, and data availability has also become a challenge for the client.
Alongside the legacy application, the client also has internal applications that capture the sentiments of internal stakeholders through surveys, but it has no means of capturing real-time sentiments or statistics from social media channels for external stakeholders such as shareholders, customers, and others.
For managing the invoices, we used Amazon S3, SNS, and Glue. We used Kinesis Streams and Kinesis Data Analytics to capture real-time data for sentiment analysis. S3 and Glue are also used here for managing the streaming data after analysis. We used RDS to store the data and Athena to generate data sets that are visualized through Amazon QuickSight.
Note: For testing purposes, the real-time data can be generated using AWS Lambda and CloudWatch Events triggers. However, CloudWatch Events triggers can be scheduled at a minimum interval of 1 minute. To increase the flow of the stream, AWS Step Functions was used. The state machine triggers an iterator Lambda function, which in turn triggers the actual Lambda function that generates the data for the stream.
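The iterator step described in the note can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the event shape (an "iterator" object with "index" and "count"), the default of 6 runs, and the function name passed in as target_function are all assumptions. The Lambda client is passed in as a parameter (in AWS it would typically be boto3.client("lambda")) so the logic can be tried locally without credentials.

```python
import json


def make_iterator_handler(lambda_client, target_function):
    """Build an iterator handler for a Step Functions loop.

    lambda_client   -- object with an invoke() method, e.g. boto3.client("lambda")
    target_function -- hypothetical name of the data-generating Lambda
    """

    def handler(event, context=None):
        # Assumed input shape: {"iterator": {"index": 0, "count": 6}}
        iterator = event.get("iterator", {})
        index = iterator.get("index", 0)
        count = iterator.get("count", 6)  # assumed number of runs per loop

        # Asynchronous ("Event") invocation of the data-generating function.
        lambda_client.invoke(
            FunctionName=target_function,
            InvocationType="Event",
            Payload=json.dumps({"index": index}).encode("utf-8"),
        )

        index += 1
        # The state machine can branch on "continue" to loop or stop.
        return {"index": index, "count": count, "continue": index < count}

    return handler
```

In a state machine, a Choice state would typically inspect the returned "continue" flag and, while it is true, pass through a short Wait state before calling the iterator again. That loop is what allows several invocations inside the 1-minute scheduling limit.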
Role of cloud services:
S3 – Used to gather invoice data from the client and to gather streaming data from Kinesis.
SNS – To send an acknowledgment after receiving the invoice using different endpoints.
Glue Crawlers – To pull the data from S3 buckets into the Glue tables.
RDS – To store the invoice data so that it can be accessed from the application.
Athena – For querying the data and for preparing data sets.
QuickSight – To visualize the prepared data sets for top executives and the PMO.
Kinesis Streams – To capture real-time data for analysis.
Kinesis Data Analytics – To analyse the streaming data generated by stakeholders and company employees on social media handles on Twitter, LinkedIn, and Facebook regarding revenue growth, newly deployed functions, achievements, and concerns.
Kinesis Firehose – To push the data from Streams to S3.
Lambda – To create streaming data internally.
CloudWatch Events – To trigger the Lambda function every minute, which is the shortest interval that can be scheduled with Events.
Step Functions – To trigger the Lambda function several more times within each minute, so that a reasonable volume of streaming data is generated.
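As an illustration of the Lambda role listed above (creating streaming data internally), the sketch below builds synthetic review records and writes them to a Kinesis stream. The record fields, the extra rating values, and the function names are assumptions made for the example. Only "Poor" and "Not satisfied" appear in this report. The client is injected, and in AWS it would typically be boto3.client("kinesis"), whose put_record call takes StreamName, Data, and PartitionKey.

```python
import json
import random
from datetime import datetime, timezone

# "Poor" and "Not satisfied" come from the report; the other values are assumed.
RATINGS = ["Excellent", "Good", "Average", "Poor", "Not satisfied"]
CHANNELS = ["Twitter", "LinkedIn", "Facebook"]


def make_review(rng=random):
    """Return one synthetic review record (hypothetical schema)."""
    return {
        "review_id": rng.randint(1, 1_000_000),
        "channel": rng.choice(CHANNELS),
        "rating": rng.choice(RATINGS),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def send_reviews(kinesis_client, stream_name, n=10, rng=random):
    """Put n synthetic reviews on the stream and return how many were sent."""
    for _ in range(n):
        review = make_review(rng)
        kinesis_client.put_record(
            StreamName=stream_name,
            Data=json.dumps(review).encode("utf-8"),
            # The partition key decides which shard receives the record.
            PartitionKey=str(review["review_id"]),
        )
    return n
```

Using the review id as the partition key spreads records across shards. A fixed key would send everything to a single shard, which is fine for a small test but limits throughput.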
Invoice management system:
All received invoices are stored in an S3 bucket, and an acknowledgment is sent to the user once the invoice is received. In one flow, the invoices are pulled into Glue tables using Crawlers, and this data is then pushed to RDS using Glue ETL jobs. The other flow makes the data available to Athena, which generates the data sets that are visualized through Amazon QuickSight.
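One possible way to wire the acknowledgment step is a small function that reads an S3 event notification and publishes a message to an SNS topic. The report does not say how the acknowledgment is actually triggered, so this is only a sketch under that assumption. The function name, subject line, and message text are hypothetical. The SNS client is injected and in AWS would typically be boto3.client("sns").

```python
from urllib.parse import unquote_plus


def acknowledge_invoices(event, sns_client, topic_arn):
    """Publish one acknowledgment per uploaded invoice and return the keys."""
    keys = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        sns_client.publish(
            TopicArn=topic_arn,
            Subject="Invoice received",  # assumed wording
            Message=f"Invoice {key} was received in bucket {bucket}.",
        )
        keys.append(key)
    return keys
```

Subscribers on the topic (email, SMS, or an HTTP endpoint) would then receive the acknowledgment through whichever endpoint they registered.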
For sentiment analysis, a product review is considered as the scenario. Two flows were created. One flow pushes the complete data into the database. The other flow goes through Kinesis Data Analytics, where the data is filtered based on the requirement. For example, all negative reviews such as “Poor” and “Not satisfied” are selected, and this report is made available to Athena to generate the data set, which is visualized through Amazon QuickSight. This helps in making important decisions about the business flow.
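The project performs this filtering inside Kinesis Data Analytics. The plain Python below is a stand-in that only illustrates the same two-flow idea: every review goes to the full data set, and only negative ones go to the report. The "rating" field and the function names are assumptions. The two negative labels are the ones quoted above.

```python
# Labels treated as negative; taken from the examples in the text.
NEGATIVE_RATINGS = {"poor", "not satisfied"}


def is_negative(review):
    """True if the review's rating matches a negative label (case-insensitive)."""
    rating = str(review.get("rating", "")).strip().lower()
    return rating in NEGATIVE_RATINGS


def split_flows(reviews):
    """Return (all_reviews, negative_reviews) to mirror the two flows."""
    all_reviews = list(reviews)
    negative = [r for r in all_reviews if is_negative(r)]
    return all_reviews, negative


sample = [
    {"review_id": 1, "rating": "Good"},
    {"review_id": 2, "rating": "Poor"},
    {"review_id": 3, "rating": "Not satisfied"},
]
everything, negatives = split_flows(sample)
print(len(everything), [r["review_id"] for r in negatives])  # prints: 3 [2, 3]
```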
This same business flow can also be used in many scenarios like polling during sports, political elections, etc.
A custom invoice management application is used to read the data from the database. This application is deployed on EC2 instances along with load balancers. It will be used by the accounts teams around the globe to generate the reports required for their respective regions.
Working on this project in the cloud made many functional flows easier to understand. Serverless services made it possible to implement many components and experience these flows in practice. It is necessary to monitor costs, particularly when transferring data from one region to another. The platform helps in understanding and building resilient architectures through practice.
This capstone project is a part of the PG program in cloud computing with Great Learning.
About the Authors:
Sunil Vadapalli – Sunil started working as a Cloud Engineer and developed an interest in cloud computing architecture and networking. He has 8 years of experience in QA and is now shifting his profile completely towards cloud computing.
Virender Yadav – A.G.M. (Server and Database Technologies) at Hinduja Global Solutions with 14+ years of experience in server and database technologies (Oracle/SQL Server/MySQL/Informix/PostgreSQL), Linux (SUSE/Red Hat/open source), IT infrastructure solutioning and implementation, production support and performance optimization, and data warehouse design and implementation.
Rambabu Donthi Boyina – Rambabu has 15.3 years of experience in Windows, VMware, Active Directory, security group policy, Citrix, and VB.NET scripting. He works for FIS as a Sr. Systems Engineer.