Social media is now an integral part of everyday life, and it is hard to imagine life without it. Social media has grown by leaps and bounds, and as more and more people use it, offensive content is on the rise as well, which can draw social media platforms into controversy and often into legal trouble. Hence, social media platforms carry a greater responsibility for moderating the content that gets published on them.
As part of our PGP-CC capstone project, we came up with an innovative approach to ease the overhead of content moderation for social media platforms. This solution tries to demonstrate how AWS cognitive services can be leveraged to automate content moderation.
This solution proposes the use of the following AWS cognitive services for content moderation:
- Amazon Comprehend for sentiment analysis of the blog text. If the sentiment of the text is reported as negative with high confidence, the blog has a high probability of containing offensive or inappropriate content. Such blogs can be further checked for restricted words, and if any are found, the blog can be either auto-rejected or sent for manual moderation, depending on the requirement.
- Amazon Rekognition to moderate the blog photos and identify any inappropriate content. If Rekognition reports any moderation category with a high confidence level, the blog photo is likely to contain inappropriate content. Certain moderation categories from Rekognition (such as explicit content and violence) can be auto-rejected, and others (such as suggestive content and hate symbols) can be sent for manual moderation.
- Amazon Rekognition to detect the presence of celebrities in the blog photos. Inappropriate content that involves celebrity pictures is especially likely to cause an uproar, so the moderation rules (the confidence-score thresholds for text sentiment analysis and photo moderation) can be made stricter for such blogs.
Apart from the above primary objectives, this project demonstrates the following aspects:
- Amazon Translate to translate the blog text
- Amazon Lex to build a text chatbot support system into the web app
- Lambda functions for all backend processing
- API Gateway to provide an API interface for the Lambda functions
- S3 to store blog photos and host the static website
- DynamoDB to store blog text
- Aurora MySQL DB to store blog metadata
- Cognito for user management and authentication
- SNS for sending email notifications
- CloudFront distribution to enable faster content delivery
- Route 53 to route traffic coming to the DNS address to the CloudFront distribution
- VPC endpoints to route traffic between the Lambda functions deployed in the VPC and SNS, DynamoDB, S3, Comprehend, and Rekognition
Because resilience and fault tolerance are of high importance nowadays, cross-region data replication has been implemented for the following:
- S3 photo store
- DynamoDB blog data
- Aurora MySQL blog metadata
High level architecture
Some of the key use cases are detailed here to give insight into the implementation.
| Step | Action |
|---|---|
| 1 | Store blog metadata in Aurora MySQL |
| 2 | Store blog text in DynamoDB |
| 3 | Get an S3 pre-signed URL for uploading the photo |
| 4 | Call the S3 pre-signed URL to post the photo from the blog app |
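
Steps 3 and 4 above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the key layout, the five-minute expiry, and the function names are assumptions, and the S3 client is passed in (for example `boto3.client("s3")`) so the logic can be exercised without AWS access.

```python
import uuid


def build_photo_key(blog_id: str, filename: str) -> str:
    """Build a unique object key (hypothetical layout: blogs/<blog_id>/<random>.<ext>)."""
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else "bin"
    return f"blogs/{blog_id}/{uuid.uuid4().hex}.{extension}"


def get_upload_url(s3_client, bucket: str, blog_id: str, filename: str,
                   expires_in: int = 300) -> dict:
    """Return the object key and a pre-signed PUT URL for the blog photo.

    `s3_client` is expected to behave like boto3's S3 client.
    The expiry of 300 seconds is an assumed value.
    """
    key = build_photo_key(blog_id, filename)
    url = s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
    return {"key": key, "upload_url": url}
```

The blog app would then upload the photo with an HTTP PUT to `upload_url`, so the photo bytes never pass through the Lambda.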
Blog text moderation
| Step | Process blog text |
|---|---|
| 1 | A DynamoDB trigger calls the text processor Lambda API. The Lambda API processes the record only if it is a new-record creation event. |
| 2 | Do the sentiment analysis of the text using Comprehend. If the blog sentiment is negative and the text contains any restricted words, mark the blog for manual moderation; otherwise mark it as text approved for public access. |
| 3 | Store the sentiment analysis results from Comprehend against the blog in DynamoDB |
| 4 | Update the blog status in Aurora MySQL, based on the text analysis in step 2 |
| 5 | If the blog text requires manual moderation, send a message to SNS to trigger an email to the admin |
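
The text-moderation steps above could look something like the sketch below. It is an illustration under stated assumptions, not the project's code: the restricted-word list, the 0.80 threshold, the status strings, and the DynamoDB attribute names `blog_id` and `blog_text` are all hypothetical. The Comprehend and SNS clients are passed in (for example `boto3.client("comprehend")`).

```python
import re

RESTRICTED_WORDS = {"badword1", "badword2"}  # hypothetical placeholder list
NEGATIVE_THRESHOLD = 0.80                    # assumed confidence threshold


def new_blog_texts(event):
    """Yield (blog_id, text) for INSERT records of a DynamoDB stream event only."""
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY and REMOVE events
        image = record["dynamodb"]["NewImage"]
        # Attribute names are assumptions; stream images use typed values.
        yield image["blog_id"]["S"], image["blog_text"]["S"]


def contains_restricted_words(text, restricted=RESTRICTED_WORDS):
    """Return True if any restricted word appears as a whole word in the text."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return bool(words & restricted)


def moderate_text(comprehend_client, text):
    """Flag the text for manual moderation if it is confidently negative
    and also contains a restricted word; otherwise approve it."""
    result = comprehend_client.detect_sentiment(Text=text, LanguageCode="en")
    negative_score = result["SentimentScore"]["Negative"]
    is_negative = (result["Sentiment"] == "NEGATIVE"
                   and negative_score >= NEGATIVE_THRESHOLD)
    if is_negative and contains_restricted_words(text):
        status = "MANUAL_MODERATION"
    else:
        status = "TEXT_APPROVED"
    return {"status": status,
            "sentiment": result["Sentiment"],
            "negative_score": negative_score}


def notify_admin(sns_client, topic_arn, blog_id):
    """Publish to an SNS topic whose email subscription reaches the admin."""
    sns_client.publish(
        TopicArn=topic_arn,
        Subject="Blog needs manual moderation",
        Message=f"Blog {blog_id} was flagged by text moderation and needs review.",
    )
```

A Lambda handler would loop over `new_blog_texts(event)`, call `moderate_text` for each blog, persist the result in DynamoDB and Aurora MySQL, and call `notify_admin` only when the status is `MANUAL_MODERATION`.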
Blog photo moderation
| Step | Process blog photo |
|---|---|
| 1 | An S3 event calls the photo processor Lambda API when a new photo is added to the bucket |
| 2 | Validate the photo for restricted content using the content moderation capability of Rekognition. Based on the moderation results, the blog photo is approved, rejected, or marked for manual moderation. Identify celebrities using the celebrity recognition capability of Rekognition. If no celebrities are found, identify the entities in the picture using the entity identification capability of Rekognition. |
| 3 | Store the photo moderation results and the celebrity/entity identification results from Rekognition against the blog in DynamoDB |
| 4 | Update the blog status in Aurora MySQL, based on the photo moderation results in step 2 |
| 5 | If the blog photo requires manual moderation, send a message to SNS to trigger an email to the admin |
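
As a rough illustration of step 2 above, the sketch below maps Rekognition results to a photo status. The category names in `AUTO_REJECT` and `MANUAL_REVIEW`, the confidence threshold of 80, and the status strings are assumptions and should be matched to the moderation label taxonomy actually returned by the service. The Rekognition client is passed in (for example `boto3.client("rekognition")`).

```python
from urllib.parse import unquote_plus

# Assumed category names and threshold; adjust to the label taxonomy in use.
AUTO_REJECT = {"Explicit Nudity", "Violence"}
MANUAL_REVIEW = {"Suggestive", "Hate Symbols"}
MIN_CONFIDENCE = 80.0


def photo_location(record):
    """Extract (bucket, key) from one S3 event record; keys arrive URL-encoded."""
    s3_info = record["s3"]
    return s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def moderation_status(moderation_labels):
    """Map moderation labels to a status, checking label and parent names."""
    categories = set()
    for label in moderation_labels:
        categories.add(label["Name"])
        if label.get("ParentName"):
            categories.add(label["ParentName"])
    if categories & AUTO_REJECT:
        return "REJECTED"
    if categories & MANUAL_REVIEW:
        return "MANUAL_MODERATION"
    return "PHOTO_APPROVED"


def process_photo(rekognition_client, bucket, key):
    """Moderate the photo, then look for celebrities, then fall back to labels."""
    image = {"S3Object": {"Bucket": bucket, "Name": key}}
    moderation = rekognition_client.detect_moderation_labels(
        Image=image, MinConfidence=MIN_CONFIDENCE
    )
    celebrity_result = rekognition_client.recognize_celebrities(Image=image)
    celebrities = [face["Name"] for face in celebrity_result["CelebrityFaces"]]
    entities = []
    if not celebrities:
        # No celebrity found: identify generic entities in the picture instead.
        labels = rekognition_client.detect_labels(Image=image, MaxLabels=10)
        entities = [label["Name"] for label in labels["Labels"]]
    return {
        "status": moderation_status(moderation["ModerationLabels"]),
        "celebrities": celebrities,
        "entities": entities,
    }
```

Checking the auto-reject set before the manual-review set means a photo that triggers both kinds of category is rejected outright, which is one possible reading of the rules described earlier.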
| Step | Action |
|---|---|
| 1 | Call Lambda to fetch the list of blogs eligible for public access |
| 2 | Get the blog details for each eligible blog |
| 2a | Get the blog text from DynamoDB |
| 2b | If the user selected a language other than English, translate the text using Amazon Translate |
| 2c | Get the S3 pre-signed URL for the blog photo |
| 3 | Fetch the blog photo using the S3 pre-signed URL |
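
Step 2b above can be illustrated with a small helper. This is a sketch, assuming the stored blog text is English and that the Translate client is passed in (for example `boto3.client("translate")`); the function name is hypothetical.

```python
def localize_text(translate_client, text, target_language, source_language="en"):
    """Return the blog text in the requested language.

    Skips the Translate call entirely when the user kept the source language,
    so English readers incur no translation cost or latency.
    """
    if target_language == source_language:
        return text
    response = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source_language,
        TargetLanguageCode=target_language,
    )
    return response["TranslatedText"]
```

For example, `localize_text(client, blog_text, "fr")` would request a French version, while `localize_text(client, blog_text, "en")` returns the text unchanged.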
Admin moderation of blog
| Step | Action |
|---|---|
| 1 | Call Lambda to fetch the blogs that require manual moderation |
| 1a | Fetch the blogs pending manual moderation from Aurora MySQL |
| 1b | Fetch the pre-signed S3 URL for the blog photo |
| 1c | Fetch the blog text from DynamoDB |
| 2 | The blog app fetches the photo from S3 using the pre-signed URL |
| 3 | Call Lambda to approve or reject the blog based on the admin action |
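
The admin flow above might be sketched as follows. The status strings, the `photo_key` field, and both function names are hypothetical, chosen only to make the steps concrete; the S3 client is passed in (for example `boto3.client("s3")`).

```python
def build_moderation_item(s3_client, bucket, blog, expires_in=300):
    """Combine blog text with a short-lived pre-signed GET URL for its photo.

    `blog` is assumed to be a dict with blog_id, text and photo_key fields.
    """
    photo_url = s3_client.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": blog["photo_key"]},
        ExpiresIn=expires_in,
    )
    return {"blog_id": blog["blog_id"], "text": blog["text"], "photo_url": photo_url}


def resolve_admin_action(current_status, action):
    """Return the new blog status for an admin decision.

    Only blogs waiting for manual moderation may be approved or rejected.
    """
    if current_status != "MANUAL_MODERATION":
        raise ValueError(f"blog is not pending moderation: {current_status}")
    if action == "approve":
        return "APPROVED"
    if action == "reject":
        return "REJECTED"
    raise ValueError(f"unknown admin action: {action}")
```

Rejecting unknown actions and non-pending blogs up front keeps a stale admin page from overwriting a decision that has already been made.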
As part of our capstone journey, we came across many challenges, which gave us more insight into the AWS cloud. Here are a few key takeaways:
- Versioning should be enabled on the S3 buckets to enable cross-region replication
- The source S3 bucket should be granted access, via an IAM role, to replicate into the S3 bucket in the destination region
- Streams should be enabled on the DynamoDB table to enable the global tables feature
- Free tier provisioned capacity does not support global tables in DynamoDB
- Cross-region read replicas are not supported for MySQL RDS
- RDS Proxy is not supported for Aurora RDS instances
- The default DB parameter group doesn’t support cross-region replication of Aurora RDS
- VPC endpoints enable access to S3, DynamoDB, Comprehend, Rekognition and SNS for a Lambda inside a VPC
- The custom classification feature of Comprehend classifies content into a predefined set of labels. Even if a piece of content doesn’t match any label, it is still mapped to one of them, so the feature is suitable when the scope of the content is limited to a certain context.
- A custom classification endpoint is charged even when it is not in use.
- Before enabling RDS cross-region replication, an RDS subnet should be created in the destination region.
- Stopped RDS instances are started automatically after 7 days if they are not started manually.
- Even if the Lambda doesn’t have the necessary rights to access S3, a pre-signed URL for object access is still generated on request. However, access through that URL fails.
- A pre-signed URL for fetching an object is generated even if the object is not present in the bucket. This needs to be handled on the client side.
- DB users with AWS authentication enabled can be used to access Aurora MySQL RDS instances from Lambda.
- CORS should be enabled on the API Gateway to allow web apps to access the APIs.
- Lambda requires appropriate rights to access Comprehend, Rekognition, Translate, SNS, etc., which can be granted by assigning appropriate policies to the execution role attached to the Lambda.
- Additional dependencies required by Lambda, such as the MySQL connector library, can be added via Lambda layers.
- To stop or delete an RDS read replica cluster, it first has to be promoted to a stand-alone cluster.
- To associate alternate domain names with a CloudFront distribution, you need to get a custom wildcard security certificate and then associate the DNS name with the CloudFront distribution.
- To route traffic from Route 53 to a CloudFront distribution, the DNS name should be configured as an alternate domain name in the CloudFront distribution, and the custom security certificate should be associated with it.
- Custom security certificates for the DNS name can be created using AWS Certificate Manager, and the certificate’s validation CNAME record needs to be added to the DNS.
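
To illustrate the CORS takeaway above: when API Gateway uses Lambda proxy integration, the Lambda itself has to return the CORS headers in its response. The sketch below shows one way to do that; the allowed origin, the header lists, and the placeholder payload are assumptions, not values from the project.

```python
import json

ALLOWED_ORIGIN = "https://blog.example.com"  # hypothetical web app origin


def cors_response(status_code, body, origin=ALLOWED_ORIGIN):
    """Build a Lambda proxy integration response that carries CORS headers."""
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": origin,
            "Access-Control-Allow-Headers": "Content-Type,Authorization",
            "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
        },
        "body": json.dumps(body),
    }


def handler(event, context):
    """Answer CORS preflight requests and return a placeholder payload otherwise."""
    if event.get("httpMethod") == "OPTIONS":
        return cors_response(200, {})
    return cors_response(200, {"blogs": []})  # placeholder body for illustration
```

Restricting `Access-Control-Allow-Origin` to the web app's own origin, rather than using `*`, is the more conservative choice when requests carry authentication tokens.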
Ashok K A Setty – 17+ years of experience in software development in the Aerospace, Telecom and Supply Chain domains. Currently working at Boeing India as a Technical Lead for Cabin System software development.
Vijay Rajagopalan – 20+ years of overall experience in the IT industry. Currently working as a Senior Manager with Hitachi Vantara, managing a global centralized Marketing Center of Excellence team to drive initiatives around delivery excellence, process adherence and standardization.
Maheshkumar Rajagopalan – 14+ years of overall experience in technical support and project management. Currently working with Sabre as a Supervisor, Service Delivery, managing their Asia Pacific operation for end-user computing, unified communication and office infrastructure.