As an AWS partner, we receive a large volume of email from AWS, and some of it needs prompt attention. With so many messages, though, it is hard to tell which emails require immediate action and which can be handled later. To deal with the volume, I’d like to present a serverless system that sends an alert to a Slack channel whenever an urgent AWS email arrives.
Before we get started, here is the overall architecture of the system:
First, I trained an LDA (Latent Dirichlet Allocation) model on all previous emails and stored it in an S3 bucket. During training, the model groups the emails into the specified number of topic groups, and you can review the generated groups to decide which one(s) should trigger alerts.
Now, a Lambda function regularly checks the mailbox for new email messages. If any arrive, it sends the content of each new email to another Lambda function, which feeds the content into the trained model. The model generates a proportion value for each topic group indicating how likely it is that the email belongs to that group. Based on those values, the Lambda function sends an alert to Slack if the proportions for the alert-worthy groups are high enough.
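To make the alert decision concrete, here is a minimal sketch of that second Lambda function’s logic. The topic set, the 0.5 threshold, the event shape, and the function names are my assumptions for illustration, not the post’s exact code; the real function would also load the trained model from S3 (for example with boto3 and joblib) before scoring the email.

```python
import json
import urllib.request

ALERT_TOPICS = {4}   # topic group(s) chosen for alerting (assumed)
THRESHOLD = 0.5      # assumed cutoff for treating an email as urgent

def should_alert(proportions, alert_topics=ALERT_TOPICS, threshold=THRESHOLD):
    """Return True if any alert-worthy topic's proportion is high enough."""
    return any(proportions[t] >= threshold for t in alert_topics)

def post_to_slack(webhook_url, text):
    """Send a message to a Slack incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

def handler(event, context):
    # In the real function, `proportions` comes from running the trained
    # model (loaded from S3) against the new email's content.
    proportions = event["proportions"]
    if should_alert(proportions):
        post_to_slack(event["webhook_url"],
                      f"Urgent AWS email: {event['subject']}")
```

Keeping the threshold check in its own small function makes the “is this urgent?” rule easy to tune and test independently of the Slack and S3 plumbing.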
This system also exposes a RESTful interface through API Gateway. It invokes the same Lambda function to run the trained model against a given email body and sends an alert to Slack if the email is considered urgent.
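For the REST path, the Lambda handler only needs to parse the API Gateway proxy event and run the same scoring logic. This is a hedged sketch, not the post’s actual code: the event shape, the choice of topic group 4 as the urgent one, and the injected `score` function are assumptions for illustration.

```python
import json

def api_handler(event, context=None, score=None):
    """Handle a POST from API Gateway (Lambda proxy integration).

    `score` stands in for running the trained model against the email
    body; it is injected here so the handler stays easy to test.
    """
    email = json.loads(event["body"])
    proportions = score(email["content"])
    # The post's example treats topic group 4 as the urgent one (assumed).
    top_topic = max(range(len(proportions)), key=proportions.__getitem__)
    return {
        "statusCode": 200,
        "body": json.dumps({"urgent": top_topic == 4,
                            "proportions": proportions}),
    }
```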
Next, let’s walk through the Jupyter notebook to see how to train the model with the scikit-learn library.
- Let’s create a DataFrame using all previous emails.
- Transform all email contents in the above DataFrame into a “document-term matrix” that records how frequently each word occurs in each document.
- Create an LDA model and train it with the above “document-term matrix”. Here, we choose 10 topics (scikit-learn’s “n_components” parameter) and set a fixed “random_state” seed to guarantee the same results from repeated training runs.
- We can inspect the topic groups generated by the model to see which words are used most in each group.
- Run the trained model against all emails to generate, for each email, a proportion value per topic group that shows how likely the email belongs to that group.
- Store the trained model in an S3 bucket for future use against new emails.
- Create another DataFrame using all emails with their generated group proportion values.
- Now merge the two DataFrames created above (the email DataFrame and the proportions DataFrame). You can see that each email has topic groups (t00~t09) with their proportion values (p00~p09).
- This code snippet shows how to use the trained model to get the topic group proportions for a new email.
- The “predict” function takes the trained model object and a new email message as input, and returns a list of topic groups along with their proportion values, as shown below. In this example, the given email message most likely belongs to topic group 4.
- Finally, you can send the email as an alert to Slack if the topic group chosen for alerting has the highest proportion value. Here, “Possibility” is the proportion value of that topic group.
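The notebook steps above can be sketched end to end with scikit-learn. This is a minimal stand-in, not the post’s actual notebook: the tiny corpus, the “body” column name, and the use of 3 topics instead of 10 are assumptions to keep the example self-contained and fast to run.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Step 1: a DataFrame of previous emails (tiny stand-in corpus here).
df = pd.DataFrame({"body": [
    "scheduled maintenance for your instance",
    "action required: certificate expires soon",
    "your monthly billing statement is ready",
    "urgent: security issue detected on your account",
    "newsletter: what's new this month",
    "action required: instance retirement scheduled",
]})

# Step 2: document-term matrix (word counts per email).
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(df["body"])

# Step 3: LDA model; the post uses 10 topics, we use 3 for this toy corpus.
# A fixed random_state makes repeated training reproducible.
lda = LatentDirichletAllocation(n_components=3, random_state=42)
lda.fit(dtm)

# Step 4: top words per topic group.
words = vectorizer.get_feature_names_out()
for t, comp in enumerate(lda.components_):
    top = [words[i] for i in comp.argsort()[::-1][:3]]
    print(f"topic {t}: {top}")

# Step 5: per-email topic proportions; each row sums to 1.
proportions = lda.transform(dtm)

# Steps 7-8: merge the proportions back onto the email DataFrame.
cols = [f"p{t:02d}" for t in range(lda.n_components)]
merged = df.join(pd.DataFrame(proportions, columns=cols))

# Steps 9-10: a predict() helper for new emails, returning (topic,
# proportion) pairs sorted by proportion, highest first.
def predict(model, vec, text):
    probs = model.transform(vec.transform([text]))[0]
    return sorted(enumerate(probs), key=lambda x: x[1], reverse=True)

result = predict(lda, vectorizer, "urgent action required on your instance")
print(result[0])  # most likely topic group and its proportion
```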
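Persisting the trained model to S3 for the Lambda function can be done with joblib and boto3. A hedged sketch, assuming placeholder bucket and key names; serializing through an in-memory buffer avoids writing to local disk.

```python
import io
import joblib

def save_model_to_s3(model, bucket, key):
    """Serialize a fitted model with joblib and upload it to S3."""
    import boto3  # imported lazily so the helpers load without AWS installed
    buf = io.BytesIO()
    joblib.dump(model, buf)
    buf.seek(0)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buf.getvalue())

def load_model_from_s3(bucket, key):
    """Download and deserialize a model previously saved with joblib."""
    import boto3
    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return joblib.load(io.BytesIO(obj["Body"].read()))
```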
In summary, this is how to set up a serverless notification system that sends alerts to a designated Slack channel whenever an urgent email arrives in your mailbox. I trained an LDA model with scikit-learn and, after reviewing the classification results, picked the specific topic group(s) that should trigger alert notifications. I stored the trained model in an S3 bucket so that a Lambda function can apply it to new emails, keeping the architecture fully serverless.
Alex is a CTO Architect in the CTO Architecture Team, which is responsible for researching and evaluating innovative technologies and processes. He has been working at Sungard Availability Services for more than 14 years.
Before joining this team, he worked on developing public/private cloud orchestration platforms and on integrating various applications to automate processes at managed hosting service companies, including Sungard Availability Services. Prior to that, he worked at Samsung for 9 years, developing a reporting tool and an RDBMS engine.