Tweets by Donnie: Building a serverless sentiment analysis application with the Twitter Streaming API, Lambda and Kinesis

tweets-by-donnie dashboard

 

Thinking of Trump’s tweets, it’s pretty obvious that they are controversial. To gain insight into how controversial his tweets really are, we created tweets-by-donnie.

“It’s freezing and snowing in New York — we need global warming!”
Donald J. Trump

You decide if it’s meant as a joke or not.

But wouldn’t it be nice to know whether the public sees this as a joke or gets upset by it? That’s where our idea originated. By measuring the emotions expressed in the responses, we can see what the public thinks of Trump’s posts throughout the day.

To generate such insights we decided to create a cloud architecture that can deal with a stream of tweets, enrich them and finish it all up with a simple API to query the results.

Easier said than done.

Home is where your IDE is, right? Writing code in the AWS console wasn’t something we felt good about. It’s also not reproducible in any way, which is why we chose the Serverless Framework. It bridges the gap from code in your IDE to the cloud. At first we were overwhelmed by the technologies, as these were our first steps building anything in the cloud. We had never heard of AWS CloudFormation and had never touched YAML files before, but this seemed the way to go, and it turned out to be very handy to have all code and configuration checked into a repo. This way changing, recreating, or deleting code (or even the whole architecture) is a breeze. Check out our repo, fork it and try it yourself.

The serverless.yml file acts as the main description of your architecture. It can range from a single Lambda function to a whole directory containing separate YAML files for different purposes like functions, resources, roles… you name it.
Speaking of roles, it’s easy to maintain the principle of least privilege with Serverless. You can create one role per serverless.yml or go as far as creating a role per function.
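
As a rough illustration of the one-role-per-serverless.yml variant, the fragment below grants every function in the service write access to a single table and nothing else. It is a sketch, not our actual configuration: the chosen actions are assumptions, and the `provider.iam.role.statements` layout applies to newer Serverless Framework releases (older ones use `provider.iamRoleStatements` instead).

```yaml
# Sketch only: a shared role for all functions in this service.
provider:
  name: aws
  iam:
    role:
      statements:
        - Effect: Allow
          Action:                 # assumed actions, keep this list as short as possible
            - dynamodb:PutItem
            - dynamodb:Query
          Resource:
            # Resolves to the table's ARN at deploy time
            Fn::GetAtt: [TrumpsTweetDynamoDBTable, Arn]
```

For the per-function variant, a function can instead point to its own role through the `role` property in its function definition.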

Another good thing was being able to create resources on the fly as part of the deployment. We needed some DynamoDB tables.

On a side note: DynamoDB takes some getting used to. Deliberately select the right partition key and sort key for your tables, because you don’t want to waste time scanning through big tables. In our case we had tweet IDs, but that’s not what we’re querying by. We are querying by time, and as our data is time-sequenced, we chose the day (yyyy-mm-dd) as our partition key and a timestamp as the sort key. This way we can query for a day and sort by timestamp, or filter for a time frame within the day.

You can declare such resources under the resources section of serverless.yml, using plain CloudFormation syntax.
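
A table keyed as described above might be declared roughly as follows. The logical name `TrumpsTweetDynamoDBTable` is the one we use in this post; the table name, the attribute names `day` and `timestamp`, the billing mode and the stream view type are assumptions made for illustration.

```yaml
# Sketch only: a DynamoDB table with day as partition key and timestamp as sort key.
resources:
  Resources:
    TrumpsTweetDynamoDBTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: trumps-tweets          # hypothetical name
        AttributeDefinitions:
          - AttributeName: day            # e.g. "2018-01-31"
            AttributeType: S
          - AttributeName: timestamp      # e.g. epoch seconds
            AttributeType: N
        KeySchema:
          - AttributeName: day
            KeyType: HASH                 # partition key
          - AttributeName: timestamp
            KeyType: RANGE                # sort key
        BillingMode: PAY_PER_REQUEST      # assumed, avoids capacity planning
        StreamSpecification:
          StreamViewType: NEW_IMAGE       # assumed, needed if Lambda should react to new items
```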

Referencing these resources in other parts of the serverless.yml is quite handy too. For example, to trigger a Lambda function whenever a new item is written to a table, we need the table’s stream ARN in our event trigger.
With ‘Fn::GetAtt: [TrumpsTweetDynamoDBTable, StreamArn]’ the attribute is automatically resolved when deploying.
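
Put into context, the event trigger could look something like this. The function name, handler path and batch size are hypothetical; the sketch also assumes that the table has a stream enabled, otherwise there is no `StreamArn` to resolve.

```yaml
# Sketch only: a function triggered by the table's DynamoDB stream.
functions:
  analyzeTweet:                      # hypothetical function name
    handler: handler.analyze         # hypothetical handler
    events:
      - stream:
          type: dynamodb
          arn:
            # Resolved by CloudFormation at deploy time
            Fn::GetAtt: [TrumpsTweetDynamoDBTable, StreamArn]
          batchSize: 10              # assumed value
          startingPosition: LATEST
```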

If you’re ever in need of examples or documentation, have a look at the following links; they were rather helpful in the learning process.


AWS CloudFormation allows you to create and manage AWS infrastructure deployments predictably and repeatedly…


Serverless Examples: A collection of boilerplates and examples of serverless architectures built with the Serverless… (github.com)


Now there is a catch with having our configuration files in source control: secrets. We don’t want them in any repository, so we had to think of storing secrets such as API keys and passwords elsewhere. There are multiple ways to do this in a secure manner. We chose a method where we store the secrets in a separate YAML file (serverless.env.yml), which is then referenced in the actual serverless.yml. You can reference values in other files with ‘${file(path/to/file):some.key}’. This way we can gitignore the serverless.env.yml containing our secrets but keep the serverless.yml checked in, without accidentally committing the secrets to a repo.

serverless.yml
serverless.env.yml
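
The two files might look roughly like this. All key names, environment variable names and placeholder values are made up for illustration; only the `${file(...):key}` reference syntax comes from the Serverless Framework itself.

```yaml
# serverless.env.yml (listed in .gitignore, never committed)
twitter:
  consumerKey: "replace-me"          # placeholder value
  consumerSecret: "replace-me"       # placeholder value
```

```yaml
# serverless.yml (checked in, contains only references)
provider:
  name: aws
  environment:
    # Resolved from the ignored file when deploying
    TWITTER_CONSUMER_KEY: ${file(./serverless.env.yml):twitter.consumerKey}
    TWITTER_CONSUMER_SECRET: ${file(./serverless.env.yml):twitter.consumerSecret}
```

Anyone who clones the repo then has to create their own serverless.env.yml before deploying.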

This method seemed feasible for this rather uncritical POC, but if you are planning a bigger project, read the following.
