{"id":3788,"date":"2018-07-31T15:08:24","date_gmt":"2018-07-31T13:08:24","guid":{"rendered":"https:\/\/blog.mi.hdm-stuttgart.de\/?p=3788"},"modified":"2023-08-06T21:51:49","modified_gmt":"2023-08-06T19:51:49","slug":"building-a-fully-scalable-architecture-with-aws","status":"publish","type":"post","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2018\/07\/31\/building-a-fully-scalable-architecture-with-aws\/","title":{"rendered":"Building a fully scalable architecture with\u00a0AWS"},"content":{"rendered":"<h3>What I learned in building the <a href=\"https:\/\/github.com\/timgrossmann\/stateOfVeganism\">StateOfVeganism<\/a> ?<\/h3>\n<figure id=\"attachment_3789\" aria-describedby=\"caption-attachment-3789\" style=\"width: 542px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small.png\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"3789\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2018\/07\/31\/building-a-fully-scalable-architecture-with-aws\/sov_architecture_small\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small.png\" data-orig-size=\"1920,1080\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"sov_architecture_small\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small-1024x576.png\" class=\"wp-image-3789\" 
src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small-300x169.png\" alt=\"\" width=\"542\" height=\"305\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small-300x169.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small-768x432.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small-1024x576.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small.png 1920w\" sizes=\"auto, (max-width: 542px) 100vw, 542px\" \/><\/a><figcaption id=\"caption-attachment-3789\" class=\"wp-caption-text\">Final setup for the finished project (created with Cloudcraft)<\/figcaption><\/figure>\n<p class=\"graf graf--p\">By now, we all know that <strong class=\"markup--strong markup--p-strong\">news and media shape our views<\/strong> on the topics they discuss. Of course, this is different from person to person. Some might be influenced a little more than others, but there is always some opinion communicated.<\/p>\n<p class=\"graf graf--p\">Considering this, it would be really interesting to see how the mood communicated towards a specific topic or person in the media develops over time.<\/p>\n<p><!--more--><\/p>\n<p class=\"graf graf--p\">For me, <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/timgrossmann\/stateOfVeganism\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/github.com\/timgrossmann\/stateOfVeganism\">Veganism<\/a> is an interesting topic, especially since it is frequently mentioned in the media. 
Since the media\u2019s opinion changes the opinion of people, it would be interesting to see what \u201csentiment\u201d they communicate.<\/p>\n<p><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/state_of_veganism.png\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"3792\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2018\/07\/31\/building-a-fully-scalable-architecture-with-aws\/state_of_veganism\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/state_of_veganism.png\" data-orig-size=\"1886,680\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"state_of_veganism\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/state_of_veganism-1024x369.png\" class=\" wp-image-3792 aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/state_of_veganism-300x108.png\" alt=\"\" width=\"375\" height=\"135\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/state_of_veganism-300x108.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/state_of_veganism-768x277.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/state_of_veganism-1024x369.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/state_of_veganism.png 1886w\" sizes=\"auto, (max-width: 375px) 100vw, 375px\" \/><\/a><\/p>\n<p class=\"graf graf--p\">This is what this whole project is about. 
Collecting news articles that talk about or mention <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/timgrossmann\/stateOfVeganism\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/github.com\/timgrossmann\/stateOfVeganism\">Veganism<\/a>, finding out the context in which it is mentioned and analysing whether the articles propagate negativity or positivity.<br \/>\nOf course, a huge percentage of the analysed articles should be classified as \u201cNeutral\u201d if the writers do a good job in only communicating information, so we should keep that in mind, too.<\/p>\n<p class=\"graf graf--p\">This is also an incredible opportunity to pick up a new toolset, especially when thinking about <em class=\"markup--em markup--p-em\">the sheer number of articles published daily<\/em>.<br \/>\nSo, we could think about building a scalable architecture: one that is cheap\/free in the beginning, when there is no traffic and only a few articles, but scales easily and infinitely once the number of mentions or the traffic increases.<br \/>\nI can hear the cloud calling.<\/p>\n<h3 class=\"graf graf--h3\">Designing the Architecture<\/h3>\n<p class=\"graf graf--p\">Planning is everything, especially when we want to make sure that the architecture scales right from the beginning.<\/p>\n<blockquote class=\"graf graf--pullquote\"><p>Starting on paper is a good thing because it enables you to be extremely rough and quick in iterating.<\/p><\/blockquote>\n<p class=\"graf graf--p\">Your first draft will never be your final one, and if it is, you\u2019ve probably forgotten to question your decisions.<\/p>\n<figure id=\"attachment_3793\" aria-describedby=\"caption-attachment-3793\" style=\"width: 544px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/concept.jpg\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"3793\" 
data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2018\/07\/31\/building-a-fully-scalable-architecture-with-aws\/concept\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/concept.jpg\" data-orig-size=\"1024,768\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;1.8&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;iPhone X&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1531303225&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;4&quot;,&quot;iso&quot;:&quot;25&quot;,&quot;shutter_speed&quot;:&quot;0.033333333333333&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"concept\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/concept-1024x768.jpg\" class=\"wp-image-3793\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/concept-300x225.jpg\" alt=\"\" width=\"544\" height=\"408\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/concept-300x225.jpg 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/concept-768x576.jpg 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/concept.jpg 1024w\" sizes=\"auto, (max-width: 544px) 100vw, 544px\" \/><\/a><figcaption id=\"caption-attachment-3793\" class=\"wp-caption-text\">First Concept with removed components<\/figcaption><\/figure>\n<p class=\"graf graf--p\">For me, the process of coming up with a suitable and, even more important, reasonable architecture was the key thing I wanted to improve with this project. 
The different components seemed pretty \u201ceasy\u201d to implement and build, but coming up with the right system, the right communication and a nice, clean data pipeline was the really interesting part.<\/p>\n<p class=\"graf graf--p\">In the beginning, I had some bottlenecks in my design which, at one point, would\u2019ve brought my whole system to its knees. In that situation, I thought about just adding more \u201cscalable\u201d services like queues to buffer the load and take care of it.<br \/>\nWhen I finally had a design which, I guess, could handle a ton of load and was dynamically scalable, it was a mess\u2026 Too many services, a lot of overhead and an overall \u201cdirty\u201d structure.<\/p>\n<p class=\"graf graf--p\">When I looked at the architecture a few days later, I realised that there was so much I could optimise with a few changes. I started to remove all the queues and thought about replacing actual virtual machines with FaaS components.<br \/>\nAfter that session, I had a much cleaner and still scalable design.<\/p>\n<h4 class=\"graf graf--h4\">Think of the structure and technologies, not implementations.<\/h4>\n<figure id=\"attachment_3789\" aria-describedby=\"caption-attachment-3789\" style=\"width: 582px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small.png\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"3789\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2018\/07\/31\/building-a-fully-scalable-architecture-with-aws\/sov_architecture_small\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small.png\" data-orig-size=\"1920,1080\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"sov_architecture_small\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small-1024x576.png\" class=\"wp-image-3789\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small-300x169.png\" alt=\"\" width=\"582\" height=\"328\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small-300x169.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small-768x432.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small-1024x576.png 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/07\/sov_architecture_small.png 1920w\" sizes=\"auto, (max-width: 582px) 100vw, 582px\" \/><\/a><figcaption id=\"caption-attachment-3789\" class=\"wp-caption-text\">Final Architecture<\/figcaption><\/figure>\n<p class=\"graf graf--p\">Thinking in implementations first was one of the mistakes I made quite early in the project. I started out by looking at what services IBM\u2019s Bluemix can offer and went on from there. 
Which ones could I mix together and use in my design that seemed to work together with triggers and queues and whatever?<br \/>\nIn the end, I could remove a lot of the overhead in terms of services by simply <strong class=\"markup--strong markup--p-strong\">stepping away from it and thinking of the overall structure and technologies I need rather than the different implementations<\/strong>.<\/p>\n<section class=\"section section--body\">\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<p class=\"graf graf--p\"><strong class=\"markup--strong markup--p-strong\">Broken down into a few distinct steps<\/strong>, the project should:<\/p>\n<ul class=\"postList\">\n<li class=\"graf graf--li\">Run every hour (enough in the beginning, since there are only a few articles at the moment; the interval can be shortened to every minute or even every second).<\/li>\n<li class=\"graf graf--li\">Get the news from <a class=\"markup--anchor markup--li-anchor\" href=\"http:\/\/newsapi.org\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/newsapi.org\/\">NewsAPI<\/a> and store it.<\/li>\n<li class=\"graf graf--li\">Process each article, analyse its sentiment and store the result in a database to query.<\/li>\n<li class=\"graf graf--li\">When the website is visited, fetch the data for the selected range and display bars\/articles.<\/li>\n<\/ul>\n<p class=\"graf graf--p\"><strong class=\"markup--strong markup--p-strong\">So, what I finally ended up with is a CloudWatch Trigger which triggers a Lambda Function every hour. This Function gets the news data for the last hour from the NewsAPI. It then saves each article as a separate JSON file into an S3 bucket.<br \/>\nThis bucket, upon ObjectPut, triggers another Lambda Function which loads the JSON from S3, creates a \u201ccontext\u201d for the appearance of the part-word \u201cvegan\u201d and sends the created context to Amazon Comprehend for sentiment analysis. 
Once the function gets the sentiment information for the current article, it writes it to a DynamoDB table.<br \/>\nThis table is the source of the data displayed in the frontend. The frontend gives the user a few filters with which they can explore the data a little bit more.<\/strong><\/p>\n<blockquote class=\"graf graf--blockquote\"><p>If you\u2019re interested in a deeper explanation, jump down to the description of the separate components.<\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/section>\n<section class=\"section section--body\">\n<div class=\"section-divider\">\n<hr class=\"section-divider\">\n<\/div>\n<\/section>\n<div class=\"section-inner sectionLayout--insetColumn\">\n<h3 class=\"graf graf--h3\">Who\u2019s \u201cThe One\u201d Cloud Provider?<\/h3>\n<p class=\"graf graf--p\">Before I knew that I was going with AWS, I tried out two other cloud providers. What follows is a very basic and extremely subjective view on which provider to choose, but maybe it will help some other \u201cCloud-Beginners\u201d choose.<\/p>\n<p class=\"graf graf--p\">I started out with IBM\u2019s Bluemix Cloud, moved to Google Cloud (GC) and finally ended up using AWS. 
Here are some of the \u201creasons\u201d for my choice.<\/p>\n<style>.gist table { margin-bottom: 0; }<\/style>\n<div style=\"tab-size: 8\" id=\"gist90927935\" class=\"gist\">\n<div class=\"gist-file\" translate=\"no\" data-color-mode=\"light\" data-light-theme=\"light\">\n<div class=\"gist-data\">\n<div class=\"js-gist-file-update-container js-task-list-container\">\n<div id=\"file-basic_cloud_provider_points-md\" class=\"file my-2\">\n<div id=\"file-basic_cloud_provider_points-md-readme\" class=\"Box-body readme blob tmp-p-5 tmp-p-xl-6 \"\n    style=\"overflow: auto\" tabindex=\"0\" role=\"region\"\n    aria-label=\"basic_cloud_provider_points.md content, created by timgrossmann on 01:08PM on July 24, 2018.\"\n  ><\/p>\n<article class=\"markdown-body entry-content container-lg\" itemprop=\"text\"><markdown-accessiblity-table><\/p>\n<table>\n<thead>\n<tr>\n<th>IBM Bluemix<\/th>\n<th>Google Cloud<\/th>\n<th>Amazon Web Services<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Bad documentation<\/td>\n<td>Slightly better documentation<\/td>\n<td>Still bad documentation, but the best of the three<\/td>\n<\/tr>\n<tr>\n<td>Very few examples<\/td>\n<td>More examples<\/td>\n<td>Most examples<\/td>\n<\/tr>\n<tr>\n<td>Restricted to Student features<\/td>\n<td>$300 to start with<\/td>\n<td>Free tiers for almost every feature<\/td>\n<\/tr>\n<tr>\n<td>Very few already answered questions<\/td>\n<td>Several answered common questions<\/td>\n<td>A lot of answered common questions<\/td>\n<\/tr>\n<tr>\n<td>Lecturer from IBM<\/td>\n<td>Great AI\/NLU Components<\/td>\n<td>Great integration (SDK) for common languages<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><\/markdown-accessiblity-table><br \/>\n<\/article>\n<\/p><\/div>\n<\/p><\/div>\n<\/div><\/div>\n
<\/p><\/div>\n<\/div>\n<\/div>\n<div>\n<section class=\"section section--body\">\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<p class=\"graf graf--p\">A lot of the points listed here really only tell how good the overall documentation and community are, and how many of the issues I encountered had already been asked and answered on Stack Overflow.<\/p>\n<h4 class=\"graf graf--h4\">Documentation and Communities are&nbsp;Key<\/h4>\n<p class=\"graf graf--p\">This is especially true for beginners and people who\u2019ve never worked with cloud technologies. The documentation and, even more importantly, the documented and explained examples were simply the best for AWS.<\/p>\n<p class=\"graf graf--p\">Of course, you don\u2019t have to settle for a single provider. In my case, I could\u2019ve easily used Google\u2019s NLU tools because, in my opinion, they delivered better results. I just wanted to keep my whole system on one platform; I can still change this later on if I want to.<\/p>\n<p class=\"graf graf--p\">The starter packs of all providers are actually really nice. You\u2019ll get $300 on GC which will enable you to do a lot of stuff. However, it\u2019s also kind of dangerous since you\u2019ll be charged if you use up the credit and forget to turn off and destroy all the services building up the costs. 
Bluemix only has very limited access to services on their free tier, which is a little bit unfortunate if you want to test out the full suite.<br \/>\nAmazon, for me, was the nicest one since they also have a free tier which will allow you to use nearly every feature (some only with the smallest instance size, e.g. a micro EC2 instance).<\/p>\n<p class=\"graf graf--p\">As already mentioned, this is a very flat and subjective opinion on which one to go for\u2026 For me, AWS was the easiest and fastest to pick up without investing too much time upfront.<\/p>\n<\/div>\n<\/div>\n<\/section>\n<section class=\"section section--body\">\n<div class=\"section-divider\">\n<hr class=\"section-divider\">\n<\/div>\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<h3 class=\"graf graf--h3\">The Components<\/h3>\n<p class=\"graf graf--p\">The whole project can basically be split into three main components that need work.<\/p>\n<p class=\"graf graf--p\">The <strong class=\"markup--strong markup--p-strong\">Article Collection<\/strong>, which consists of the hourly cron job, the Lambda function which calls the NewsAPI and the S3 bucket that stores all the articles.<\/p>\n<p class=\"graf graf--p\">The <strong class=\"markup--strong markup--p-strong\">Data Enrichment<\/strong> part, which loads the article from S3, creates the context and analyses it using Comprehend, and the DynamoDB that stores the enriched data for later use in the frontend.<\/p>\n<p class=\"graf graf--p\">And the <strong class=\"markup--strong markup--p-strong\">Frontend<\/strong>, which gets displayed when the users request the webpage. 
This component consists of a graphical user interface, a scalable service which serves the webpage and, again, the DynamoDB.<\/p>\n<figure class=\"graf graf--figure\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/15zlW79_Bp5JnyIFY-bj8vQ.png\" data-image-id=\"1*5zlW79_Bp5JnyIFY-bj8vQ.png\" data-width=\"128\" data-height=\"46\"><\/figure>\n<\/div>\n<h4 class=\"graf graf--h4\">Article Collection<\/h4>\n<figure class=\"graf graf--figure\">\n<p><figure style=\"width: 1600px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"graf-image\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1n5S48v05Xr2ezB9Y4PQSDg.png\" alt=\"\" data-image-id=\"1*n5S48v05Xr2ezB9Y4PQSDg.png\" data-width=\"1948\" data-height=\"1467\" width=\"1600\" height=\"1204\"><figcaption class=\"wp-caption-text\">Article Collection Part<\/figcaption><\/figure><\/figure>\n<p class=\"graf graf--p\">The first and probably easiest part of the whole project was collecting all the articles and news that contain the keyword \u201cvegan\u201d. Luckily, there are a ton of APIs that provide such a service.<br \/>\nOne of them is <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/newsapi.org\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/newsapi.org\">NewsAPI.org<\/a>.<\/p>\n<p class=\"graf graf--p\">Their API is extremely easy to use and understand. They have different endpoints. 
One of them is called \u201ceverything\u201d which, as the name suggests, just returns all the articles that contain a given keyword.<br \/>\nUsing Node.js, the query looks something like this.<\/p>\n<figure class=\"graf graf--figure\">\n<p><figure style=\"width: 1188px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"graf-image\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/11VDkST9u7M1fRcwUSjuU7g.png\" alt=\"\" data-image-id=\"1*1VDkST9u7M1fRcwUSjuU7g.png\" data-width=\"1188\" data-height=\"966\" width=\"1188\" height=\"966\"><figcaption class=\"wp-caption-text\">NewsAPI query for 1 hour of data from the beginning of the&nbsp;year<\/figcaption><\/figure><figcaption class=\"imageCaption\"><\/figcaption>The + sign in front of the query string \u201cvegan\u201d simply means that the word must appear.<br \/>\nThe pageSize defines how many articles per request will be returned.<br \/>\nYou definitely want to keep an eye on that. If, for example, your system has extremely limited memory, it makes sense to make more, smaller requests (paging through the results) so that overly large responses don\u2019t crash the instance.<\/figure>\n<p class=\"graf graf--p\">The response from NewsAPI.org looks like this. If you\u2019re interested in seeing more examples, head over to their <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/newsapi.org\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/newsapi.org\">website<\/a>, where they have a lot of examples displayed.<\/p>\n<figure class=\"graf graf--figure\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1Nku1tVq-Lel-Xi4DtbbHiQ.png\" data-image-id=\"1*Nku1tVq-Lel-Xi4DtbbHiQ.png\" data-width=\"1032\" data-height=\"1254\"><\/figure>\n<p class=\"graf graf--p\">As you can see, those article records only give a very basic view of the article itself. 
Terms like vegan, which appear in some context inside the article without being the main topic of it, are not represented in the title or description.<br \/>\nTherefore, we need the Data Enrichment component, which we\u2019ll cover a little bit later.<br \/>\nHowever, this is exactly the type of JSON data that is stored in the S3 bucket, ready for further processing.<\/p>\n<p class=\"graf graf--p\">Trying an API locally and actually using it in the cloud are really similar.<br \/>\nOf course, there are a few catches: for example, you don\u2019t want to paste your API key into the actual code but rather use environment variables. But that\u2019s about it.<\/p>\n<p class=\"graf graf--p\">AWS has a very neat GUI for their Lambda setup. It really helps you understand the structure of your component and visualises which services and elements are connected to it.<br \/>\nIn the case of the first component, we have the CloudWatch Hourly Trigger on the \u201cInput\u201d-side and the Logging with CloudWatch and the S3 Bucket as a storage system on the \u201cOutput\u201d-side.<\/p>\n<figure class=\"graf graf--figure graf--layoutOutsetCenter\">\n<p><figure style=\"width: 2000px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"graf-image\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/10ENCoQJyv1VcMWEnBrkBjA.png\" alt=\"\" data-image-id=\"1*0ENCoQJyv1VcMWEnBrkBjA.png\" data-width=\"3248\" data-height=\"1030\" width=\"2000\" height=\"634\"><figcaption class=\"wp-caption-text\">Lambda GUI on&nbsp;AWS<\/figcaption><\/figure><figcaption class=\"imageCaption\"><\/figcaption>So, after putting everything together, importing the Node.js SDK for AWS and testing out the whole script locally, I finally deployed it as a Lambda Function.<br \/>\nThe final script is actually pretty short and understandable.<\/figure>\n<style>.gist table { margin-bottom: 0; }<\/style>\n<div style=\"tab-size: 8\" id=\"gist90945947\" class=\"gist\">\n<div 
class=\"gist-file\" translate=\"no\" data-color-mode=\"light\" data-light-theme=\"light\">\n<div class=\"gist-data\">\n<div class=\"js-gist-file-update-container js-task-list-container\">\n<div id=\"file-sov_article_collection-js\" class=\"file my-2\">\n<div itemprop=\"text\"\n      class=\"Box-body p-0 blob-wrapper data type-javascript  \"\n      style=\"overflow: auto\" tabindex=\"0\" role=\"region\"\n      aria-label=\"sov_article_collection.js content, created by timgrossmann on 09:14AM on July 25, 2018.\"\n    ><\/p>\n<div class=\"js-check-hidden-unicode js-blob-code-container blob-code-content\">\n
<table data-hpc class=\"highlight tab-size js-file-line-container\" data-tab-size=\"4\" data-paste-markdown-skip data-tagsearch-path=\"sov_article_collection.js\">\n<tr>\n<td id=\"file-sov_article_collection-js-L1\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"1\"><\/td>\n<td id=\"file-sov_article_collection-js-LC1\" class=\"blob-code blob-code-inner js-file-line\">const NewsAPI = require(&#39;newsapi&#39;)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L2\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"2\"><\/td>\n<td id=\"file-sov_article_collection-js-LC2\" class=\"blob-code blob-code-inner js-file-line\">const moment = require(&#39;moment&#39;)<\/td>\n<\/tr>\n<tr>\n<td 
id=\"file-sov_article_collection-js-L3\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"3\"><\/td>\n<td id=\"file-sov_article_collection-js-LC3\" class=\"blob-code blob-code-inner js-file-line\">const AWS = require(&#39;aws-sdk&#39;)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L4\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"4\"><\/td>\n<td id=\"file-sov_article_collection-js-LC4\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L5\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"5\"><\/td>\n<td id=\"file-sov_article_collection-js-LC5\" class=\"blob-code blob-code-inner js-file-line\">exports.handler = async (event) =&gt; {<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L6\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"6\"><\/td>\n<td id=\"file-sov_article_collection-js-LC6\" class=\"blob-code blob-code-inner js-file-line\">  \/\/ Right now we only need to query the API every hour because there<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L7\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"7\"><\/td>\n<td id=\"file-sov_article_collection-js-LC7\" class=\"blob-code blob-code-inner js-file-line\">  \/\/ are very few articles that contain the word veganism<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L8\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"8\"><\/td>\n<td id=\"file-sov_article_collection-js-LC8\" class=\"blob-code blob-code-inner js-file-line\">  const toTS = moment().format(&#39;YYYY-MM-DDTHH:mm:ss&#39;)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L9\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"9\"><\/td>\n<td id=\"file-sov_article_collection-js-LC9\" class=\"blob-code blob-code-inner js-file-line\">  const fromTS = moment(toTS).subtract(1, 
&#39;hour&#39;).format(&#39;YYYY-MM-DDTHH:mm:ss&#39;)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L10\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"10\"><\/td>\n<td id=\"file-sov_article_collection-js-LC10\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L11\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"11\"><\/td>\n<td id=\"file-sov_article_collection-js-LC11\" class=\"blob-code blob-code-inner js-file-line\">  const newsapi = new NewsAPI(process.env.API_KEY)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L12\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"12\"><\/td>\n<td id=\"file-sov_article_collection-js-LC12\" class=\"blob-code blob-code-inner js-file-line\">  const s3 = new AWS.S3()<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L13\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"13\"><\/td>\n<td id=\"file-sov_article_collection-js-LC13\" class=\"blob-code blob-code-inner js-file-line\">  const myBucket = process.env.S3_BUCKET<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L14\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"14\"><\/td>\n<td id=\"file-sov_article_collection-js-LC14\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L15\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"15\"><\/td>\n<td id=\"file-sov_article_collection-js-LC15\" class=\"blob-code blob-code-inner js-file-line\">  \/\/ Get the news from the given timeframe<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L16\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"16\"><\/td>\n<td id=\"file-sov_article_collection-js-LC16\" class=\"blob-code blob-code-inner js-file-line\">  return new Promise((resolve, reject) =&gt; {<\/td>\n<\/tr>\n<tr>\n<td 
id=\"file-sov_article_collection-js-L17\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"17\"><\/td>\n<td id=\"file-sov_article_collection-js-LC17\" class=\"blob-code blob-code-inner js-file-line\">    newsapi.v2.everything({<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L18\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"18\"><\/td>\n<td id=\"file-sov_article_collection-js-LC18\" class=\"blob-code blob-code-inner js-file-line\">      q: &#39;+vegan&#39;,<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L19\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"19\"><\/td>\n<td id=\"file-sov_article_collection-js-LC19\" class=\"blob-code blob-code-inner js-file-line\">      pageSize: 100,<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L20\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"20\"><\/td>\n<td id=\"file-sov_article_collection-js-LC20\" class=\"blob-code blob-code-inner js-file-line\">      from: fromTS,<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L21\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"21\"><\/td>\n<td id=\"file-sov_article_collection-js-LC21\" class=\"blob-code blob-code-inner js-file-line\">      to: toTS<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L22\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"22\"><\/td>\n<td id=\"file-sov_article_collection-js-LC22\" class=\"blob-code blob-code-inner js-file-line\">    })<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L23\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"23\"><\/td>\n<td id=\"file-sov_article_collection-js-LC23\" class=\"blob-code blob-code-inner js-file-line\">      .then(response =&gt; {<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L24\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"24\"><\/td>\n<td 
id=\"file-sov_article_collection-js-LC24\" class=\"blob-code blob-code-inner js-file-line\">        console.log(`Working with a total of ${response.articles.length} articles.`)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L25\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"25\"><\/td>\n<td id=\"file-sov_article_collection-js-LC25\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L26\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"26\"><\/td>\n<td id=\"file-sov_article_collection-js-LC26\" class=\"blob-code blob-code-inner js-file-line\">        \/\/ Write all the documents to the S3-bucket<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L27\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"27\"><\/td>\n<td id=\"file-sov_article_collection-js-LC27\" class=\"blob-code blob-code-inner js-file-line\">        const promisedArticles = response.articles.map(article =&gt; {<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L28\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"28\"><\/td>\n<td id=\"file-sov_article_collection-js-LC28\" class=\"blob-code blob-code-inner js-file-line\">          const myKey = `sov_${article.publishedAt}.json`<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L29\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"29\"><\/td>\n<td id=\"file-sov_article_collection-js-LC29\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L30\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"30\"><\/td>\n<td id=\"file-sov_article_collection-js-LC30\" class=\"blob-code blob-code-inner js-file-line\">          const params = {Bucket: myBucket, Key: myKey, Body: JSON.stringify(article, null, 2)}<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L31\" class=\"blob-num 
js-line-number js-blob-rnum\" data-line-number=\"31\"><\/td>\n<td id=\"file-sov_article_collection-js-LC31\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L32\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"32\"><\/td>\n<td id=\"file-sov_article_collection-js-LC32\" class=\"blob-code blob-code-inner js-file-line\">          \/\/ Saving the record for given key in S3<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L33\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"33\"><\/td>\n<td id=\"file-sov_article_collection-js-LC33\" class=\"blob-code blob-code-inner js-file-line\">          return new Promise((res, rej) =&gt; {<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L34\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"34\"><\/td>\n<td id=\"file-sov_article_collection-js-LC34\" class=\"blob-code blob-code-inner js-file-line\">            s3.putObject(params, (err, data) =&gt; {<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L35\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"35\"><\/td>\n<td id=\"file-sov_article_collection-js-LC35\" class=\"blob-code blob-code-inner js-file-line\">              if (err) {<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L36\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"36\"><\/td>\n<td id=\"file-sov_article_collection-js-LC36\" class=\"blob-code blob-code-inner js-file-line\">                console.error(`Problem with persisting article to S3&#8230; ${err}`)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L37\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"37\"><\/td>\n<td id=\"file-sov_article_collection-js-LC37\" class=\"blob-code blob-code-inner js-file-line\">                rej(err)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L38\" class=\"blob-num 
js-line-number js-blob-rnum\" data-line-number=\"38\"><\/td>\n<td id=\"file-sov_article_collection-js-LC38\" class=\"blob-code blob-code-inner js-file-line\">                return<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L39\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"39\"><\/td>\n<td id=\"file-sov_article_collection-js-LC39\" class=\"blob-code blob-code-inner js-file-line\">              }<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L40\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"40\"><\/td>\n<td id=\"file-sov_article_collection-js-LC40\" class=\"blob-code blob-code-inner js-file-line\">\n<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L41\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"41\"><\/td>\n<td id=\"file-sov_article_collection-js-LC41\" class=\"blob-code blob-code-inner js-file-line\">              console.log(`Successfully uploaded data to ${myBucket}\/${myKey}`)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L42\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"42\"><\/td>\n<td id=\"file-sov_article_collection-js-LC42\" class=\"blob-code blob-code-inner js-file-line\">              res(`Successfully uploaded data to ${myBucket}\/${myKey}`)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L43\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"43\"><\/td>\n<td id=\"file-sov_article_collection-js-LC43\" class=\"blob-code blob-code-inner js-file-line\">            })<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L44\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"44\"><\/td>\n<td id=\"file-sov_article_collection-js-LC44\" class=\"blob-code blob-code-inner js-file-line\">          })<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L45\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"45\"><\/td>\n<td 
id=\"file-sov_article_collection-js-LC45\" class=\"blob-code blob-code-inner js-file-line\">        })<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L46\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"46\"><\/td>\n<td id=\"file-sov_article_collection-js-LC46\" class=\"blob-code blob-code-inner js-file-line\">    })<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L47\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"47\"><\/td>\n<td id=\"file-sov_article_collection-js-LC47\" class=\"blob-code blob-code-inner js-file-line\">      .catch(err =&gt; {<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L48\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"48\"><\/td>\n<td id=\"file-sov_article_collection-js-LC48\" class=\"blob-code blob-code-inner js-file-line\">        console.error(`Encountered a problem&#8230; ${err}`)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L49\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"49\"><\/td>\n<td id=\"file-sov_article_collection-js-LC49\" class=\"blob-code blob-code-inner js-file-line\">        reject(err)<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L50\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"50\"><\/td>\n<td id=\"file-sov_article_collection-js-LC50\" class=\"blob-code blob-code-inner js-file-line\">      })<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L51\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"51\"><\/td>\n<td id=\"file-sov_article_collection-js-LC51\" class=\"blob-code blob-code-inner js-file-line\">  })<\/td>\n<\/tr>\n<tr>\n<td id=\"file-sov_article_collection-js-L52\" class=\"blob-num js-line-number js-blob-rnum\" data-line-number=\"52\"><\/td>\n<td id=\"file-sov_article_collection-js-LC52\" class=\"blob-code blob-code-inner 
js-file-line\">}<\/td>\n<\/tr>\n<\/table>\n<\/div><\/div>\n<\/p><\/div>\n<\/div><\/div>\n<div class=\"gist-meta\">\n        <a href=\"https:\/\/gist.github.com\/timgrossmann\/15a6b1bda51ee8ebf2ef948221762c93\/raw\/65f151e5f79df0957aeb1f7348b1921755c465e4\/sov_article_collection.js\" style=\"float:right\" class=\"Link--inTextBlock\">view raw<\/a><br \/>\n        <a href=\"https:\/\/gist.github.com\/timgrossmann\/15a6b1bda51ee8ebf2ef948221762c93#file-sov_article_collection-js\" class=\"Link--inTextBlock\"><br \/>\n          sov_article_collection.js<br \/>\n        <\/a><br \/>\n        hosted with &#10084; by <a class=\"Link--inTextBlock\" href=\"https:\/\/github.com\">GitHub<\/a>\n      <\/div>\n<\/p><\/div>\n<\/div>\n<\/div>\n<\/section>\n<\/div>\n<div>\n<p class=\"graf graf--p\">The GUI has some nice testing features with which you can simply trigger your Function by hand.<br \/>\nNothing worked\u2026<\/p>\n<p class=\"graf graf--p\">After a few seconds of googling, I found the term \u201cPolicies\u201d. I\u2019ve heard of them before but never read up on them or tried to really understand them.<\/p>\n<p class=\"graf graf--p\">Basically, they describe what service\/user\/group is allowed to do what. 
This was the missing piece I had to allow my Lambda function to write something to S3.<br \/>\n(I won\u2019t go into detail about it here, but if you want to skip to policies, feel free to head to the end of the article.)<\/p>\n<p class=\"graf graf--p\">A policy in AWS is a simple JSON-Style configuration which, in the case of my article collection function, looked like this.<\/p>\n<figure class=\"graf graf--figure\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1npLW4w4HXBLauxSs4u2FTA.png\" data-image-id=\"1*npLW4w4HXBLauxSs4u2FTA.png\" data-width=\"1456\" data-height=\"1926\"><\/figure>\n<p class=\"graf graf--p\">This is the config that describes the previously mentioned \u201cOutput\u201d-Side of the function.<br \/>\nIn the statements, we can see that it gets access to different methods of the logging tools and S3.<\/p>\n<p class=\"graf graf--p\">The weird part about the assigned resource for the S3 bucket is that if not stated otherwise in the options of your S3 bucket, you have to both provide the root and \u201ceverything below\u201d as two separate resources.<\/p>\n<blockquote class=\"graf graf--blockquote\"><p>The example given above allows the Lambda Function to do anything with the S3 bucket, this is not how you should set up your system! 
Your components should only be allowed to do what they are designated to do.<\/p><\/blockquote>\n<p class=\"graf graf--p\">Once this was entered, I could finally see the records getting put into my S3 bucket.<\/p>\n<figure class=\"graf graf--figure graf--layoutOutsetCenter\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1clogAHGwONSIyv99B1NA9w.png\" data-image-id=\"1*clogAHGwONSIyv99B1NA9w.png\" data-width=\"2000\" data-height=\"376\"><\/figure>\n<\/div>\n<p>&nbsp;<\/p>\n<div>\n<blockquote class=\"graf graf--pullquote\"><p>Special Characters are&nbsp;evil\u2026<\/p><\/blockquote>\n<p class=\"graf graf--p\">When I tried to get the data back from the S3 bucket, I ran into a problem. It just wouldn\u2019t give me the JSON file for the key that was created.<br \/>\nI had a hard time finding out what was wrong until, at some point, I realised that, by default, AWS enables logging for your services.<br \/>\n<strong class=\"markup--strong markup--p-strong\">This was gold!<\/strong><\/p>\n<p class=\"graf graf--p\">When I looked into the logs, the problem jumped out at me right away\u2026 It seemed that the key sent by the S3 trigger gets URL-encoded.<br \/>\nHowever, this problem was absolutely invisible when just looking at the S3 key names, where everything was displayed correctly.<\/p>\n<figure class=\"graf graf--figure graf--layoutOutsetCenter\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/12xIG4cCs82orSaLIfkX9Rw.png\" data-image-id=\"1*2xIG4cCs82orSaLIfkX9Rw.png\" data-width=\"2396\" data-height=\"126\"><\/figure>\n<p class=\"graf graf--p\">This problem was pretty easy to solve. 
I just replaced every special character with a dash which won\u2019t be replaced by some encoded value.<\/p>\n<figure class=\"graf graf--figure\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1S7guV5ivMyrlBpjfF-ahLA.png\" data-image-id=\"1*S7guV5ivMyrlBpjfF-ahLA.png\" data-width=\"1472\" data-height=\"582\"><figcaption class=\"imageCaption\">Solution to the URLEncoded key&nbsp;problem<\/figcaption><\/figure>\n<h4 class=\"graf graf--h4\">So, always make sure to not risk putting some special characters in keys. It might save you a ton of debugging and&nbsp;effort.<\/h4>\n<figure class=\"graf graf--figure\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/15zlW79_Bp5JnyIFY-bj8vQ.png\" data-image-id=\"1*5zlW79_Bp5JnyIFY-bj8vQ.png\" data-width=\"128\" data-height=\"46\"><\/figure>\n<\/div>\n<p>&nbsp;<\/p>\n<h4 class=\"graf graf--h4\">Data Enrichment<\/h4>\n<figure class=\"graf graf--figure\">\n<p><figure style=\"width: 1573px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"graf-image\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1IlwXtAtgS5cea5aZ9yWS8w.png\" alt=\"\" data-image-id=\"1*IlwXtAtgS5cea5aZ9yWS8w.png\" data-width=\"1573\" data-height=\"1432\" width=\"1573\" height=\"1432\"><figcaption class=\"wp-caption-text\">Data Enrichment Part<\/figcaption><\/figure><figcaption class=\"imageCaption\">Since we now have all the articles as single records in our S3 bucket, we can think about our enrichment part. 
We have to combine a few steps to implement the pipeline we planned, which, as a reminder, was the following.<\/figcaption><\/figure>\n<ul class=\"postList\">\n<li class=\"graf graf--li\">Get record from S3 bucket.<\/li>\n<li class=\"graf graf--li\">Build a context from the actual article in combination with the title and description.<\/li>\n<li class=\"graf graf--li\">Analyse the created context and enrich the record with the result.<\/li>\n<li class=\"graf graf--li\">Write the enriched article-record to our DynamoDB table.<\/li>\n<\/ul>\n<p class=\"graf graf--p\">One of the really awesome things about Promises in JavaScript is that you can model pipelines exactly the way you would describe them in text. If we compare the code with the explanation of what steps will be taken, we can see the similarity.<\/p>\n<figure class=\"graf graf--figure\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1vJrmbTm85PTdHOF1iWOp7A.png\" data-image-id=\"1*vJrmbTm85PTdHOF1iWOp7A.png\" data-width=\"1456\" data-height=\"774\"><\/figure>\n<p class=\"graf graf--p\">If you take a closer look at the first line of the code above, you can see the export handler.<br \/>\nThis line is predefined in every Lambda function so that Lambda knows which method to call. This means that your own code belongs inside the curly braces of the async function.<\/p>\n<figure class=\"graf graf--figure graf--layoutOutsetCenter\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1SBHLxOtPtDpdGQYuxKZLYQ.png\" data-image-id=\"1*SBHLxOtPtDpdGQYuxKZLYQ.png\" data-width=\"3262\" data-height=\"1300\"><\/figure>\n<p class=\"graf graf--p\">For the Data Enrichment part, we need some more services. 
We want to be able to send data to and get data from Comprehend\u2019s sentiment analysis, write our final record to DynamoDB and also have logging.<\/p>\n<p class=\"graf graf--p\">Have you noticed the S3 Service on the \u201cOutput\u201d-side? <strong class=\"markup--strong markup--p-strong\">This is why I always put the Output in quotes: even though we only want to read data here, it\u2019s displayed on the right-hand side. I basically just list all the services our function interacts with.<\/strong><\/p>\n<p class=\"graf graf--p\">The policy looks comparable to the one of the article collection component. It just has some more resources and rules which define the relation between Lambda and the other services.<\/p>\n<p class=\"graf graf--p\">Even though Google Cloud, in my opinion, has the \u201cbetter\u201d NLU components, <strong class=\"markup--strong markup--p-strong\">I just love the simplicity and unified API of AWS\u2019 services.<\/strong> If you\u2019ve used one of them, you think you know them all. For example, here\u2019s how to get a record from S3 and how the sentiment detection works in Node.js.<\/p>\n<figure class=\"graf graf--figure graf--layoutOutsetCenter\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1UBo_g5b4jZfu8trW0IViBA.png\" data-image-id=\"1*UBo_g5b4jZfu8trW0IViBA.png\" data-width=\"2622\" data-height=\"1062\"><\/figure>\n<p class=\"graf graf--p\">Probably one of the most interesting tasks of the Data Enrichment Component was the creation of the \u201ccontext\u201d of the word vegan in the article.<br \/>\nJust as a reminder, we need this context since a lot of articles only mention the word \u201cVegan\u201d without having \u201cVeganism\u201d as a topic. So, how do we extract parts from a text?<br \/>\nI went for Regular Expressions. 
They are incredibly nice to use, and you can use playgrounds like <a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/regex101.com\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/regex101.com\">Regex101<\/a> to play around and find the right regex for your use case.<\/p>\n<p class=\"graf graf--p\">The challenge was to come up with a regex that could find sentences that contained the word \u201cvegan\u201d.<br \/>\nSomehow it turned out to be harder than expected to make it generalise for whole text passages that also had line breaks, etc. in them.<br \/>\nThe final regex looks like this.<\/p>\n<figure class=\"graf graf--figure\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1b_89-CiPEPHGp0OSvONsMQ.png\" data-image-id=\"1*b_89-CiPEPHGp0OSvONsMQ.png\" data-width=\"1436\" data-height=\"438\"><\/figure>\n<p class=\"graf graf--p\">The problem was that for long texts, this did not work because of timeouts. 
The solution in this case was pretty \u201cstraightforward\u201d\u2026 I simply went through the text and split it at line breaks, which made it way easier for the RegEx module to process.<\/p>\n<p class=\"graf graf--p\">In the end, the whole <strong class=\"markup--strong markup--p-strong\">context \u201ccreation\u201d was a mixture of splitting the text, filtering for passages that contained the word vegan, extracting the matching sentence from that passage and joining it back together<\/strong> so that it could be used in the sentiment analysis.<br \/>\nThe title and description might also play a role, so I added those to the context if they contained the word \u201cvegan\u201d.<\/p>\n<figure class=\"graf graf--figure\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1QUpaU_yM3s_Wh35ZeuXseA.png\" data-image-id=\"1*QUpaU_yM3s_Wh35ZeuXseA.png\" data-width=\"1472\" data-height=\"1398\"><\/figure>\n<p class=\"graf graf--p\">Once all the code for the different steps was in place, I thought I could start building the frontend, but something wasn\u2019t right: some of the records just did not appear in my DynamoDB table\u2026<\/p>\n<p>&nbsp;<\/p>\n<blockquote class=\"graf graf--pullquote\"><p><strong class=\"markup--strong markup--pullquote-strong\">Empty Strings in DynamoDB are also&nbsp;evil\u2026<\/strong><\/p><\/blockquote>\n<p class=\"graf graf--p\">When checking back on the status of my already running system, I realised that some of the articles would not be converted to a DynamoDB table entry at all.<br \/>\nAfter checking out the logs, I found this Exception, which absolutely confused me\u2026<\/p>\n<figure class=\"graf graf--figure graf--layoutOutsetCenter\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1FlYJ6Ppo2K39VD7d1yW_eg.png\" data-image-id=\"1*FlYJ6Ppo2K39VD7d1yW_eg.png\" 
data-width=\"2610\" data-height=\"108\"><\/figure>\n<p class=\"graf graf--p\">To be honest, this is really weird behaviour since, as stated in the <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/forums.aws.amazon.com\/thread.jspa?threadID=90137\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/forums.aws.amazon.com\/thread.jspa?threadID=90137\">discussion<\/a>, the semantics and usage of an empty String are absolutely different from those of a Null value\u2026<\/p>\n<p class=\"graf graf--p\">However, since I can\u2019t change anything about the design of DynamoDB, I had to find a solution to avoid getting the empty String error.<br \/>\nIn my case, it was really easy. I just iterate through the whole JSON object and check whether there is an empty String or not, and if there is, I just replace the value with null. That\u2019s it, works like a charm and does not cause any problems.<br \/>\n(I needed to check if it has a value in the frontend though, since getting the length of a null value throws an error).<\/p>\n<figure class=\"graf graf--figure graf--startsWithDoubleQuote\">\n<p><figure style=\"width: 1400px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"graf-image\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1dFMXmp_QV3mD0hLCpWU-bg.png\" alt=\"\" data-image-id=\"1*dFMXmp_QV3mD0hLCpWU-bg.png\" data-width=\"1400\" data-height=\"630\" width=\"1400\" height=\"630\"><figcaption class=\"wp-caption-text\">\u201cDirty\u201d Fix for the empty String&nbsp;problem<\/figcaption><\/figure><\/figure>\n<figure class=\"graf graf--figure\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/15zlW79_Bp5JnyIFY-bj8vQ.png\" data-image-id=\"1*5zlW79_Bp5JnyIFY-bj8vQ.png\" data-width=\"128\" data-height=\"46\"><\/figure>\n<h4 class=\"graf graf--h4 graf--empty\"><\/h4>\n<p>&nbsp;<\/p>\n<section class=\"section 
section--body\">\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<h4 class=\"graf graf--h4\">Frontend<\/h4>\n<figure class=\"graf graf--figure\">\n<p><figure style=\"width: 1600px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"graf-image\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1KKkHqw0BwU5wn6hWQ1JY8w.png\" alt=\"\" data-image-id=\"1*KKkHqw0BwU5wn6hWQ1JY8w.png\" data-width=\"2182\" data-height=\"1040\" width=\"1600\" height=\"762\"><figcaption class=\"wp-caption-text\">Frontend Part<\/figcaption><\/figure><figcaption class=\"imageCaption\">The last part was to actually create a frontend and deploy it so people could visit the page and see the <a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/stateofveganism.com\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/stateofveganism.com\">StateOfVeganism<\/a>.<\/figcaption><\/figure>\n<p class=\"graf graf--p\">Of course, I was thinking about whether I should use one of those fancy frontend frameworks like Angular, React or Vue.js\u2026 But well, I went for absolutely old-school plain HTML, CSS and JS.<\/p>\n<p class=\"graf graf--p\"><strong class=\"markup--strong markup--p-strong\">The idea I had for the frontend was extremely minimalistic<\/strong>.<br \/>\nBasically it was just a bar that is divided into three sections:<br \/>\nPositive, Neutral and Negative.<br \/>\nWhen clicking on any one of those, it would display some titles and links to articles that were classified with this sentiment.<br \/>\nIn the end, that was exactly what it turned out to be. 
You can <a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/sovfrontend-env.qrg7cy6rmq.us-east-1.elasticbeanstalk.com\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/sovfrontend-env.qrg7cy6rmq.us-east-1.elasticbeanstalk.com\">check out the page here.<\/a><br \/>\nI thought about making it live at stateOfVeganism.com, but let\u2019s see\u2026<\/p>\n<\/div>\n<div class=\"section-inner sectionLayout--outsetColumn\">\n<figure class=\"graf graf--figure graf--layoutOutsetCenter\">\n<p><figure style=\"width: 2000px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"graf-image\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1GiLmRO1YMnLr3dL9OW8g3A.png\" alt=\"\" data-image-id=\"1*GiLmRO1YMnLr3dL9OW8g3A.png\" data-width=\"3840\" data-height=\"2230\" width=\"2000\" height=\"1161\"><figcaption class=\"wp-caption-text\">GUI of StateOfVeganism<\/figcaption><\/figure><figcaption class=\"imageCaption\"><\/figcaption><\/figure>\n<\/div>\n<div class=\"section-inner sectionLayout--insetColumn\">\n<blockquote class=\"graf graf--blockquote\"><p>Make sure to note the funny third article among those that have been classified as \u201cNegative\u201d&nbsp;\ud83d\ude09<\/p><\/blockquote>\n<p class=\"graf graf--p\">Deploying the frontend on one of AWS\u2019 services was something else I had to think about.<br \/>\nI definitely wanted to take a service that already incorporated elastic scaling, so I had to decide between Elastic Container Service and Elastic Beanstalk (actual EC2 instances).<\/p>\n<p class=\"graf graf--p\">In the end, I went for Beanstalk since I really liked the straightforward approach and the incredibly easy deployment. 
You can basically compare it to Heroku in the way you set it up.<br \/>\n(I had some problems with my auto scaling group not being allowed to deploy EC2 instances because I use the free tier on AWS, but after a few mails with the AWS support, everything worked right out of the box).<\/p>\n<p class=\"graf graf--p\">I just deployed a Node.js Express Server Application that serves my frontend on each path.<\/p>\n<figure class=\"graf graf--figure\"><img decoding=\"async\" class=\"graf-image aligncenter\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1kXVcV9_NmIu-CXIvMUO-qQ.png\" data-image-id=\"1*kXVcV9_NmIu-CXIvMUO-qQ.png\" data-width=\"1472\" data-height=\"822\"><\/figure>\n<p class=\"graf graf--p\">This setup, by default, provides the index.html which resides in the \u201cpublic\u201d folder which is exactly what I want.<\/p>\n<p class=\"graf graf--p\">Of course this is the most basic setup, and for most applications, it\u2019s not the recommended way since you somehow have to provide the credentials in order to access the DynamoDB table. It would be better to do some server-side rendering and store the credentials in environment variables so that nobody can access them.<\/p>\n<p>&nbsp;<\/p>\n<blockquote class=\"graf graf--pullquote\"><p>Playing it cool and deploying the AWS keys in the front&nbsp;end\u2026<\/p><\/blockquote>\n<p class=\"graf graf--p\">This is something you should never do. However, since I restricted the access of those credentials to only the scan method of the DynamoDB table, you can get the chance to dig deeper into my data if you\u2019re interested.<\/p>\n<p class=\"graf graf--p\">I also restricted the number of requests that can be done so that the credentials will \u201cstop working\u201d once the free monthly limit has been surpassed, just to make sure.<\/p>\n<p class=\"graf graf--p\">But feel free to look at the data and play around a little bit if you\u2019re interested. 
Just make sure not to overdo it since the API will stop providing the data to the frontend at some point.<\/p>\n<\/div>\n<\/div>\n<\/section>\n<section class=\"section section--body\">\n<div class=\"section-divider\">\n<hr class=\"section-divider\">\n<\/div>\n<\/section>\n<div class=\"section-inner sectionLayout--insetColumn\"><\/div>\n<div>\n<h3 class=\"graf graf--h3\">Policies, Policies?\u2026 Policies!<\/h3>\n<p class=\"graf graf--p\">When I started working with cloud technologies, I realised that there had to be a way to allow\/restrict access to the individual components and create relations. This is where policies come into play. They also help you to do access management by giving you the tools you need to give specific users and groups permissions. At some point, you\u2019ll probably struggle with this topic, so it makes sense to <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/docs.aws.amazon.com\/IAM\/latest\/UserGuide\/access_policies.html\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/docs.aws.amazon.com\/IAM\/latest\/UserGuide\/access_policies.html\">read up on it a little bit<\/a>.<\/p>\n<p class=\"graf graf--p\">There are basically two types of policies in AWS. Both are simple JSON style configuration files.<br \/>\nHowever, one of them is assigned to the resource itself, e.g. 
S3, and the other one gets assigned to roles, users or groups.<\/p>\n<p class=\"graf graf--p\">The table below shows some very rough statements about which policy you might want to choose for your task.<\/p>\n<section class=\"section section--body\">\n<div class=\"section-divider\">So, what is the actual difference?<\/div>\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<p class=\"graf graf--p\">This might become clearer when we compare an example of both policy types with each other.<\/p>\n<\/div>\n<div class=\"section-inner sectionLayout--outsetColumn\">\n<figure class=\"graf graf--figure graf--layoutOutsetCenter\">\n<p><figure style=\"width: 2000px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"graf-image\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/1fwRRc2V_eXU8mHyFk9n6Jw.png\" alt=\"\" data-image-id=\"1*fwRRc2V_eXU8mHyFk9n6Jw.png\" data-width=\"2324\" data-height=\"794\" width=\"2000\" height=\"683\"><figcaption class=\"wp-caption-text\">IAM-Policy and Resource&nbsp;Policy<\/figcaption><\/figure><\/figure>\n<\/div>\n<div class=\"section-inner sectionLayout--insetColumn\">\n<p class=\"graf graf--p\">The policy on the left is the IAM-Policy (or Identity-Based). The right one is the Resource-(Based)-Policy.<br \/>\nIf we start to compare them line by line we can\u2019t see any difference until we reach the first statement which defines some rules related to some service. In this case, it\u2019s S3.<\/p>\n<p class=\"graf graf--p\">In the Resource-Policy, we see an attribute that is called \u201cPrincipal\u201d which is missing in the IAM-Policy.<br \/>\nIn the context of a Resource-Policy, this describes the entities that are \u201cassigned\u201d to this rule. 
In the example given above, this would be the users Alice and root.<\/p>\n<p class=\"graf graf--p\">On the other hand, to achieve the exact same result with IAM-Policies, we would have to attach the policy on the left to our existing users, Alice and root.<\/p>\n<p class=\"graf graf--p\">Depending on your use case, it might make sense to use one or the other. It\u2019s also a question of your own \u201cstyle\u201d or the conventions of your workplace.<\/p>\n<\/div>\n<\/div>\n<\/section>\n<section class=\"section section--body\">\n<div class=\"section-divider\">\n<hr class=\"section-divider\">\n<\/div>\n<\/section>\n<\/div>\n<div><\/div>\n<div class=\"section-inner sectionLayout--insetColumn\"><span style=\"font-size: 20px; font-weight: 600;\">What\u2019s next?<\/span><\/div>\n<div>\n<section class=\"section section--body\">\n<div class=\"section-content\">\n<div class=\"section-inner sectionLayout--insetColumn\">\n<p class=\"graf graf--p\"><a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/sovfrontend-env.qrg7cy6rmq.us-east-1.elasticbeanstalk.com\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/sovfrontend-env.qrg7cy6rmq.us-east-1.elasticbeanstalk.com\">StateOfVeganism<\/a> is already live.<br \/>\nHowever, this does not mean that there is nothing left to improve. One thing I definitely have to work on is, for example, that recipes from Pinterest are not classified as \u201cPositive\u201d but rather as \u201cNeutral\u201d.<br \/>\nThe basic functionality is working as expected. The data pipeline works nicely, and if anything should go wrong, I already have logging enabled with CloudWatch.<\/p>\n<p class=\"graf graf--p\">It\u2019s been great to really think through and build such a system. Questioning my own decisions was especially helpful in optimising the whole architecture.<\/p>\n<p class=\"graf graf--p\">The next time you\u2019re thinking about building a side project, think about building it with one of the cloud providers. 
It might be a bigger time investment in the beginning, but <strong class=\"markup--strong markup--p-strong\">learning how to use and build systems with an infrastructure like AWS really helps you to grow as a developer<\/strong>.<\/p>\n<\/div>\n<p class=\"graf graf--p\">Thank you for reading.<br \/>\nBe sure to follow me on <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.youtube.com\/channel\/UC9_Bk9247GgJ3k9O7yxctFg\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.youtube.com\/channel\/UC9_Bk9247GgJ3k9O7yxctFg\">YouTube<\/a> and to star <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/timgrossmann\/stateOfVeganism\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/github.com\/timgrossmann\/stateOfVeganism\">StateOfVeganism on GitHub.<\/a><\/p>\n<p class=\"graf graf--p\">Don\u2019t forget to follow me on <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/twitter.com\/timigrossmann\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/twitter.com\/timigrossmann\">Twitter<\/a>, <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/timgrossmann\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/github.com\/timgrossmann\">GitHub<\/a> and <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.youtube.com\/channel\/UC9_Bk9247GgJ3k9O7yxctFg\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.youtube.com\/channel\/UC9_Bk9247GgJ3k9O7yxctFg\">YouTube<\/a>.<\/p>\n<\/div>\n<\/section>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>What I learned in building the StateOfVeganism ? By now, we all know that news and media shape our viewson these discussed topics. Of course, this is different from person to person. Some might be influenced a little more than others, but there always is some opinion communicated. 
Considering this, it would be really interesting [&hellip;]<\/p>\n","protected":false},"author":879,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[120,650,22,21,223],"tags":[77,84,83,7,169],"ppma_author":[754],"class_list":["post-3788","post","type-post","status-publish","format-standard","hentry","category-cloud-technologies","category-scalable-systems","category-student-projects","category-system-architecture","category-ultra-large-scale-systems","tag-amazon-web-services","tag-aws","tag-aws-lambda","tag-cloud","tag-vegan"],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":22151,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2022\/02\/22\/designing-and-implementing-a-scalable-web-application\/","url_meta":{"origin":3788,"position":0},"title":"Designing the framework for a scalable CI\/CD supported web application","author":"Danial Eshete","date":"22. February 2022","format":false,"excerpt":"Documentation of our approaches to the project, our experiences and finally the lessons we learned. The development team approaches the project with little knowledge of cloud services and infrastructure. Furthermore, no one has significant experience with containers and\/or containerized applications. 
However, the team is well experienced in web development and\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Design_Desktop_Logged_In-3-150x150.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Design_Desktop_Logged_In-3-150x150.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Design_Desktop_Logged_In-3-150x150.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Design_Desktop_Logged_In-3-150x150.jpg?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Design_Desktop_Logged_In-3-150x150.jpg?resize=1050%2C600&ssl=1 3x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2022\/02\/Design_Desktop_Logged_In-3-150x150.jpg?resize=1400%2C800&ssl=1 4x"},"classes":[]},{"id":4164,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2018\/08\/31\/tweets-by-donnie-building-a-serverless-sentiment-analysis-application-with-the-twitter-streaming-api-lambda-and-kinesis\/","url_meta":{"origin":3788,"position":1},"title":"Tweets by Donnie\u200a-\u200aBuilding a serverless sentiment analysis application with the twitter streaming API,  Lambda and Kinesis","author":"dr053","date":"31. August 2018","format":false,"excerpt":"tweets-by-donnie dashboard \u00a0 Thinking of Trumps tweets it's pretty obvious that they are controversial. Trying to gain insights of how controversial his tweets really are, we created tweets-by-donnie. \u201cIt\u2019s freezing and snowing in New York\u200a\u2014\u200awe need global warming!\u201d Donald J. 
Trump You decide if it\u2019s meant as a joke or\u2026","rel":"","context":"In &quot;Cloud Technologies&quot;","block_context":{"text":"Cloud Technologies","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/scalable-systems\/cloud-technologies\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4122,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2018\/08\/27\/building-a-serverless-web-service-for-music-fingerprinting\/","url_meta":{"origin":3788,"position":2},"title":"Building a Serverless Web Service For Music Fingerprinting","author":"Alexis Luengas","date":"27. August 2018","format":false,"excerpt":"Building serverless architectures is hard. At least it was to me in my first attempt to design a loosely coupled system that should, in the long term, mean a good bye to my all-time aversion towards system maintenance. Music information retrieval is also hard. It is when you attempt to\u2026","rel":"","context":"In &quot;Cloud Technologies&quot;","block_context":{"text":"Cloud Technologies","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/scalable-systems\/cloud-technologies\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/08\/Architecture-Diagram-300x190.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":5635,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2019\/03\/05\/a-dive-into-serverless-on-the-basis-of-aws-lambda\/","url_meta":{"origin":3788,"position":3},"title":"A Dive into Serverless on the Basis of AWS Lambda","author":"Can Kattwinkel","date":"5. March 2019","format":false,"excerpt":"Hypes help to overlook the fact that tech is often reinventing the wheel, forcing developers to update applications and architecture accordingly in painful migrations. Besides Kubernetes one of those current hypes is Serverless computing. While everyone agrees that Serverless offers some advantages it also introduces many problems. 
The current trend\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/warm.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/warm.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/warm.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":27914,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2025\/08\/27\/how-to-develop-an-aws-hosted-discord-bot\/","url_meta":{"origin":3788,"position":4},"title":"How to develop an AWS hosted Discord Bot","author":"Lara Blersch","date":"27. August 2025","format":false,"excerpt":"Introduction This semester, our team set itself the goal of developing a game for a Discord bot. Taking inspiration from Hitster and Nobody's Perfect, we created Headliner.Over three rounds, players receive meta information about a newspaper article, such as what happened, who was involved, where it happened, and when. 
Based\u2026","rel":"","context":"In &quot;Cloud Technologies&quot;","block_context":{"text":"Cloud Technologies","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/scalable-systems\/cloud-technologies\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/08\/Architecture.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/08\/Architecture.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/08\/Architecture.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/08\/Architecture.png?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/08\/Architecture.png?resize=1050%2C600&ssl=1 3x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/08\/Architecture.png?resize=1400%2C800&ssl=1 4x"},"classes":[]},{"id":27618,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2025\/02\/28\/crowdconnect-developing-a-scalable-live-chat-application-with-aws-cloud-services\/","url_meta":{"origin":3788,"position":5},"title":"CrowdConnect &#8211; Developing a Scalable Live Chat Application with AWS Cloud Services","author":"Jannik Scheider","date":"28. February 2025","format":false,"excerpt":"Imagine you're developing a live chat application in the cloud that needs to serve a growing number of users simultaneously and in real time across multiple chat rooms. Sounds like a challenge? It is. 
But with proven approaches and valuable insights from real-world experience, this task can be successfully and\u2026","rel":"","context":"In &quot;Cloud Technologies&quot;","block_context":{"text":"Cloud Technologies","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/scalable-systems\/cloud-technologies\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/02\/image-14.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/02\/image-14.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/02\/image-14.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]}],"jetpack_sharing_enabled":true,"authors":[{"term_id":754,"user_id":879,"is_guest":0,"slug":"tg069","display_name":"Tim Grossmann","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/1f9852b4698f208a0356eaed7ce80fbfb8fd271052bac3d667af2312767bf6ee?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/3788","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/users\/879"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/comments?post=3788"}],"version-history":[{"count":14,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/3788\/revisions"}],"predecessor-version":[{"id":25522,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/3788\/revisions\/25522"}],"wp:attachment":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp
-json\/wp\/v2\/media?parent=3788"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/categories?post=3788"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/tags?post=3788"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/ppma_author?post=3788"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}