Building a Document Translator for a Multi-Language Blog

Motivation

Multi-Language Blog Navigation

The idea for this project occurred to me while I was listening to my sister share her vision for her recently started blog: to create a platform where writers of different ethnicities can publish texts in their native languages and exchange their stories with people from all over the world. Conquering the language barrier and making the texts available in at least the three most prominent languages – German, English and Arabic – requires the involvement of translators who are fluent in at least two of the demanded languages. Anyone who has ever attempted to make a translated text sound natural knows that this is no easy feat and can take many hours of finding the perfect balance between literal translation and understandable text.

This is where I saw room for improvement. Nowadays, machine translation tools have reached a decent level of fluency, even though they cannot capture the intricacies of different linguistic styles. Combining them with people who have a basic understanding of the source language can help speed up the process and reduce the effort considerably. Continue reading

Using the power of the Google Cloud API: A dockerized Node app counting words in presentations

For the Dev4Cloud lecture at HdM Stuttgart, we created a simple Go/NodeJS/React app which helps people keep track of frequently used words during presentations. In a presentation setting, most people tend to use too many filler words, and to train against this, we want to introduce our presentation counter to you.

The presentation counter consists of three parts: the React frontend, the Go backend and the NodeJS speech server for the communication with the Google Cloud platform. To make it short, the frontend captures the microphone audio and sends it to the speech server, and the speech server gets the audio transcript from Google. Then the transcript is sent back to the Go backend, which saves the relevant words in an Alpine db and updates the frontend.

Frontend

The statically compiled React frontend contains code to communicate via WebSocket with the Go and NodeJS servers, as well as to capture the microphone audio. Capturing audio needs a bit of boilerplate code:

// Set up the Web Audio API context and a ScriptProcessor to access the raw audio
AudioContext = window.AudioContext || window.webkitAudioContext;
context = new AudioContext();
processor = context.createScriptProcessor(bufferSize, 1, 1); // mono in, mono out
processor.connect(context.destination);
context.resume();

// Called once the user has granted microphone access
var handleSuccess = function (stream) {
    recButton.classList.add("rec-effect");

    globalStream = stream;
    input = context.createMediaStreamSource(stream);
    input.connect(processor);

    // Hand every captured audio buffer to our processing routine
    processor.onaudioprocess = function (e) {
        microphoneProcess(e);
    };
};

// Ask for microphone access and start processing on success
navigator.mediaDevices.getUserMedia(constraints)
    .then(handleSuccess);

The microphoneProcess called inside the handleSuccess function receives a stream of audio buffers of the given buffer size. It does two important things: first, it downsamples the stream from 48,000 Hz to 16,000 Hz, and then it uses WebSockets to send it to the NodeJS server in close to real time.
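A minimal sketch of what such a handler could look like (the downsampling approach and the socket variable are assumptions for illustration, not our original implementation):

// Hypothetical sketch: downsample the 48 kHz Float32 buffer to 16 kHz 16-bit PCM
// and push it over an already connected WebSocket ("socket" is assumed to exist).
function microphoneProcess(e) {
    var input = e.inputBuffer.getChannelData(0); // Float32Array at 48 kHz
    var ratio = 3;                               // 48000 / 16000
    var downsampled = new Int16Array(Math.floor(input.length / ratio));

    for (var i = 0; i < downsampled.length; i++) {
        // Take every third sample and convert from [-1, 1] float to 16-bit integer
        var sample = Math.max(-1, Math.min(1, input[i * ratio]));
        downsampled[i] = sample < 0 ? sample * 0x8000 : sample * 0x7FFF;
    }

    if (socket.readyState === WebSocket.OPEN) {
        socket.send(downsampled.buffer); // binary frame to the NodeJS speech server
    }
}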

Our frontend: mobile first with a super simple UI. Tap the mic to start the session and add words at the bottom.

Speech Server

This server is a lightweight NodeJS app. To use the Google Cloud API you need your own user key, and this should not be shared with the frontend, so we created a layer in between. The server holds the key using the dotenv library: the Google Cloud API needs the key to be in the process environment variables, and adding the key at operating-system level would be a big pain, so we used this great library. Once an audio stream has started, the server uses the key to authorize with the Google Cloud API and calls speechClient.streamingRecognize(request) to open a stream to the cloud. The request parameter contains the configuration and encoding information. Our configuration looks like this:

const request = {
    config: {
        encoding: encoding,
        sampleRateHertz: sampleRateHertz,
        languageCode: languageCode,
        profanityFilter: false,
        enableWordTimeOffsets: true
    },
    interimResults: true
};

Please note the last item: interimResults: true. This tells the Google Cloud API to send unfinished results. Why would we want unfinished results, you might ask? The answer is simple: we do not want to wait until the sentence is finished. As soon as a sentence is finished, Google can calculate the probabilities more accurately, since the context is closed. That means that whenever Google detects a finished sentence, we get a more accurate prediction, but we would have to wait until the sentence is finished. Because of that, we use the less accurate interim results to get faster, near real-time updates, and may have to correct the displayed results if they change.

The Google Cloud API gives a transcript of all the spoken words it can recognize in the audio stream. So the NodeJS server sends this transcript, divided into a final and an interim part, to the Go backend, where the words are counted – and recounted as soon as the interim part becomes final. In that way we have fast results, which is important for a word counter app – nobody wants to wait until the sentence is finished for the counts to update.
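A minimal sketch of how the results of such a stream could be split into interim and final parts before forwarding them (the connection to the Go backend is an assumption for illustration):

// Hypothetical sketch: handle streaming recognition results and forward them.
const recognizeStream = speechClient
    .streamingRecognize(request)
    .on('error', console.error)
    .on('data', (data) => {
        const result = data.results[0];
        if (!result) return;

        const transcript = result.alternatives[0].transcript;
        const payload = JSON.stringify({
            transcript: transcript,
            isFinal: result.isFinal // true once Google has closed the sentence
        });

        // "goBackendSocket" is an assumed WebSocket connection to the Go backend
        goBackendSocket.send(payload);
    });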

Google Cloud API

For this project we used the Google Speech API. To be able to use it, you first have to get a Google API key, but this is quite straightforward and described here. Next you have to download a library from Google which mirrors the API functions in your code. With NodeJS the installation is npm install @google-cloud/speech --save. Now you can import it and initialise it.

const speech = require('@google-cloud/speech');
const speechClient = new speech.SpeechClient();

After those steps you are ready to go! Just follow the steps described in the Speech server section.

Another thing to add: with your Google account comes $300 of free credit for the Google Cloud Platform. At first we wondered if this would be enough to develop the project, until we found out that you also get 60 free minutes of analysis with the Speech API. Even after those 60 minutes it is only 15 ct/min. What we want to say is: just play with it, it's easy, and the possibility to create a nice prototype is really awesome. Thanks, Google!

Go-Backend

At the beginning, this project was a fork of a project we did in the same semester. In the base project, people can count the words of a person doing a presentation by clicking on a button. For that former project I decided to try a backend language which was new to me: Go. This language is more low-level than Java or JavaScript, but more high-level than C or C++. So if you come from a Java/JavaScript background like we did, your mind will probably struggle from time to time. There are pointers like in C, but fortunately no pointer arithmetic. The main advantage of this language is supposed to be that it can be almost as fast as C++ while taking way less time to compile. Additionally, it has some functional constructs in it.

So, long story short: what we wanted to do was to extend this server so that it uses the Google speech-to-text service to get back the text of someone's presentation in real time. During the development process we found out that the new requirements didn't really match the previous ones. The server had to update the frontend in real time like it did before, but now we also wanted the server to correct the frontend's data. Why correct it? From the Google API you get back text in real time if you do stream processing. More or less word by word, the sentence you get back is extended. But this sentence isn't fixed, and neither are the words in it, as long as the sentence hasn't finished. This is because Google always tries to return a reasonable sentence.

These facts, plus the fact that we still needed to count the words on every update, made us rewrite almost the whole web server code.

Docker

Both of us, Simon and me, Marius, still don't know what to think about this tool. The general idea of having containers for encapsulation purposes is great, but working with Docker often isn't.

There is no real debug tool, or at least we haven't found one. Also, it caches so much that you have to rebuild your container almost every time from scratch with the cache disabled. This can cause a very long build step, depending on what you have to install during the startup process.

But now to the advantages and what we did with it. Both of our servers, the Node and the Go one, run in their own Alpine containers. Docker Compose manages them; they are in a Docker network with a DNS, so they can communicate using their respective service names.
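For illustration, a minimal Compose sketch (service names, build paths and ports are assumptions, not our actual file) could look like this:

# Hypothetical docker-compose.yml: both services share the default Compose
# network, so they can reach each other as "go-backend" and "speech-server".
version: "3"
services:
  go-backend:
    build: ./go-backend       # assumed path to the Go server's Dockerfile
    ports:
      - "8080:8080"           # assumed port exposed to the host
  speech-server:
    build: ./speech-server    # assumed path to the NodeJS server's Dockerfile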

This makes it possible to deploy the application anywhere a Docker daemon is available, which is quite nice. The whole software infrastructure comes with Docker and the definitions in the Dockerfiles.

Summary

Working with cloud services is easier than we thought. Using the technology offered by Google was very straightforward, but the Docker and Go parts weren't.

This was our first project using the cloud. As a result, we just took a simple functionality it offers and created something with it. Regarding the effort from an engineering point of view, it might have been easier to use more out-of-the-box functionality. For example, we could have deployed the server code directly to a Go or Node service at Google, where you can create containers by simply dragging and dropping modules onto them.

Finally, we can say that it is worth looking deeper into the cloud topic, not only because you don't need to buy hardware, but also because it can save you a lot of time when developing an application.

 

Written by: Simon Deussen & Marius Hahn

 

Building a fully scalable architecture with AWS

What I learned in building the StateOfVeganism 🌱

Final setup for the finished project (created with Cloudcraft)

By now, we all know that news and media shape our views on the topics they discuss. Of course, this differs from person to person. Some might be influenced a little more than others, but there is always some opinion communicated.

Considering this, it would be really interesting to see the continuous development of mood communicated towards a specific topic or person in the media.

For me, Veganism is an interesting topic, especially since it is frequently mentioned in the media. Since the media's opinion changes the opinion of people, it would be interesting to see what "sentiment" they communicate.

This is what this whole project is about: collecting news that talks about or mentions Veganism, finding out the context in which it is mentioned, and analysing whether it propagates negativity or positivity.
Of course, a huge percentage of the analysed articles should be classified as “Neutral” if the writers do a good job in only communicating information, so we should keep that in mind, too.

Of course, this is an incredible opportunity to pick up a new toolset, especially when thinking about the sheer number of articles published daily.
So, we could think about building a scalable architecture: one that is cheap/free in the beginning when there is no traffic and only a few articles, but that scales easily and almost infinitely once the number of mentions or the traffic increases.
I can hear the cloud calling.

Designing the Architecture

Planning is everything, especially when we want to make sure that the architecture scales right from the beginning.

Starting on paper is a good thing because it enables you to be extremely rough and quick in iterating.

 Your first draft will never be your final one, and if it is, you’ve probably forgotten to question your decisions.

First Concept with removed components

For me, the process of coming up with a suitable and, even more important, reasonable architecture was the key thing I wanted to improve with this project. The different components seemed pretty “easy” to implement and build but coming up with the right system, the right communication and a nice, clean data pipeline was the really interesting part.

In the beginning, I had some bottlenecks in my design which, at one point, would’ve brought my whole system to its knees. In that situation, I thought about just adding more “scalable” services like queues to queue the load and take care of it.
When I finally had a design which, I guess, could handle a ton of load and was dynamically scalable, it was a mess… Too many services, a lot of overhead and an overall “dirty” structure.

When I looked at the architecture a few days later, I realised that there was so much I could optimise with a few changes. I started to remove all the queues and thought about replacing actual virtual machines with FAAS components.
After that session, I had a much cleaner and still scalable design.

Think of the structure and technologies, not implementations.

Final Architecture

That was one of the mistakes I made quite early in the project. I started out by looking at what services IBM's Bluemix offers and went on from there: which ones could I mix together and use in my design, wired together with triggers and queues and whatever?
In the end, I could remove a lot of the overhead in terms of services by simply stepping away from it and thinking of the overall structure and technologies I need rather than the different implementations.

Broken down into a few distinct steps, the project should:

  • Every hour (in the beginning, since there are only a few articles at the moment – this could later run every minute or even every second):
  • get the news from some NewsAPI and store it,
  • process each article, analyse its sentiment and store the result in a database to query,
  • and, upon visiting the website, get the data for the selected range and display bars/articles.

So, what I finally ended up with is a CloudWatch Trigger which triggers a Lambda Function every hour. This Function gets the news data for the last hour from the NewsAPI. It then saves each article as a separate JSON file into an S3 bucket.
This bucket, upon ObjectPut, triggers another Lambda Function which loads the JSON from S3, creates a “context” for the appearance of the part-word “vegan” and sends the created context to the AWS Comprehend sentiment analysis. Once the function gets the sentiment information for the current article, it writes it to a DynamoDB table.
This Table is the root for the data displayed in the frontend. It gives the user a few filters with which they can explore the data a little bit more.

If you’re interested in a deeper explanation, jump down to the description of the separate components.


Who’s “The One” Cloud Provider?

Before I knew that I was going with AWS, I tried out two other cloud providers. It’s a very basic and extremely subjective view on which provider to choose, but maybe it will help some other “Cloud-Beginners” choose.

I started out with IBM's Bluemix Cloud, moved to GC and finally ended up using AWS. Here are some of the "reasons" for my choice.

A lot of the points listed here really only describe how good the overall documentation and community are, and how many of the issues I encountered had already been asked and answered on StackOverflow.

Documentation and Communities are Key

This is especially true for beginners and people who've never worked with cloud technologies. The documentation and, even more importantly, the documented and explained examples were simply the best for AWS.

Of course, you don't have to settle for a single provider. In my case, I could've easily used Google's NLU tools because, in my opinion, they brought the better results. I just wanted to keep my whole system on one platform, and I can still change this later on if I want to.

The starter packs of all providers are actually really nice. You'll get $300 on GC, which will enable you to do a lot of stuff. However, it's also kind of dangerous, since you'll be charged if you use up the credit and forget to turn off and destroy all the services building up the costs. Bluemix only has very limited access to services on its free tier, which is a little bit unfortunate if you want to test out the full suite.
Amazon, for me, was the nicest one, since they also have a free tier which allows you to use nearly every feature (some only with the smallest instances, like the EC2 micro instance).

As already mentioned, this is a very superficial and subjective opinion on which one to go for… For me, AWS was the easiest and fastest to pick up without investing too much time upfront.


The Components

The whole project can basically be split into three main components that need work.

The Article Collection, which consists of the hourly cron job, the Lambda function which calls the NewsAPI and the S3 bucket that stores all the articles.

The Data Enrichment part, which loads the article from S3, creates the context and analyses it using Comprehend, and the DynamoDB that stores the enriched data for later use in the frontend.

And the Frontend, which gets displayed when the users request the webpage. This component consists of a graphical user interface, a scalable server service which serves the webpage and, again, the DynamoDB.

Article Collection

Article Collection Part

The first and probably easiest part of the whole project was collecting all the articles and news that contain the keyword “vegan”. Luckily, there are a ton of APIs that provide such a service.
One of them is NewsAPI.org.

With their API, it’s extremely easy and understandable. They have different endpoints. One of them is called “everything” which, as the name suggests, just returns all the articles that contain a given keyword.
Using Node.JS here, this looks something like this.

NewsAPI query for 1 hour of data from the beginning of the year
The + sign in front of the query String “vegan” simply means that the word must appear.
The pageSize defines how many articles per request will be returned.
You definitely want to keep an eye on that. If, for example, your system has extremely limited memory, it makes sense to do more requests (use the provided cursor) in order to not crash the instance with too big responses.
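For illustration, a hypothetical version of such a query with the newsapi Node.js client (the key is read from an environment variable; the one-hour window is just an example value) might look like this:

// Hypothetical example: fetch up to 100 articles that must contain "vegan",
// published within a one-hour window at the beginning of the year.
const NewsAPI = require('newsapi');
const newsapi = new NewsAPI(process.env.NEWS_API_KEY); // key kept out of the code

newsapi.v2.everything({
    q: '+vegan',                    // "+" means the word must appear
    from: '2018-01-01T00:00:00Z',
    to: '2018-01-01T01:00:00Z',
    pageSize: 100                   // articles returned per request
}).then((response) => console.log(response.articles.length));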

The response from NewsAPI.org looks like this. If you're interested in seeing more examples, head over to their website, where they have a lot of examples displayed.

As you can see, those article records only give a very basic view of the article itself. Terms like vegan, which appear in some context inside the article without being the main topic of it, are not represented in the title or description.
Therefore, we need the Data Enrichment component which we’ll cover a little bit later.
However, this is exactly the type of JSON data that is stored in the S3 bucket, ready for further processing.

Trying an API locally and actually using it in the cloud is really similar.
Of course, there are some catches where you don’t want to paste your API key into the actual code but rather use environment variables, but that’s about it.

AWS has a very neat GUI for their Lambda setup. It really helps you understand the structure of your component and visualises which services and elements are connected to it.
In the case of the first component, we have the CloudWatch Hourly Trigger on the “Input”-side and the Logging with CloudWatch and the S3 Bucket as a storage system on the “Output”-side.

Lambda GUI on AWS
So, after putting everything together, importing the Node.JS SDK for AWS and testing out the whole script locally, I finally deployed it as a Lamdba Function.
The final script is actually pretty short and understandable.
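The script itself isn't reproduced here, but a condensed, hypothetical sketch of such a function (bucket name, key scheme and the query wrapper are assumptions) could look roughly like this:

// Hypothetical sketch of the deployed handler: run the NewsAPI query sketched
// above and store every returned article as its own JSON object in S3.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async () => {
    // Assumed wrapper around the NewsAPI query shown earlier
    const { articles } = await fetchLastHourOfArticles();

    await Promise.all(articles.map((article, index) =>
        s3.putObject({
            Bucket: process.env.ARTICLE_BUCKET,
            Key: `${new Date().toISOString()}-${index}.json`,
            Body: JSON.stringify(article)
        }).promise()
    ));
};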

The GUI has some nice testing features with which you can simply trigger your Function by hand.
Nothing worked…

After a few seconds of googling, I found the term “Policies”. I’ve heard of them before but never read up on them or tried to really understand them.

Basically, they describe which service/user/group is allowed to do what. This was the missing piece I needed to allow my Lambda function to write something to S3.
(I won’t go into detail about it here, but if you want to skip to policies, feel free to head to the end of the article.)

A policy in AWS is a simple JSON-Style configuration which, in the case of my article collection function, looked like this.

This is the config that describes the previously mentioned “Output”-Side of the function.
In the statements, we can see that it gets access to different methods of the logging tools and S3.

The weird part about the assigned resource for the S3 bucket is that if not stated otherwise in the options of your S3 bucket, you have to both provide the root and “everything below” as two separate resources.
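Putting this together, a hypothetical sketch of such a policy (the bucket name is a placeholder, not the real one) might look like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::my-article-bucket",
                "arn:aws:s3:::my-article-bucket/*"
            ]
        }
    ]
}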

The example given above allows the Lambda Function to do anything with the S3 bucket. This is not how you should set up your system! Your components should only be allowed to do what they are designated to do.

Once this was entered, I could finally see the records getting put into my S3 bucket.

 

Special Characters are evil…

When I tried to get the data back from the S3 bucket, I encountered a problem: it just wouldn't give me the JSON file for the key that was created.
I had a hard time finding out what was wrong until at one point, I realised that, by default, AWS enables logging for your services.
This was gold!

When I looked into the logs, the problem jumped out at me right away… It seemed like the key value that gets sent by the S3 trigger is URL-encoded.
However, this problem was absolutely invisible when just looking at the S3 key names where everything was displayed correctly.

This problem was pretty easy to solve. I just replaced every special character with a dash, which won't be replaced by some encoded value.

Solution to the URLEncoded key problem
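A tiny, hypothetical sketch of that idea (the key scheme is an assumption):

// Hypothetical example: build S3 keys only from characters that survive
// URL encoding by replacing everything else with a dash.
const toSafeKey = (title) => title.replace(/[^a-zA-Z0-9]/g, '-') + '.json';

// toSafeKey('Why go vegan? 10 reasons!') -> 'Why-go-vegan--10-reasons-.json'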

So, always make sure to not risk putting some special characters in keys. It might save you a ton of debugging and effort.

 

Data Enrichment

Data Enrichment Part
Since we now have all the articles as single records in our S3 bucket, we can think about the enrichment part. We have to combine a few steps in order to implement the pipeline we had in mind, which, just to recap, was the following.
  • Get record from S3 bucket.
  • Build a context from the actual article in combination with the title and description.
  • Analyse the created context and enrich the record with the result.
  • Write the enriched article-record to our DynamoDB table.

One of the really awesome things about Promises in JavaScript is that you can model pipelines exactly the way you would describe them in text. If we compare the code with the explanation of what steps will be taken, we can see the similarity.
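The actual code isn't reproduced here; a hypothetical sketch in the same spirit could look like this:

exports.handler = async (event) => {
    // Hypothetical sketch; getRecordFromS3, buildContext, analyseSentiment and
    // writeToDynamoDB are assumed helper functions, not the original implementation.
    const record = await getRecordFromS3(event);         // 1. get record from S3 bucket
    const context = buildContext(record);                // 2. build the context
    const sentiment = await analyseSentiment(context);   // 3. analyse the created context
    return writeToDynamoDB({ ...record, sentiment });    // 4. write the enriched record
};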

If you take a closer look at the first line of the code above, you can see the exported handler.
This line is always predefined in Lambda Functions so that the runtime knows which method to call. This means that your own code belongs inside the curly braces of the async block.

For the Data Enrichment part, we need some more services. We want to be able to send data to and get data from Comprehend's sentiment analysis, write our final record to DynamoDB, and also have logging.

Have you noticed the S3 service on the "Output" side? This is why I always put "Output" in quotes: even though we only want to read data here, it's displayed on the right-hand side. There, I basically just list all the services our function interacts with.

The policy looks comparable to the one of the article collection component. It just has some more resources and rules which define the relation between Lambda and the other services.

Even though Google Cloud, in my opinion, has the "better" NLU components, I just love the simplicity and unified API of AWS' services. If you've used one of them, you think you know them all. For example, here's how to get a record from S3 and how the sentiment detection works in Node.JS.
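A hypothetical sketch of both calls with the aws-sdk (bucket, key and language code are example values):

// Hypothetical example: read a stored article from S3 and run it through
// AWS Comprehend's sentiment detection (aws-sdk v2 style).
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const comprehend = new AWS.Comprehend();

const getRecordFromS3 = (bucket, key) =>
    s3.getObject({ Bucket: bucket, Key: key })
        .promise()
        .then((data) => JSON.parse(data.Body.toString('utf-8')));

const analyseSentiment = (text) =>
    comprehend.detectSentiment({ Text: text, LanguageCode: 'en' })
        .promise()
        .then((result) => result.Sentiment); // POSITIVE, NEGATIVE, NEUTRAL or MIXED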

Probably one of the most interesting tasks of the Data Enrichment Component was the creation of the “context” of the word vegan in the article.
Just as a reminder, we need this context since a lot of articles only mention the word “Vegan” without having “Veganism” as a topic. So, how do we extract parts from a text?
I went for Regular Expressions. They are incredibly nice to use, and you can use playgrounds like Regex101 to play around and find the right regex for your use case.

The challenge was to come up with a regex that could find sentences that contained the word “vegan”.
Somehow it turned out to be harder than expected to make it generalise for whole text passages that also had line breaks, etc. in it.
The final regex looks like this.
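The final expression isn't shown here; an illustrative regex in the same spirit (not the original pattern) might be:

// Hypothetical example: match whole sentences that contain a word starting
// with "vegan" (case-insensitive, global).
const veganSentence = /[^.!?]*\bvegan\w*[^.!?]*[.!?]/gi;

'I tried a vegan burger. It was fine.'.match(veganSentence);
// -> ['I tried a vegan burger.']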

The problem was that for long texts, this was not working due to timeout problems. The solution in this case was pretty “straightforward”… I simply crawled the text and split it by line breaks which made it way easier to process for the RegEx module.

In the end, the whole context "creation" was a mixture of splitting the text, filtering for passages that contained the word vegan, extracting the matching sentence from each passage and joining everything back together so that it could be used in the sentiment analysis.
The title and description might also play a role, so I added those to the context as well if they contained the word "vegan".

Once all the code for the different steps was in place, I thought I could start building the frontend. But something wasn't right: some of the records just did not appear in my DynamoDB table…

 

Empty Strings in DynamoDB are also evil…

When checking back with the status of my already running system, I realised that some of the articles would not be converted to a DynamoDB table entry at all.
After checking out the logs, I found this Exception which absolutely confused me…

To be honest, this is really weird behaviour since, as stated in the discussion, the semantics and usage of an empty String are absolutely different from those of a Null value…

However, since I can’t change anything about the design of the DynamoDB, I had to find a solution to avoid getting the empty String error.
In my case, it was really easy. I just iterate over the whole JSON object and check whether there is an empty String, and if there is, I replace the value with null. That's it; it works like a charm and does not cause any problems.
(I did need to check whether the value exists in the frontend, though, since getting the length of a null value throws an error.)

“Dirty” Fix for the empty String problem
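A small, hypothetical sketch of that workaround:

// Hypothetical example: walk the record and replace every empty string
// with null before writing it to DynamoDB.
const replaceEmptyStrings = (value) => {
    if (value === '') return null;
    if (value && typeof value === 'object') {
        Object.keys(value).forEach((key) => {
            value[key] = replaceEmptyStrings(value[key]); // recurse into nested objects
        });
    }
    return value;
};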

 

Frontend

Frontend Part
The last part was to actually create a frontend and deploy it so people could visit the page and see the StateOfVeganism.

Of course, I was thinking about whether I should use one of those fancy frontend frameworks like Angular, React or Vue.js… But well, I went for absolutely old-school plain HTML, CSS and JS.

The idea I had for the frontend was extremely minimalistic.
Basically, it was just a bar that is divided into three sections:
Positive, Neutral and Negative.
When clicking on any one of those, it displays some titles and links to articles that were classified with this sentiment.
In the end, that was exactly what it turned out to be. You can check out the page here.
I thought about making it live at stateOfVeganism.com, but let’s see…

GUI of StateOfVeganism

Make sure to note the funny third article of the articles that have been classified as “Negative” 😉

Deploying the frontend on one of AWS’ services was something else I had to think about.
I definitely wanted to use a service that already incorporated elastic scaling, so I had to decide between the Elastic Container Service and Elastic Beanstalk (actual EC2 instances).

In the end, I went for Beanstalk since I really liked the straightforward approach and the incredibly easy deployment. You can basically compare it to Heroku in the way you set it up.
(I had some problems with my auto scaling group not being allowed to deploy EC2 instances because I use the free tier on AWS, but after a few mails with the AWS support, everything worked right out of the box).

I just deployed a Node.js Express Server Application that serves my frontend on each path.

This setup, by default, serves the index.html which resides in the "public" folder, which is exactly what I want.
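A minimal sketch of such a server (port and folder layout are assumptions):

// Hypothetical example: an Express server that serves the static frontend
// from "public" and falls back to index.html for every other path.
const path = require('path');
const express = require('express');

const app = express();
app.use(express.static('public'));   // serves public/index.html at "/"

app.get('*', (req, res) => {
    res.sendFile(path.join(__dirname, 'public', 'index.html'));
});

app.listen(process.env.PORT || 3000); // Beanstalk provides PORT via the environment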

Of course this is the most basic setup, and for most applications, it’s not the recommended way since you somehow have to provide the credentials in order to access the DynamoDB table. It would be better to do some server-side rendering and store the credentials in environment variables so that nobody can access them.

 

Playing it cool and deploying the AWS keys in the front end…

This is something you should never do. However, since I restricted the access of those credentials to only the scan method of the DynamoDB table, you get the chance to dig deeper into my data if you're interested.

I also restricted the number of requests that can be done so that the credentials will “stop working” once the free monthly limit has been surpassed, just to make sure.

But feel free to look at the data and play around a little bit if you’re interested. Just make sure to not overdo it since the API will stop providing the data to the frontend at one point.


Policies, Policies?… Policies!

When I started working with cloud technologies, I realised that there has to be a way to allow/restrict access to the single components and create relations. This is where policies come into place. They also help you to do access management by giving you the tools you need to give specific users and groups permissions. At one point, you’ll probably struggle with this topic so it makes sense to read up on it a little bit.

There are basically two types of policies in AWS. Both are simple JSON style configuration files.
However, one of them is assigned to the resource itself, e.g. S3, and the other one gets assigned to roles, users or groups.

The table below shows some very rough statements about which policy you might want to choose for your task.

So, what is the actual difference?

This might become clearer when we compare an example of both policy types with each other.

IAM-Policy and Resource Policy

The policy on the left is the IAM-Policy (or Identity-Based). The right one is the Resource-(Based)-Policy.
If we start to compare them line by line we can’t see any difference until we reach the first statement which defines some rules related to some service. In this case, it’s S3.

In the Resource-Policy, we see an attribute that is called “Principal” which is missing in the IAM-Policy.
In the context of a Resource-Policy, this describes the entities that are “assigned” to this rule. In the example given above, this would be the users, Alice and root.
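For illustration, a trimmed, hypothetical Resource-Policy with such a "Principal" attribute (account ID and bucket name are placeholders) might look like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::123456789012:user/Alice",
                    "arn:aws:iam::123456789012:root"
                ]
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-article-bucket/*"
        }
    ]
}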

On the other hand, to achieve the exact same result with IAM-Policies, we would have to assign the policy on the left, to our existing users, Alice and root.

Depending on your use case, it might make sense to use one or the other. It's also a question of what your "style" is, or what the conventions at your workplace are.


What’s next?

StateOfVeganism is live already.
However, this does not mean that there is nothing left to improve. One thing I definitely have to work on is, for example, making sure that recipes from Pinterest are not classified as "Positive" but rather as "Neutral".
The basic functionality is working as expected. The data pipeline works nicely, and if anything should go wrong, I will have nice logging with CloudWatch already enabled.

It’s been great to really think through and build such a system. Especially questioning my decisions was very helpful in optimising the whole architecture.

The next time you’re thinking about building a side project, think about building it with one of the cloud providers. It might be a bigger time investment in the beginning, but learning how to use and build systems with an infrastructure like AWS really helps you to grow as a developer.

Thank you for reading.
Be sure to follow me on YouTube and to star StateOfVeganism on GitHub.

Don't forget to follow me on Twitter, GitHub and YouTube.

AIRA Voice Assistant – A proof of Concept in virtual reality

Motivation

As part of the lecture "Software Development for Cloud Computing", we were looking for a way for a user to get basic assistance within our existing virtual reality game AIRA. The primary objective was a maximum of user-friendliness while avoiding any interruption of the immersive gaming experience. It is also important to keep in mind that the user is on their own, and any kind of support from outside is usually not possible.

Moreover, within virtual reality applications there are generally no conventional input devices available, so a keyboard is not an option. Following up on this idea, many people may next think of an on-screen keyboard, as they know it from their smart TV at home, operated by a game controller. However, such an approach would be contrary to a high ease of use, and the majority of implementations are quite crippled as well as hard to use.

So, what would be obvious and take all previous considerations into account? Simply think of something that each of us carries along at all times – our own unique voice. Based on this, we decided to implement a personal voice assistant into our game. In the following, it will become clear that the individuality of each human voice leads to a lot of difficulties we have to take care of.

In the following, it will be explained in detail how we implemented a personal voice assistant using multiple Watson services which are part of the IBM Bluemix cloud platform. In particular, fundamental problems we ran into will be discussed, and possible approaches will be pointed out.

Continue reading

VVS-Delay – AI in the Cloud

Introduction

Howdy, Geeks! Ever frustrated by public transportation around Stuttgart?
Managed to get up early just to find out your train to university or work is delayed… again?
Yeah, we all know that! We wondered if we could get around this issue by connecting our alarm clock to some algorithms. So we would never ever have to get up too early again.

Well, okay, we're not quite there yet. But we started by getting some data and made some hardly trustworthy prediction hypotheses on it. In the end, it's up to you whether you believe them or not.

To give you a short overview, here are the components that are involved in the process. You will find the components described in more detail below.
Process overview

A few parts in short:
1. crawler and database – get and store departure information
2. visualization – visualizes the delays on a map
3. statistical analysis – some statistical analysis on the delays over a week
4. continuous delivery – keep the production system up to date with the code

Continue reading

Sport data stream processing on IBM Bluemix: Real Time Stream Processing Basics

New data is created every second. On Google alone, humans perform 40,000 search queries every second. Forbes estimates that by 2020, 1.7 megabytes of new information will be created every second for every human on our planet.
It is all about collecting and exchanging data, which can then be used in many different ways. Equipment fault monitoring, predictive maintenance and real-time diagnostics are only a few of the possible scenarios. Dealing with all this information creates certain challenges, and stream processing of huge amounts of data is among them.

With the improvement of technology and the development of large-scale systems like IBM Bluemix, it is now not only possible to process business or IoT data, it is also interesting to analyze complex and large data sets like sports studies. That's the main idea of my application – collecting data from a 24-hour swimming event and using metrics processed in real time to control the event and the flow of athletes.

This article explains how to integrate and use the IBM tools for stream processing. We explore IBM Message Hub (for collecting streams), the IBM Streaming Analytics service (for processing events) and the IBM Node.JS service (for visualizing data).
Continue reading

Developing a Chat Server and Client in the Cloud

Introduction

During the Lecture “Software Development for Cloud Computing” I decided to develop a Cloud based Chat Application with the help of IBM’s Bluemix.
The Application consists of 3 separate Applications:

  • Chat Server: Allows Clients to connect to it, manages the Chat-Channels/Users and relays messages sent from a client to the other clients in the same channel.
  • Chat Client: The Client consists of a GUI where the User can connect to the Server and chat with other Users.
  • Chat Backend Database: A simple Database which records and provides the chat history of a given Chat-Channel via REST.

Continue reading

IoT with the Raspberry Pi – Final application – Part 3

In our final application, we have put together a solution consisting of four different modules. First, we have again the Raspberry Pi, which collects and sends the sensor data using the already presented Python script. We changed the transfer protocol in the final application to MQTT, which gives us more possibilities in different aspects, but more on that later.
Continue reading

IoT with the Raspberry Pi – Node RED – Part 2

As already stated in the introduction to our project, we decided to create a Cloud Foundry application in IBM Bluemix. We used the boilerplate called "Internet of Things Platform Starter". Using this boilerplate, Node-RED is deployed initially.

Node-RED is a software tool for graphical dataflow programming. It was developed by IBM and has been open source since 2016. Providing a browser-based flow editor, it enables you to wire together hardware devices, APIs and online services.
Continue reading