{"id":3114,"date":"2017-09-01T16:51:56","date_gmt":"2017-09-01T14:51:56","guid":{"rendered":"https:\/\/blog.mi.hdm-stuttgart.de\/?p=3114"},"modified":"2023-06-08T17:36:07","modified_gmt":"2023-06-08T15:36:07","slug":"sport-data-stream-processing-on-ibm-bluemix-real-time-stream-processing-basics","status":"publish","type":"post","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/09\/01\/sport-data-stream-processing-on-ibm-bluemix-real-time-stream-processing-basics\/","title":{"rendered":"Sport data stream processing on IBM Bluemix:  Real Time Stream Processing Basics"},"content":{"rendered":"<p>New data is created every second. Just on Google the humans preform 40,000 search queries every second. By 2020 Forbes estimate 1.7 megabytes of new information will be created every second for every human on our planet.<br \/>\nHowever, it is about collecting and exchanging data, which then can be used in many different ways. Equipment fault monitoring, predictive maintenance, or real-time diagnostics are only a few of the possible scenarios. Dealing with all this information, creates certain challenges for stream processing of huge amounts of data is among them.<\/p>\n<p>Improvement of technology and development of big scaling systems like IBM Bluemix it is now not only possible process business or IoT data, it is also interesting to analyze complex and large data like sport studies. That\u2019s the main idea of my application &#8211; collect data from a 24-hour swimming event to use real time processed metrics to control event and athletes flow.<\/p>\n<p>In this article explains how to integrate and use the IBM tools for stream processing. We explore IBM Message Hub (for collecting streams), the IBM Streaming Analytics service (for processing events) and IBM Node.JS Service (for visualization data).<br \/>\n<!--more--><\/p>\n<h2>Scenario<\/h2>\n<p>In the swim sport, there is a competition called \u201c24-hour swimming\u201d. 
The goal is to swim the largest distance within 24 hours. Don\u2019t worry! It is allowed to leave the pool whenever you like and to take as many breaks as you want. In an earlier project, we developed a server\/app combination to count the laps of each swimmer electronically. You no longer need people sitting around the pool, counting by hand with pencil and paper. But there is still a problem: each swimmer can choose the lane he wants to swim on by himself, and most people consider themselves to be faster than they really are. So why not process the data each tap of the counting app produces, to calculate average swimming times for each lane?<\/p>\n<p>Below is the scheme of the stream processing flow that we will implement in this post.<\/p>\n<p><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_1.png\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"3117\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/09\/01\/sport-data-stream-processing-on-ibm-bluemix-real-time-stream-processing-basics\/real-time-stream-processing-basics_1\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_1.png\" data-orig-size=\"686,386\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Real Time Stream Processing Basics_1\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_1.png\" 
class=\"alignnone wp-image-3117 size-full\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_1.png\" alt=\"\" width=\"686\" height=\"386\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_1.png 686w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_1-300x169.png 300w\" sizes=\"auto, (max-width: 686px) 100vw, 686px\" \/><\/a><\/p>\n<p>Event Producer or in production the app sends messages, which then go to Message Hub. IBM Streaming Analytics Service receive them up from Message Hub, process and send calculated metrics to the Node.JS App which visualize the data. In further development Streaming Analytics store the data into an Cloudant storage on which the node app can make some lookups.<\/p>\n<h2>IBM Message Hub<\/h2>\n<p>The IBM Message Hub is a fully managed, cloud-based messaging service. It is built on the open source Big Data tool Apache Kafka and is available through IBM Bluemix\u00ae Platform. Each message which is send to the Kafka Cluster has got a topic. This topic allows us to send various messages within our Kafka environment and make some addressing. This is especially useful to set up a micro service environment. In our case we just need one topic because all our data, that the apps are sending, should be processed.<\/p>\n<p>The reason why we need to use a service like the Message Hub is caused by the policy rules for IBM Bluemix services. The Streaming Analytics service provides a function to receive directly messages but these messages have to be sent within the IBM Bluemix system. That\u2019s the reason why we take this detour through Message Hub.<\/p>\n<p>Because Message Hub is based on Apache Kafka all client libraries for communication are full compatible, e.g. 
for Android apps there is a <a href=\"https:\/\/cwiki.apache.org\/confluence\/display\/KAFKA\/Clients#Clients-AlternativeJava\">Java library<\/a> and there are some for <a href=\"https:\/\/cwiki.apache.org\/confluence\/display\/KAFKA\/Clients#Clients-Node.js\">Node.js<\/a>.<br \/>\nTo set up and configure IBM Message Hub, check out <a href=\"https:\/\/console.bluemix.net\/docs\/services\/MessageHub\/index.html\">this<\/a> sample.<\/p>\n<h2>IBM Streaming Analytics<\/h2>\n<p>Streaming Analytics is a fully managed service which allows us to build streaming applications with ease. The developer does not have to worry about managing and configuring the infrastructure &#8211; unlike with the Apache Spark service that is also offered on IBM Bluemix, which has to be configured first even though it comes out of the box. That is a big advantage of Streaming Analytics: developers can focus on building business logic and analytics. The service supports real-time analytics with extremely low latency and high performance. 
Whether your application supports a single device and data source or connects and monitors hundreds of thousands of devices, Streaming Analytics performs seamlessly and reliably.<\/p>\n<p>Using the Streaming Analytics service is really simple, either interactively through the Streaming Analytics Console<a href=\"#_ftn1\" name=\"_ftnref1\">[1]<\/a> or programmatically through the Streaming Analytics REST API.<\/p>\n<p><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_2.png\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"3118\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/09\/01\/sport-data-stream-processing-on-ibm-bluemix-real-time-stream-processing-basics\/real-time-stream-processing-basics_2\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_2.png\" data-orig-size=\"943,520\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Real Time Stream Processing Basics_2\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_2.png\" class=\"alignnone wp-image-3118 size-full\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_2.png\" alt=\"\" width=\"943\" height=\"520\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_2.png 943w, 
https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_2-300x165.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_2-768x423.png 768w\" sizes=\"auto, (max-width: 943px) 100vw, 943px\" \/><\/a><\/p>\n<p>Through this service you can deploy a Streaming Analytics application, i.e. an instance of that application running in the IBM Bluemix cloud, scale the instances, check errors and get a visualization of the data flow graph.<\/p>\n<h2>IBM Streaming Analytics Application<\/h2>\n<p>When submitting a job to the IBM Streaming Analytics service, you are prompted to identify a Streams Application Bundle (.sab) to upload and submit. The Streaming Analytics service in Bluemix therefore requires you to develop your Streams application in a separate Streams environment, outside of Bluemix.<\/p>\n<p>Besides its use within Bluemix, IBM Streams is also a stand-alone product that you can run on your own hardware to set up a Streams environment. If you don\u2019t already have a Streams environment where you can develop and test applications, you can develop locally using the Quick Start Edition. This virtual machine provides a preconfigured Streams environment with development tools.<\/p>\n<p>Developing these types of applications is easy and can be done in multiple ways. Streaming Analytics supports a Java Application API, which means any Java developer can crank up an application with ease.&nbsp;The same goes for Python developers.<\/p>\n<p>But the easiest way is to develop applications with the IBM\u00ae Streams Processing Language (SPL). 
It is a language with Java-like syntax for describing data streams, with a lot of built-in operators.<\/p>\n<p><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_3.png\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"3121\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/09\/01\/sport-data-stream-processing-on-ibm-bluemix-real-time-stream-processing-basics\/real-time-stream-processing-basics_3\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_3.png\" data-orig-size=\"625,344\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Real Time Stream Processing Basics_3\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_3.png\" class=\"alignnone size-full wp-image-3121\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_3.png\" alt=\"\" width=\"625\" height=\"344\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_3.png 625w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_3-300x165.png 300w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>To write a Streams application, you first need to understand the basic building blocks. A single block is an operator. 
An operator consists of input ports and output ports. An input port consumes a stream of continuous records; an operator can have one or more input ports. Through its output ports the operator emits the processed records as a new stream, and again an operator can have one or more output ports.<\/p>\n<p>A streaming application consists of a flow graph of operators. Each block in that graph performs one small task on the records, such as preparing, filtering or aggregating them. A record is called a tuple. Each stream has a defined data structure, i.e. a defined tuple schema, and a stream consists of only one type of tuple.<\/p>\n<p>InfoSphere Streams Studio supports building your SPL applications either interactively or programmatically. Below we see our application for processing the swimming data.<\/p>\n<p><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4.png\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"3122\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/09\/01\/sport-data-stream-processing-on-ibm-bluemix-real-time-stream-processing-basics\/real-time-stream-processing-basics_4\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4.png\" data-orig-size=\"945,416\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Real Time Stream Processing Basics_4\" data-image-description=\"\" data-image-caption=\"\" 
data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4.png\" class=\"alignnone size-full wp-image-3122\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4.png\" alt=\"\" width=\"945\" height=\"416\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4.png 945w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4-300x132.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4-768x338.png 768w\" sizes=\"auto, (max-width: 945px) 100vw, 945px\" \/><\/a>First we read our data with a Kafka consumer, i.e. from Message Hub. After that, the tuples are converted to the right format: one tuple consists of an ID, the name of the swimmer, the lane he is swimming on, the current timestamp and the time elapsed since the last count. 
In code it looks like this:<\/p>\n<p><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4-1.png\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"3123\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/09\/01\/sport-data-stream-processing-on-ibm-bluemix-real-time-stream-processing-basics\/real-time-stream-processing-basics_4-2\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4-1.png\" data-orig-size=\"945,416\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Real Time Stream Processing Basics_4\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4-1.png\" class=\"alignnone size-full wp-image-3123\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4-1.png\" alt=\"\" width=\"945\" height=\"416\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4-1.png 945w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4-1-300x132.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_4-1-768x338.png 768w\" sizes=\"auto, (max-width: 945px) 100vw, 945px\" \/><\/a><\/p>\n<p>At the top we see the schema of our tuples. 
The composite is a wrapper for one flow graph, which is declared here with the graph clause. Beginning with \u201cstream\u201d, we see our first block reading from Kafka\/Message Hub. The second block converts the messages from Kafka to our schema.<\/p>\n<p>After this preparation comes some filtering to remove implausible tuples: if the time between two tuples of a swimmer is under 21 seconds &#8211; roughly the world record for 50 m swimming &#8211; the second tuple is removed from the stream, because something can\u2019t be right. The same procedure applies to duplicates within the stream.<\/p>\n<p>After that our stream splits up into one block counting the total amount of meters, one calculating the overall time average per lane and, behind another filtering step, one calculating the averages for each individual lane. At the end the calculated metrics are converted back to a JSON string and sent via HTTP to our Node.js application.<\/p>\n<p>While writing streaming applications it is highly recommended to follow design patterns for processing the data, to keep your application as performant as possible. 
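<p>To make these processing steps concrete, here is a minimal sketch of the same logic in plain JavaScript. The real application implements this with SPL operators; the field names (id, swimmer, lane, timestamp, lapTime) and the lane length are assumptions based on the tuple schema described in this post:<\/p>

```javascript
// Sketch of the filtering and aggregation steps described above,
// in plain JavaScript (the actual application uses SPL operators).
// Field names and LANE_LENGTH_METERS are hypothetical assumptions.

const MIN_LAP_SECONDS = 21;    // ~50 m world record, used as plausibility bound
const LANE_LENGTH_METERS = 50; // assumed pool length

// Remove implausible tuples (faster than the world record) and
// duplicates (same swimmer ID, same timestamp).
function filterTuples(tuples) {
  const seen = new Set();
  return tuples.filter((t) => {
    const key = `${t.id}:${t.timestamp}`;
    if (seen.has(key)) return false;       // duplicate within the stream
    seen.add(key);
    return t.lapTime >= MIN_LAP_SECONDS;   // otherwise something can't be right
  });
}

// Aggregate the average lap time per lane and the total meters swum.
function laneMetrics(tuples) {
  const perLane = new Map();
  for (const t of tuples) {
    const s = perLane.get(t.lane) || { laps: 0, totalTime: 0 };
    s.laps += 1;
    s.totalTime += t.lapTime;
    perLane.set(t.lane, s);
  }
  const metrics = {};
  let totalMeters = 0;
  for (const [lane, s] of perLane) {
    metrics[lane] = {
      avgLapSeconds: s.totalTime / s.laps,
      meters: s.laps * LANE_LENGTH_METERS,
    };
    totalMeters += s.laps * LANE_LENGTH_METERS;
  }
  return { perLane: metrics, totalMeters };
}
```

<p>The 21-second bound mirrors the world-record plausibility check described above; a production version would of course work on a continuous stream rather than on an array.<\/p>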
The following diagram shows a good approach for developing such stream applications.<\/p>\n<p><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_6.png\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"3124\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/09\/01\/sport-data-stream-processing-on-ibm-bluemix-real-time-stream-processing-basics\/real-time-stream-processing-basics_6\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_6.png\" data-orig-size=\"526,284\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Real Time Stream Processing Basics_6\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_6.png\" class=\"alignnone size-full wp-image-3124\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_6.png\" alt=\"\" width=\"526\" height=\"284\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_6.png 526w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/Real-Time-Stream-Processing-Basics_6-300x162.png 300w\" sizes=\"auto, (max-width: 526px) 100vw, 526px\" \/><\/a><\/p>\n<h2>Node.JS SDK for Bluemix<\/h2>\n<p>The Node.js app is used to visualize the data we calculated, based on a simple Node.js environment for Bluemix. 
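<p>On the visualization side, the Node.js app essentially only needs to keep a bounded history of the incoming metrics and hand it to the chart library. A minimal sketch of such a sliding-window series in plain JavaScript &#8211; the function and parameter names here are hypothetical and not taken from the actual app:<\/p>

```javascript
// Sketch of a bounded time series buffer for a line chart, so the
// visualization always shows a sliding window of the most recent
// metric values. Names (createSeries, maxPoints) are hypothetical.
function createSeries(maxPoints) {
  const labels = [];
  const data = [];
  return {
    labels,
    data,
    // Append one metric sample; drop the oldest sample once
    // maxPoints is exceeded, keeping labels and data in sync.
    push(label, value) {
      labels.push(label);
      data.push(value);
      if (labels.length > maxPoints) {
        labels.shift();
        data.shift();
      }
    },
  };
}
```

<p>Each HTTP delivery from the analytics application would push one sample per lane, and the chart is then redrawn from the current window.<\/p>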
The app also controls our Streaming Analytics application through the REST interface of Streaming Analytics for Bluemix, so we do not have to upload the Streams application manually. The Streaming Analytics service is bound to the Node.js app, so we can share credentials and also send the data from the analytics application via HTTP. For easy chart creation I used Chart.js.<\/p>\n<h2>Final Thought<\/h2>\n<p>Streaming analytics is a fascinating part of what systems like the Microsoft Azure cloud or IBM Bluemix offer. The fascination comes not only from the idea of mastering such masses of data; it is even more the ability to process them in real time.<\/p>\n<p>Creating this little project showed me that the technology is available to everyone, but only a few out there are using it. There are so many use cases, like real-time suggestions while shopping online, better insights into the exchange markets or faster processing of medical tests like a DNA analysis. But I also realized that it is not that easy to find metrics which are of interest, and in my opinion that is what distinguishes an average Big Data analyst from a good one. 
A good analyst knows which metrics are interesting for the client, which could become interesting, and which new key metrics he can derive.<\/p>\n<p>Altogether it was interesting to get in touch with this technology, and I hope I could share some useful information for your own start with Streaming Analytics.<\/p>\n<p>Here are some links which helped me get started with SPL (<a href=\"https:\/\/www.ibm.com\/support\/knowledgecenter\/SSCRJU_4.2.0\/com.ibm.streams.dev.doc\/doc\/dev-container.html)\">https:\/\/www.ibm.com\/support\/knowledgecenter\/SSCRJU_4.2.0\/com.ibm.streams.dev.doc\/doc\/dev-container.html)<\/a> and with creating applications (<a href=\"https:\/\/developer.ibm.com\/streamsdev\/docs\/bluemix-streaming-analytics-starter-application\/\">https:\/\/developer.ibm.com\/streamsdev\/docs\/bluemix-streaming-analytics-starter-application\/<\/a>).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>New data is created every second. On Google alone, people perform about 40,000 search queries every second. Forbes estimates that by 2020, 1.7 megabytes of new information will be created every second for every human on our planet. It is all about collecting and exchanging data, which can then be used in many different ways. 
Equipment [&hellip;]<\/p>\n","protected":false},"author":487,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1,120,650,22],"tags":[],"ppma_author":[736],"class_list":["post-3114","post","type-post","status-publish","format-standard","hentry","category-allgemein","category-cloud-technologies","category-scalable-systems","category-student-projects"],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":3029,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/08\/31\/wettersave-realizing-weather-forecasts-with-machine-learning\/","url_meta":{"origin":3114,"position":0},"title":"Wettersave \u2013 Realizing Weather Forecasts with Machine Learning","author":"sh245@hdm-stuttgart.de","date":"31. August 2017","format":false,"excerpt":"\u00a0\u00a0\u00a0\u00a0Introduction Since the internet boom a few years ago companies started to collect and save data in an almost aggressive way. But the huge amounts of data are actually useless if they are not used to gain new information with a higher value. I was always impressed by the way\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/08\/filterUndTyp-300x135.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":2153,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/03\/08\/of-apache-spark-hadoop-vagrant-virtualbox-and-ibm-bluemix-services-part-3-what-is-apache-spark\/","url_meta":{"origin":3114,"position":1},"title":"Of Apache Spark, Hadoop, Vagrant, VirtualBox and IBM Bluemix Services &#8211; Part 3 &#8211; What is Apache Spark?","author":"bh051, cz022, ds168","date":"8. 
March 2017","format":false,"excerpt":"Apache Spark is a framework for fast processing of large data on computer clusters. Spark applications can be written in Scala, Java, Python or R and can be executed in the cloud or on Hadoop (YARN) or Mesos cluster managers. It is also possible to run Spark applications standalone, that\u2026","rel":"","context":"In &quot;Student Projects&quot;","block_context":{"text":"Student Projects","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/student-projects\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/03\/spark-overview-768x195.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/03\/spark-overview-768x195.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/03\/spark-overview-768x195.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":2360,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/06\/11\/analyzing-text-with-ibm-watson-services-on-bluemix\/","url_meta":{"origin":3114,"position":2},"title":"Analyzing text with IBM Watson services on Bluemix","author":"Patrick Kleindienst","date":"11. June 2017","format":false,"excerpt":"You might have already heard of IBM's artificial intelligence \"Watson\", which beat two former champions of the american television game show \"Jeopardy!\" back in 2011. What you probably don't know is that today lots of predefined Watson services are publicy available on IBM's cloud platform \"Bluemix\". 
These services cover different\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/06\/postman-watson-result.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/06\/postman-watson-result.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/06\/postman-watson-result.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/06\/postman-watson-result.png?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":2165,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/03\/09\/of-apache-spark-hadoop-vagrant-virtualbox-and-ibm-bluemix-services-part-6-apache-spark-andvs-apache-hadoop\/","url_meta":{"origin":3114,"position":3},"title":"Of Apache Spark, Hadoop, Vagrant, VirtualBox and IBM Bluemix Services &#8211; Part 6 &#8211; Apache Spark and\/vs Apache Hadoop?","author":"bh051, cz022, ds168","date":"9. March 2017","format":false,"excerpt":"At the beginning of this article series we introduced the core concepts of Hadoop and Spark in a nutshell. Both, Apache Spark and Apache Hadoop are frameworks for efficient processing of large data on computer clusters. The question arises how they differ or relate to each other. 
Hereof it seems\u2026","rel":"","context":"In &quot;Student Projects&quot;","block_context":{"text":"Student Projects","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/student-projects\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2157,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/03\/09\/of-apache-spark-hadoop-vagrant-virtualbox-and-ibm-bluemix-services-part-4-big-data-engineering\/","url_meta":{"origin":3114,"position":4},"title":"Of Apache Spark, Hadoop, Vagrant, VirtualBox and IBM Bluemix Services &#8211; Part 4 &#8211; Big Data Engineering","author":"bh051, cz022, ds168","date":"9. March 2017","format":false,"excerpt":"Our objective in this project was to build an environment that could be practical. So we set up a virtual Hadoop test cluster with virtual machines. Our production environment was a Hadoop Cluster in the IBM Bluemix cloud which we could use for free with our student accounts. We developed\u2026","rel":"","context":"In &quot;Student Projects&quot;","block_context":{"text":"Student Projects","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/student-projects\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/03\/dev-env-spark-768x512.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/03\/dev-env-spark-768x512.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/03\/dev-env-spark-768x512.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":10318,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2020\/04\/13\/open-source-batch-and-stream-processing-realtime-analysis-of-big-data\/","url_meta":{"origin":3114,"position":5},"title":"Open Source Batch and Stream Processing: Realtime Analysis of Big Data","author":"Marcel Stolin","date":"13. 
April 2020","format":false,"excerpt":"Abstract Since the beginning of Big Data, batch processing was the most popular choice for processing large amounts of generated data. These existing processing technologies are not suitable to process the large amount of data we face today. Research works developed a variety of technologies that focus on stream processing.\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/mapreduce.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/mapreduce.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/mapreduce.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]}],"jetpack_sharing_enabled":true,"authors":[{"term_id":736,"user_id":487,"is_guest":0,"slug":"nk065","display_name":"nk065@hdm-stuttgart.de","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/09a3ba891a81b89db95ead38fc98caf2efbf625fe43e5e8e0b532dbcedfe4dc2?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/3114","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/users\/487"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/comments?post=3114"}],"version-history":[{"count":6,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/3114\/revisions
"}],"predecessor-version":[{"id":24734,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/3114\/revisions\/24734"}],"wp:attachment":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/media?parent=3114"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/categories?post=3114"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/tags?post=3114"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/ppma_author?post=3114"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}