Author: bh051, cz022, ds168

Student Projects, System Designs, System Engineering
Of Apache Spark, Hadoop, Vagrant, VirtualBox and IBM Bluemix Services – Part 6 – Apache Spark and/vs Apache Hadoop?
bh051, cz022, ds168
9. March 2017
At the beginning of this article series we introduced the core concepts of Hadoop and Spark in a nutshell. Both Apache Spark and Apache Hadoop are frameworks for the efficient processing of large data sets on computer clusters. This raises the question of how they differ from or relate to each other. Opinions on this seem to be divided. In…
Student Projects, System Designs, System Engineering
Of Apache Spark, Hadoop, Vagrant, VirtualBox and IBM Bluemix Services – Part 5 – Spark applications in PIA project
bh051, cz022, ds168
9. March 2017
The main reason for choosing Spark was a second project, which we developed for the course “Programming Intelligent Applications”. For this project we wanted to implement a framework that can monitor important events (e.g. terror attacks, natural disasters) around the world through Twitter. To separate important tweets from others we use Latent Dirichlet Allocation…
Student Projects, System Designs, System Engineering
Of Apache Spark, Hadoop, Vagrant, VirtualBox and IBM Bluemix Services – Part 4 – Big Data Engineering
bh051, cz022, ds168
9. March 2017
Our objective in this project was to build an environment that would be usable in practice. We therefore set up a virtual Hadoop test cluster with virtual machines. Our production environment was a Hadoop cluster in the IBM Bluemix cloud, which we could use for free with our student accounts. We developed and tested the logic of…
Student Projects, System Designs, System Engineering
Of Apache Spark, Hadoop, Vagrant, VirtualBox and IBM Bluemix Services – Part 3 – What is Apache Spark?
bh051, cz022, ds168
8. March 2017
Apache Spark is a framework for fast processing of large data sets on computer clusters. Spark applications can be written in Scala, Java, Python or R and can be executed in the cloud or under the Hadoop YARN or Mesos cluster managers. It is also possible to run Spark applications in local mode, that is, on a single computer…
Student Projects, System Designs, System Engineering
Of Apache Spark, Hadoop, Vagrant, VirtualBox and IBM Bluemix Services – Part 2 – Apache Hadoop Ecosystem
bh051, cz022, ds168
8. March 2017
In our project we primarily implemented Spark applications, but we also used components of Apache Hadoop such as the Hadoop Distributed File System (HDFS) and the cluster manager Hadoop YARN. For the discussion in the last part of this article series, it is also necessary to understand Hadoop MapReduce for comparison with Apache Spark. Because of this we…
Student Projects, System Designs, System Engineering
Of Apache Spark, Hadoop, Vagrant, VirtualBox and IBM Bluemix Services – Part 1 – Introduction
bh051, cz022, ds168
8. March 2017
As part of the lecture “System Engineering and Management” in the winter semester 2016/17, we ran a project with Apache Spark and the Apache Hadoop ecosystem.