{"id":6036,"date":"2019-03-10T18:39:17","date_gmt":"2019-03-10T17:39:17","guid":{"rendered":"https:\/\/blog.mi.hdm-stuttgart.de\/?p=6036"},"modified":"2023-06-18T18:28:29","modified_gmt":"2023-06-18T16:28:29","slug":"writing-high-performance-code-on-modern-hardware","status":"publish","type":"post","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2019\/03\/10\/writing-high-performance-code-on-modern-hardware\/","title":{"rendered":"Writing High Performance Code on Modern Hardware"},"content":{"rendered":"\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2109\" height=\"888\" data-attachment-id=\"6038\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2019\/03\/10\/writing-high-performance-code-on-modern-hardware\/titelbild-4\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/TitelBild-1.png\" data-orig-size=\"2109,888\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"TitelBild\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/TitelBild-1-1024x431.png\" src=\"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/TitelBild-1.png?fit=656%2C276&amp;ssl=1\" alt=\"\" class=\"wp-image-6038\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/TitelBild-1.png 2109w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/TitelBild-1-300x126.png 300w, 
https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/TitelBild-1-768x323.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/TitelBild-1-1024x431.png 1024w\" sizes=\"auto, (max-width: 2109px) 100vw, 2109px\" \/><\/figure>\n\n\n\n<p style=\"text-align:left\">Today, modern hardware combined with optimized, high-performance code makes it feasible to process more than 500 million images per day on a single machine. Small improvements in the underlying implementation can have an extremely large impact on execution time and are therefore essential for handling the huge amounts of data in modern large-scale systems. Furthermore, by optimizing our implementations we can dramatically reduce infrastructure costs and stay competitive. To show how this is possible and which optimizations we can use in our everyday programming life, let us have a look at the following imaginary task:<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Task<\/h2>\n\n\n\n<p>Our example task is to calculate the average color of an image as fast as possible. The task is inspired by applications like Google Images, which hold many images and show the average color of an image as a placeholder before all of its data is available on the device. This hides latency and makes the application feel more fluid. At first glance there is nothing special about it, but if we think about how many images are uploaded to modern large-scale systems and how much computing power we would need to process them, things get pretty interesting. Calculating the average color of an image is a good example of a task where we can apply fundamental optimizations to our code. To compute the average color, we have to process every pixel of an image, which can be very expensive, especially if we have to deal with high-resolution images. 
Unfortunately for the developers who have to deal with these images, modern devices have high-resolution cameras and displays, which increases the computational cost. To get the correct average color, we would have to square the individual values and calculate the average based on the squared values. But in our case, it is enough to simply add up the values of every color channel without squaring them and divide each sum by the number of pixels. This gives us a slightly darker average color, which is no problem for our task, and the simplification also brings performance improvements, as we will find out in the following sections.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Choosing a Language <\/h2>\n\n\n\n<p style=\"text-align:left\">Today, many programming languages exist, each with its own advantages and disadvantages, and it has become a challenge to choose the right language for a given task. If we want to write high-performance code, things get even more complicated, because every language has its own optimization techniques and for each one we can find test cases where it beats the others. To test which language is the right one for our purpose, I implemented the basic calculation of the average color in the languages I commonly use: C++, Java and Python with NumPy. 
Although I already expected good old C++ to win this fight, I was surprised by the results:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1835\" height=\"445\" data-attachment-id=\"6039\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2019\/03\/10\/writing-high-performance-code-on-modern-hardware\/languages\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Languages.png\" data-orig-size=\"1835,445\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Languages\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Languages-1024x248.png\" src=\"https:\/\/i2.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Languages.png?fit=656%2C159&amp;ssl=1\" alt=\"\" class=\"wp-image-6039\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Languages.png 1835w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Languages-300x73.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Languages-768x186.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Languages-1024x248.png 1024w\" sizes=\"auto, (max-width: 1835px) 100vw, 1835px\" \/><figcaption>Calculation time in % when calculating the average color value with different programming languages<\/figcaption><\/figure>\n\n\n\n<p>One thing should be noted: these results only give a rough overview of how much improvement 
we can get by choosing a different language for a specific task. These differences should not be read as a general comparison of the performance of these languages! The real improvement always depends on the specific task. In this case, the basic C++ implementation is already twice as fast as the Java solution and more than ten times faster than Python, without any optimizations!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Know your Data<\/h2>\n\n\n\n<p>For further improvements, we have to take a look at our data, because this lets us optimize our memory accesses. In most cases, 8 bits per color channel (red, green, blue) are used to store an image. Additionally, an alpha value is stored as a fourth channel to represent the transparency of the picture. So all in all, we have 4 channels per pixel, each containing a number between 0 and 255. Our pixels are stored similarly to the plain PPM file format, with the RGBA values of all pixels listed one after another. If we process one channel after another on this layout, performance suffers because of inefficient memory access. If we use libraries without knowing exactly how they store their data, we can easily end up with inefficient memory access without even noticing. An imaginary API could, for example, store our images differently, with the color channels in four separate arrays. This can be useful in some cases, but a function that accesses a single pixel on top of that layout has an inefficient memory access pattern. 
Due to a memory inefficient access, where we calculate one channel after another, the calculation time increases drastically: <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1845\" height=\"443\" data-attachment-id=\"6040\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2019\/03\/10\/writing-high-performance-code-on-modern-hardware\/badaccess\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/BadAccess.png\" data-orig-size=\"1845,443\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"BadAccess\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/BadAccess-1024x246.png\" src=\"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/BadAccess.png?fit=656%2C158&amp;ssl=1\" alt=\"\" class=\"wp-image-6040\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/BadAccess.png 1845w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/BadAccess-300x72.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/BadAccess-768x184.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/BadAccess-1024x246.png 1024w\" sizes=\"auto, (max-width: 1845px) 100vw, 1845px\" \/><figcaption> Calculation time in % with bad or efficient memory access<\/figcaption><\/figure>\n\n\n\n<p>\nWe, as programmers, are more familiar with Integer or Float data types, that are in most cases represented by 32 bits. 
We use them basically everywhere when we have enough available memory, even if we could use smaller data types. We do not care about the small decrease of our memory footprint, but the reduced memory consumption is not everything we can get from more suitable data types. Due to suitable data types, we can get additional performance improvements:\n\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1822\" height=\"451\" data-attachment-id=\"6041\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2019\/03\/10\/writing-high-performance-code-on-modern-hardware\/datatypes\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Datatypes.png\" data-orig-size=\"1822,451\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Datatypes\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Datatypes-1024x253.png\" src=\"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Datatypes.png?fit=656%2C162&amp;ssl=1\" alt=\"\" class=\"wp-image-6041\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Datatypes.png 1822w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Datatypes-300x74.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Datatypes-768x190.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Datatypes-1024x253.png 1024w\" sizes=\"auto, (max-width: 1822px) 100vw, 1822px\" 
\/><figcaption> Calculation time in % with different data types<\/figcaption><\/figure>\n\n\n\n<p>Now our compiler has additional information about the data and can apply more or even better optimizations. With this small change, we reduce the calculation time by <strong>more than 40%<\/strong>, simply by storing each value in an 8-bit char instead of a 32-bit integer!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Know your Hardware <\/h2>\n\n\n\n<p>\nIf we know our hardware, we can use further optimization techniques to get the best out of our system. Modern CPUs come with many cores and therefore offer large performance gains through parallelism. Additionally, they support a technique called vectorization, whereby the hardware performs multiple calculations in fewer steps. And if this is not enough, we can also utilize the raw computing power of modern GPUs.\n\n<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Vectorization<\/h3>\n\n\n\n<p>\nVectorization uses special hardware registers that make calculations faster. These registers are limited in size, often 128 or 256 bits, and we can fill them with our data. In our case, we add 4D vectors together. Normally, we need four additions, one for each element of the vector, but with vectorization this can be done in a single step. First, I implemented a basic SIMD (Single Instruction, Multiple Data) vector calculation that adds two 4D vectors, each stored in 128 bits, in a single step. But this simple approach increased rather than reduced the calculation time. How can this be? Our compiler does a great job of optimizing our code and already tries to use vectorization automatically! This is especially visible in the performance improvement we got by using 8 bits to store our data: the compiler could detect this and add more values together in a single step with automatic vectorization. 
It was not an easy task to implement a faster vectorized solution by hand, but we can still get some improvement by using AVX2 (Advanced Vector Extensions 2) instructions with 256-bit registers. We could store 32 8-bit values in such a register, but because we need more bits to hold our sums, this representation is not enough. The next larger data type has 16 bits, which lets us add 16 values of 16 bits each in a single step. With 16 bits per value we can sum up 256 unsquared 8-bit values without losing data, since 256 times 255 equals 65,280, which is below the 16-bit maximum of 65,535. With this knowledge we gain performance once again:\n\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1842\" height=\"435\" data-attachment-id=\"6042\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2019\/03\/10\/writing-high-performance-code-on-modern-hardware\/vectorization\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Vectorization.png\" data-orig-size=\"1842,435\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Vectorization\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Vectorization-1024x242.png\" src=\"https:\/\/i1.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Vectorization.png?fit=656%2C155&amp;ssl=1\" alt=\"\" class=\"wp-image-6042\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Vectorization.png 1842w, 
https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Vectorization-300x71.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Vectorization-768x181.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/Vectorization-1024x242.png 1024w\" sizes=\"auto, (max-width: 1842px) 100vw, 1842px\" \/><figcaption> Calculation time in % with vectorization techniques<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Multiprocessors<\/h3>\n\n\n\n<p>Modern CPUs are multiprocessors: they gain performance through parallelization rather than through further increases in clock rate, which have become nearly impossible. With multiple cores, we can distribute the work and fully utilize the CPU. For our task, we use six threads corresponding to six hardware cores, where every thread calculates the average color of an individual image. Because the threads do not access the same data, we are free of race conditions, which makes our life easier. With six hardware cores, we might expect to process six times as many images, but starting and waiting for threads also takes time, so we end up with an implementation that is <strong>4.5 times faster<\/strong> than the single-threaded version.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">GPUs<\/h3>\n\n\n\n<p>The next step to get more performance is to use GPUs. Thanks to their hardware architecture, GPUs are a great choice when it comes to raw calculation performance. To keep it simple, GPUs have far more cores than CPUs, but GPU cores are more lightweight than CPU cores, which means we do not have thousands of independent CPU-like cores running concurrently on a GPU. But if we are aware of the hardware architecture, the work can be executed nearly concurrently, and we can get huge performance improvements, especially for calculation-intensive tasks. 
Many programmers have never touched GPU programming, but today it is quite easy to get good performance, even without heavy optimization or deep hardware knowledge. For our task, even a very simple and unoptimized <strong>OpenCL<\/strong> solution beats our optimized multicore C++ implementation. We perform a simple parallel sum over our color vectors, starting as many GPU threads as we have pixels in our image. First, every GPU thread loads a single value into local memory; then we calculate the sum of 256 elements and store the average color of these elements on the GPU. We repeat these steps until we have the average color of the whole image, and that&#8217;s basically all we have to do on the GPU side to get a <strong>25% faster<\/strong> solution! Another advantage is that GPUs often scale better with larger data than CPUs. This is very helpful for our high-resolution images:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1522\" height=\"932\" data-attachment-id=\"6065\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2019\/03\/10\/writing-high-performance-code-on-modern-hardware\/gpuvscpu\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/GPUvsCPU.png\" data-orig-size=\"1522,932\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"GPUvsCPU\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/GPUvsCPU-1024x627.png\" 
src=\"https:\/\/i2.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/GPUvsCPU.png?fit=656%2C402&amp;ssl=1\" alt=\"\" class=\"wp-image-6065\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/GPUvsCPU.png 1522w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/GPUvsCPU-300x184.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/GPUvsCPU-768x470.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/GPUvsCPU-1024x627.png 1024w\" sizes=\"auto, (max-width: 1522px) 100vw, 1522px\" \/><figcaption>Relative execution time on different resolutions<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion <\/h2>\n\n\n\n<p>The choice of programming language and suitable data types can greatly improve performance without complicated optimizations. This is a simple way to write faster code, and we can integrate these easy changes into our everyday work. If we want to optimize further, we face an endless space of possible techniques and hardware-dependent optimizations, but even common techniques yield great performance improvements. We can use vectorization, multicore CPUs and GPU programming to get the best out of our systems. GPU programming in particular gives great results even without heavy optimization; it has also become easier in recent years and is easy to adopt in our systems. With these techniques, it was possible to <strong>reduce the calculation time<\/strong> of our example task <strong>to less than 1%<\/strong> of that of the simple Python implementation. 
It is always hard to talk about general optimization techniques, but I hope that the results of our imaginary task give some motivation and suggestions what can be achieved with modern hardware and optimized code: <\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1804\" height=\"1088\" data-attachment-id=\"6043\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2019\/03\/10\/writing-high-performance-code-on-modern-hardware\/fullcomparision\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/FullComparision.png\" data-orig-size=\"1804,1088\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"FullComparision\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/FullComparision-1024x618.png\" src=\"https:\/\/i1.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/FullComparision.png?fit=656%2C396&amp;ssl=1\" alt=\"\" class=\"wp-image-6043\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/FullComparision.png 1804w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/FullComparision-300x181.png 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/FullComparision-768x463.png 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/03\/FullComparision-1024x618.png 1024w\" sizes=\"auto, (max-width: 1804px) 100vw, 1804px\" \/><figcaption> Calculation time in % for all 
optimizations<\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Today, with the use of modern hardware combined with optimized high performant code, it is an easy task to process more than 500 million images per day on a single machine. Small improvements in the underlying implementations can have extreme large impacts on the execution time and are therefore fundamentally important to handle the huge [&hellip;]<\/p>\n","protected":false},"author":915,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1,651,2],"tags":[],"ppma_author":[782],"class_list":["post-6036","post","type-post","status-publish","format-standard","hentry","category-allgemein","category-system-designs","category-system-engineering"],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":1333,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2016\/08\/13\/mirageos\/","url_meta":{"origin":6036,"position":0},"title":"MirageOS","author":"Simon Lipke","date":"13. August 2016","format":false,"excerpt":"Introduction MirageOS is a new and rising trend when it comes to talking about cloud computing. More and more services are being relocated into modern cloud infrastructures, due to a lot of advantages like i.e. reduced costs, maximum flexibility and high performance. 
Todays services normally depend on big virtual machines\u2026","rel":"","context":"In &quot;Secure Systems&quot;","block_context":{"text":"Secure Systems","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/system-designs\/secure-systems\/"},"img":{"alt_text":"mirage-header4","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2016\/08\/mirage-header4.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2016\/08\/mirage-header4.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2016\/08\/mirage-header4.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2016\/08\/mirage-header4.jpg?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":10318,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2020\/04\/13\/open-source-batch-and-stream-processing-realtime-analysis-of-big-data\/","url_meta":{"origin":6036,"position":1},"title":"Open Source Batch and Stream Processing: Realtime Analysis of Big Data","author":"Marcel Stolin","date":"13. April 2020","format":false,"excerpt":"Abstract Since the beginning of Big Data, batch processing was the most popular choice for processing large amounts of generated data. These existing processing technologies are not suitable to process the large amount of data we face today. 
Research works developed a variety of technologies that focus on stream processing.\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/mapreduce.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/mapreduce.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/mapreduce.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":5262,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2019\/02\/26\/reproducibility-in-ml\/","url_meta":{"origin":6036,"position":2},"title":"Reproducibility in Machine Learning","author":"Pascal Fecht","date":"26. February 2019","format":false,"excerpt":"The rise of Machine Learning has led to changes across all areas of computer science. From a very abstract point of view, heuristics are replaced by black-box machine-learning algorithms providing \"better results\". But how do we actually quantify better results? ML-based solutions tend to focus more on absolute performance improvements\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":24936,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2023\/07\/17\/security-knockout-how-capcoms-street-fighter-5-punched-a-hole-in-intels-security-system\/","url_meta":{"origin":6036,"position":3},"title":"Security Knockout: How Capcom&#8217;s Street Fighter 5 punched a hole in Intel&#8217;s security system","author":"Frederik Omlor","date":"17. July 2023","format":false,"excerpt":"Games are usually built in order to optimize performance, not security. 
Nevertheless, they can be responsible for security vulnerabilities as well. This article shows how anti-cheat software, cheaters themselves and finally also game developers can cause harm to users systems.","rel":"","context":"In &quot;Secure Systems&quot;","block_context":{"text":"Secure Systems","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/system-designs\/secure-systems\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/07\/StreetFighterV-1.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/07\/StreetFighterV-1.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/07\/StreetFighterV-1.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/07\/StreetFighterV-1.jpg?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/07\/StreetFighterV-1.jpg?resize=1050%2C600&ssl=1 3x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/07\/StreetFighterV-1.jpg?resize=1400%2C800&ssl=1 4x"},"classes":[]},{"id":8808,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2019\/10\/13\/how-to-build-fault-tolerant-software-systems\/","url_meta":{"origin":6036,"position":4},"title":"How to build fault-tolerant software systems","author":"Raimondo Lazzara","date":"13. October 2019","format":false,"excerpt":"June 4th, 1996 - Ariane 5 rocket explodes a few seconds after being launched. The disaster was caused by a simple software error [1]. 
A brief introduction to the fundamental concepts of Erlang and Elixir Ever since the first electronic systems have been created, engineers and developers have strived to\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/10\/Spawn-Proc.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/10\/Spawn-Proc.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2019\/10\/Spawn-Proc.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":9816,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2020\/02\/24\/using-gitlab-to-set-up-a-ci-cd-workflow-for-an-android-app-from-scratch\/","url_meta":{"origin":6036,"position":5},"title":"Using Gitlab to set up a CI\/CD workflow for an Android App from scratch","author":"Johannes Mauthe","date":"24. February 2020","format":false,"excerpt":"Tim Landenberger (tl061) Johannes Mauthe (jm130) Maximilian Narr (mn066) This blog post aims to provide an overview about how to setup a decent CI\/CD workflow for an android app with the capabilities of Gitlab. The blog post has been written for Gitlab Ultimate. 
Nevertheless, most features are also available in\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"https:\/\/lh3.googleusercontent.com\/TILM-T31y5pbvWRvoZbA53hR9mLaqMjANXKq7iGX_j-c19K_uiVnmKVDZV9DHBnGdPMgFogHmaNvLSy9gguK5rkMVLlosa4YuvYQQy-d090w90UjqUX_MbwizDt6_zQ1BlT6TrJ5","width":350,"height":200},"classes":[]}],"jetpack_sharing_enabled":true,"authors":[{"term_id":782,"user_id":915,"is_guest":0,"slug":"vm042","display_name":"Vincent Musch","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/16fde5909a70fcba7e8ac00d2021b4cd6f1f03c773ef75a22f0755bcda7e435b?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/6036","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/users\/915"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/comments?post=6036"}],"version-history":[{"count":6,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/6036\/revisions"}],"predecessor-version":[{"id":6067,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/6036\/revisions\/6067"}],"wp:attachment":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/media?parent=6036"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/categories?post=6036"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/t
ags?post=6036"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/ppma_author?post=6036"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}