{"id":24243,"date":"2023-03-03T23:25:52","date_gmt":"2023-03-03T22:25:52","guid":{"rendered":"https:\/\/blog.mi.hdm-stuttgart.de\/?p=24243"},"modified":"2023-06-26T23:22:39","modified_gmt":"2023-06-26T21:22:39","slug":"modern-application-of-voice-ai-technology","status":"publish","type":"post","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2023\/03\/03\/modern-application-of-voice-ai-technology\/","title":{"rendered":"Modern application of Voice AI technology"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full is-resized\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/01.jpg\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"24339\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2023\/03\/03\/modern-application-of-voice-ai-technology\/attachment\/01\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/01.jpg\" data-orig-size=\"960,722\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"01\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/01.jpg\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/01.jpg\" alt=\"Image of Voice AI header\" class=\"wp-image-24339\" width=\"720\" height=\"542\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/01.jpg 960w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/01-300x226.jpg 300w, 
https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/01-768x578.jpg 768w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" \/><\/a><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>With the advancement of technology and the increasing use of artificial intelligence, new markets are emerging. One of them is the market for Voice AI, which became a commercial success with voice bots such as Alexa or Siri. These were mainly used as digital assistants that could answer questions, set reminders and generally tap into various databases to provide the aid and service they were asked for. Although the popular use case seems to be primarily domestic, we have long since seen other applications for Voice AI in areas such as UI speech control, voice recognition and replication, and entertainment media.<\/p>\n\n\n\n<p>Faced with the ever-growing interest in artificial intelligence, which continues to extend its influence on society, industry and the commercial landscape, this post will demonstrate and inspect the technologies surrounding the application of Voice AI as well as the obstacles it still needs to overcome. Before we can dive deeper into the topic, however, it is important to introduce the concept of Voice AI and its general technological workings.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h3 class=\"wp-block-heading\" id=\"aioseo-introduction-to-voice-ai\">About Voice AI<\/h3>\n\n\n\n<p>The definition of Voice AI is not yet set in stone and can vary with the scope of application. What we can most likely agree on is that Voice AI is a component of conversational AI that users can interact with. 
With a combination of machine learning and natural language processing, Voice AI can analyze language and speech in order to produce sound.<sup>[1]<\/sup> It offers the user an auditory way to interact with a system, as the system tries to interpret human speech via voice recognition. In other cases, Voice AI is used to generate human-like speech, either from text pre-written by the user in a text-to-speech approach or from a trained AI that systematically responds with generated answers.<sup>[2]<\/sup> In either case, Voice AI is bound to be a service that users can make use of. This is especially true if we take embedded systems and hands-free user experiences into account. To keep this blog post concise and on point, it will demonstrate and inspect one application of Voice AI that is publicly available. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"aioseo-voice-ai-generator\">Voice AI Synthesis via VALL-E<\/h2>\n\n\n\n<p>AI TTS, also known as neural text-to-speech, makes use of neural networks and machine learning technologies. In a nutshell, the speech engine analyzes the audio input of a human voice via automatic speech recognition (ASR), tries to understand the meaning of the words it has collected, and formulates a reply via natural-language generation (NLG). With the inclusion of neural networks, the artificial intelligence is able to learn the style of how people communicate with one another, including response patterns and conversational flow. Once the text that is intended to be converted into speech has been generated, a speech synthesis model articulates and reads it out.<sup>[3]<\/sup><\/p>\n\n\n\n<p>Right this moment the world is confronted with countless AI-generated voice-overs of popular figures flooding the internet. 
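The ASR–NLG–synthesis loop sketched above can be expressed as three pluggable stages. The classes below (`EchoASR`, `RuleNLG`, `StubTTS`) are hypothetical stand-ins invented for illustration, not a real library API; this is only a minimal sketch of the control flow of one voice-assistant turn.

```python
# Minimal sketch of one Voice AI turn: speech in, speech out.
# All three components are hypothetical stand-ins; a real system
# would plug in neural ASR, NLG and TTS models here.

class EchoASR:
    def transcribe(self, audio: bytes) -> str:
        # Stand-in for automatic speech recognition (speech -> text).
        return audio.decode("utf-8")

class RuleNLG:
    def respond(self, text: str) -> str:
        # Stand-in for natural-language generation (text -> reply text).
        return "Setting a reminder." if "remind" in text.lower() else "Sorry?"

class StubTTS:
    def synthesize(self, text: str) -> bytes:
        # Stand-in for neural speech synthesis (would return a waveform).
        return text.encode("utf-8")

def handle_turn(audio: bytes, asr=EchoASR(), nlg=RuleNLG(), tts=StubTTS()) -> bytes:
    text = asr.transcribe(audio)      # 1. speech recognition
    reply = nlg.respond(text)         # 2. response generation
    return tts.synthesize(reply)      # 3. speech synthesis

print(handle_turn(b"Remind me to buy milk"))  # b'Setting a reminder.'
```

The point of the sketch is the separation of concerns: each stage can be swapped out independently, which is exactly what makes the embedded, hands-free scenarios mentioned above practical.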
The infusion of AI and deep learning has revolutionized the TTS (text-to-speech) synthesis procedure, producing lifelike speech variations and markedly reducing the &#8220;robotic&#8221; inflexion and pronunciation.<sup>[4]<\/sup> In the demonstration below, you will find a user-created conversation between two Voice AI models that were trained on voice samples of real media figures in order to mimic a voice, inflexion and tone fitting for conversation:<\/p>\n\n\n\n<figure class=\"wp-block-video\"><video controls=\"\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/ai_tts.webm\" width=\"400\" height=\"400\"><\/video><\/figure>\n\n\n\n<p>The act of simulating a voice and using that synthesized voice to speak a provided text is termed Voice Cloning. The key is that the system learns the intricacies of the input voice data via a speaker encoder. The resulting Voice AI model is then capable of altering any given voice input waveform by decoding it.<sup>[5]<\/sup> Regarding the generation of such a model, Microsoft, for example, has released a TTS AI model called VALL-E, essentially a neural codec language model that is capable of simulating a person&#8217;s voice after being trained with only a few seconds of audio samples. The main selling point of this particular TTS Voice AI model is its ability to preserve the speaker&#8217;s emotion and the acoustic environment that were present in its audio samples. 
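The &#8220;neural codec&#8221; idea behind VALL-E, turning audio into discrete codes that a language model can predict and a decoder can turn back into a waveform, can be illustrated with a deliberately crude stand-in: plain scalar quantization in place of the learned codec codebooks a real model uses. The functions below are illustrative inventions, not part of VALL-E.

```python
import numpy as np

# Toy illustration of the codec idea: audio becomes a sequence of
# discrete integer codes (tokens a language model could predict),
# and a decoder maps codes back to an approximate waveform.
# Uniform scalar quantization is a crude stand-in for a learned codec.

def encode(waveform, levels=256):
    """Map samples in [-1, 1] to integer codes 0..levels-1."""
    scaled = (waveform + 1.0) / 2.0 * (levels - 1)
    return np.clip(scaled.round(), 0, levels - 1).astype(int)

def decode(codes, levels=256):
    """Map integer codes back to an approximate waveform in [-1, 1]."""
    return codes / (levels - 1) * 2.0 - 1.0

t = np.linspace(0.0, 1.0, 16000)
wave = 0.5 * np.sin(2 * np.pi * 440 * t)   # one second of a 440 Hz tone
codes = encode(wave)                        # discrete token sequence
recon = decode(codes)                       # decoded waveform
print(np.abs(wave - recon).max() < 1e-2)    # True: quantization error is tiny
```

A learned neural codec does the same round trip with far fewer, far more expressive tokens, which is what makes it feasible for a language model to continue a three-second acoustic prompt in the speaker's voice.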
This feat is achieved by coupling the approach of generative AI models like GPT-3 with speech editing and content creation, and by training on 60,000 hours of English speech data.<sup>[6]<\/sup><\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/02.jpg\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"24348\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2023\/03\/03\/modern-application-of-voice-ai-technology\/attachment\/02\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/02.jpg\" data-orig-size=\"1047,564\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"02\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/02-1024x552.jpg\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/02-1024x552.jpg\" alt=\"Overview of VALL-E's synthesis pipeline\" class=\"wp-image-24348\" width=\"768\" height=\"414\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/02-1024x552.jpg 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/02-300x162.jpg 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/02-768x414.jpg 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/02.jpg 1047w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><\/a><figcaption class=\"wp-element-caption\" style=\"font-size: 0.8em\"><strong>Figure 
1<\/strong>: VALL-E&#8217;s pipeline consists of phoneme conversion and sampling of the voice input data, which is converted into discrete code and then decoded into a waveform.<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>A <a href=\"https:\/\/valle-demo.github.io\/\" target=\"_blank\" rel=\"noopener\" title=\"\">GitHub page<\/a> was published to show the features of VALL-E&#8217;s advanced speech synthesis and speech generation methods for research demonstration purposes. These include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Synthesis of Diversity<\/strong>: Synthesizing samples with varying speech patterns<\/li>\n<\/ul>\n\n\n\n<table><tbody><tr><td>VALL-E Sample 1<figure class=\"wp-block-audio\"><audio controls=\"\" src=\"https:\/\/valle-demo.github.io\/audios\/vctk\/diversity\/p230_s1.wav\"><\/audio><\/figure><\/td><td>VALL-E Sample 2<figure class=\"wp-block-audio\"><audio controls=\"\" src=\"https:\/\/valle-demo.github.io\/audios\/vctk\/diversity\/p230_s2.wav\"><\/audio><\/figure><\/td><\/tr><\/tbody><\/table>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Acoustic Environment Maintenance<\/strong>: Preserving the acoustic environment of the voice sample<\/li>\n<\/ul>\n\n\n\n<table><tbody><tr><td>Voice sample<figure class=\"wp-block-audio\"><audio controls=\"\" src=\"https:\/\/valle-demo.github.io\/audios\/fisher\/1_pt.wav\"><\/audio><\/figure><\/td><td>VALL-E sample<figure class=\"wp-block-audio\"><audio controls=\"\" src=\"https:\/\/valle-demo.github.io\/audios\/fisher\/1_ours.wav\"><\/audio><\/figure><\/td><\/tr><\/tbody><\/table>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Speaker&#8217;s Emotion Maintenance<\/strong>: Preserving the tone and emotion of the voice sample<\/li>\n<\/ul>\n\n\n\n<table><tbody><tr><td>Voice sample: Sleepy<figure class=\"wp-block-audio\"><audio controls=\"\" src=\"https:\/\/valle-demo.github.io\/audios\/emov_db\/sleepiness_pt.wav\"><\/audio><\/figure><\/td><td>VALL-E sample<figure 
class=\"wp-block-audio\"><audio controls=\"\" src=\"https:\/\/valle-demo.github.io\/audios\/emov_db\/sleepiness_ours.wav\"><\/audio><\/figure><\/td><\/tr><\/tbody><\/table>\n\n\n\n<p>The benefits of Voice AI cannot be understated. Especially in systems that heavily rely on (semi)-automated customer service, the application of Voice AI can bring tangible effects for process pipelining and cost reduction. That is especially true in regard to splitting work labor into repetitive tasks that an AI can be trained to do and tasks that requires human interaction. Voice AI does not only benefit the company which utilizes Voice AI for answering user inquiries, but it can also become a vital support for clients and customers who suffer from speech impairment. Even considering the use cases beyond customer support services, Voice AI can decrease the communication friction that occur during interaction between human and AI, therefore improving the user experience regardless of use case.<sup>[2]<\/sup><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Voice Waveform Alignment via VITS<\/h2>\n\n\n\n<p>VITS stands for <strong>V<\/strong>ariational <strong>I<\/strong>nference with Adversarial Learning for End-to-End <strong>T<\/strong>ext-to-<strong>S<\/strong>peech and I hope you can accept how this acronym came to be. VITS in purest definition is a TTS model that is capable of text-to-audio alignment by utilizing Monotonic Alignment Search (MAS) whereas other TTS models may require external alignment annotations.<sup>[7][8]<\/sup> The use of the latter may result in unwanted long utterances and out-of-domain text and may also cause missing or repeating words in the synthesized speech. Typically, a neural TTS model aligns their text and speech either via attention mechanism or by a combined use of a mapping encoder (text input \u2192 state) and a decoder (state \u2192 mel-spectogram \/ waveform). 
Although techniques and methods for these alignment processes were improved by utilizing both content and location sensitive attention, these models still suffer from the aforementioned issues.<sup>[9]<\/sup><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/03.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"519\" data-attachment-id=\"24457\" data-permalink=\"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2023\/03\/03\/modern-application-of-voice-ai-technology\/attachment\/03\/\" data-orig-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/03.jpg\" data-orig-size=\"1316,667\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"03\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/03-1024x519.jpg\" src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/03-1024x519.jpg\" alt=\"\" class=\"wp-image-24457\" srcset=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/03-1024x519.jpg 1024w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/03-300x152.jpg 300w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/03-768x389.jpg 768w, https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/03.jpg 1316w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption class=\"wp-element-caption\" style=\"font-size: 0.8em\"><strong>Figure 2<\/strong>: Side-by-side diagram depicting the 
training and inference procedure. The model can be regarded as a variational autoencoder (VAE) that uses Monotonic Alignment Search together with a flow-based stochastic duration predictor.<\/figcaption><\/figure>\n\n\n\n<p>VITS is by far not the only model that tries to tackle these issues; it combats the alignment problem through adversarial learning, Monotonic Alignment Search and the use of a variational autoencoder. Adversarial learning is a machine learning technique that VITS applies in the waveform domain. To describe it swiftly, the adversarial training utilizes a discriminator that distinguishes the decoder output from the ground-truth waveform; this adversarial objective, combined with a reconstruction loss, trains and aligns a model architecture built out of a hierarchy of interdependent encoders.<sup>[8]<\/sup> Monotonic Alignment Search (MAS) is a method of searching for the most probable monotonic alignment between the latent variables and the statistics of the prior distribution of the input speech and text. In practical terms for VITS, MAS estimates the alignment needed to maximize the variational lower bound, which in turn is important for natural-sounding and stable speech.<sup>[10]<\/sup> A variational autoencoder (VAE) is a likelihood-based deep generative model. VITS uses the VAE in combination with MAS to remove the burden of depending on attention mechanisms and in exchange allows a simpler architecture of continuous normalizing flows that benefits the adversarial training.<sup>[8]<\/sup><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Applied VITS in Singing Voice Conversion (SVC)<\/h3>\n\n\n\n<p>Connoisseurs of Voice AI can create their own song covers by utilizing voice samples and a pre-trained Voice AI model. 
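The Monotonic Alignment Search described above can be made concrete as a small dynamic program. The sketch below is a toy reimplementation of the Glow-TTS/VITS idea, not the authors' code; `log_probs` stands in for the frame-by-token log-likelihoods a real model would compute, and the example matrix is invented for illustration.

```python
import numpy as np

# Toy Monotonic Alignment Search (MAS): find the most probable
# monotonic alignment between T latent frames and S text tokens,
# where log_probs[t, s] scores frame t against token s.

def monotonic_alignment_search(log_probs: np.ndarray) -> np.ndarray:
    """Return one token index per frame; indices start at 0,
    end at S-1, and never decrease (monotonicity)."""
    T, S = log_probs.shape
    Q = np.full((T, S), -np.inf)            # best cumulative log-likelihood
    Q[0, 0] = log_probs[0, 0]
    for t in range(1, T):
        for s in range(min(t + 1, S)):      # at most one new token per frame
            stay = Q[t - 1, s]              # keep reading the same token
            advance = Q[t - 1, s - 1] if s > 0 else -np.inf  # move to next token
            Q[t, s] = log_probs[t, s] + max(stay, advance)
    # Backtrack from the final frame/token pair to recover the path.
    path = np.zeros(T, dtype=int)
    s = S - 1
    for t in range(T - 1, -1, -1):
        path[t] = s
        if t > 0 and s > 0 and Q[t - 1, s - 1] >= Q[t - 1, s]:
            s -= 1
    return path

log_probs = np.array([[ 0.0, -9.0, -9.0],
                      [-9.0,  0.0, -9.0],
                      [-9.0,  0.0, -9.0],
                      [-9.0, -9.0,  0.0]])
print(monotonic_alignment_search(log_probs))  # [0 1 1 2]
```

Because the search is restricted to monotonic, gap-free paths, every token gets at least one frame and no frame skips ahead, which is exactly the property that prevents the missing or repeated words mentioned earlier.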
The subsequent Voice Cloning aligns the pitch and inflexion of the source waveform and applies the discretely encoded voice data onto it. The effect is immediate, and a use case for the public has been made. Several <a href=\"https:\/\/github.com\/innnky\/so-vits-svc\/blob\/32k\/Eng_docs.md\" target=\"_blank\" rel=\"noopener\" title=\"\">repositories<\/a> provide all the necessary means to create something of your own.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Example 1:<\/h4>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<p class=\"has-text-align-right\">Speaker Prompt:<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/static.miraheze.org\/bluearchivewiki\/2\/2e\/Kazusa_EventLobby_2.ogg\"><\/audio><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns has-medium-font-size is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<p class=\"has-text-align-right\">Source Waveform:<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column has-small-font-size is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-embed aligncenter is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<span class=\"embed-youtube\" style=\"text-align:center; display: block;\"><iframe loading=\"lazy\" class=\"youtube-player\" width=\"640\" height=\"360\" 
src=\"https:\/\/www.youtube.com\/embed\/fn4JAuCfHQA?version=3&#038;rel=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;fs=1&#038;hl=en-US&#038;autohide=2&#038;wmode=transparent\" allowfullscreen=\"true\" style=\"border:0;\" sandbox=\"allow-scripts allow-same-origin allow-popups allow-presentation allow-popups-to-escape-sandbox\"><\/iframe><\/span>\n<\/div><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<p class=\"has-text-align-right\">Result:<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/kazusa.mp3\"><\/audio><\/figure>\n<\/div>\n<\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Example 2:<\/h4>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<p class=\"has-text-align-right\">Speaker Prompt:<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/static.miraheze.org\/bluearchivewiki\/1\/16\/Noa_Season_Halloween_1.ogg\"><\/audio><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns has-medium-font-size is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow 
wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<p class=\"has-text-align-right\">Source Waveform:<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"font-size:10px;flex-basis:66.66%\">\n<figure class=\"wp-block-embed aligncenter is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<span class=\"embed-youtube\" style=\"text-align:center; display: block;\"><iframe loading=\"lazy\" class=\"youtube-player\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/uLAfaMtaT3o?version=3&#038;rel=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;fs=1&#038;hl=en-US&#038;autohide=2&#038;wmode=transparent\" allowfullscreen=\"true\" style=\"border:0;\" sandbox=\"allow-scripts allow-same-origin allow-popups allow-presentation allow-popups-to-escape-sandbox\"><\/iframe><\/span>\n<\/div><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<p class=\"has-text-align-right\">Result:<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/noa.mp3\"><\/audio><\/figure>\n<\/div>\n<\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Example 3:<\/h4>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<p 
class=\"has-text-align-right\">Speaker Prompt:<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/static.miraheze.org\/bluearchivewiki\/7\/79\/Yuuka_Cafe_Act_2.ogg\"><\/audio><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns has-medium-font-size is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<p class=\"has-text-align-right\">Source Waveform:<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column has-small-font-size is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-embed aligncenter is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-4-3 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<span class=\"embed-youtube\" style=\"text-align:center; display: block;\"><iframe loading=\"lazy\" class=\"youtube-player\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/KMFwbV__ctY?version=3&#038;rel=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;fs=1&#038;hl=en-US&#038;autohide=2&#038;wmode=transparent\" allowfullscreen=\"true\" style=\"border:0;\" sandbox=\"allow-scripts allow-same-origin allow-popups allow-presentation allow-popups-to-escape-sandbox\"><\/iframe><\/span>\n<\/div><\/figure>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<p class=\"has-text-align-right\">Result:<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center 
is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/yuuka.mp3\"><\/audio><\/figure>\n<\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Benefits and friction points<\/h2>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<span class=\"embed-youtube\" style=\"text-align:center; display: block;\"><iframe loading=\"lazy\" class=\"youtube-player\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/ZRIq8AGm9nY?version=3&#038;rel=1&#038;showsearch=0&#038;showinfo=1&#038;iv_load_policy=1&#038;fs=1&#038;hl=en-US&#038;autohide=2&#038;wmode=transparent\" allowfullscreen=\"true\" style=\"border:0;\" sandbox=\"allow-scripts allow-same-origin allow-popups allow-presentation allow-popups-to-escape-sandbox\"><\/iframe><\/span>\n<\/div><\/figure>\n\n\n\n<p>Everyone who has had the pleasure of playing the latest Harry Potter video game must have come across the character creator and found the voice options lackluster, comedic, or both. The voice production department apparently decided to record the lines once per gender and then offer ways to pitch-shift that voice for a shallow illusion of variety. The result was far from ideal, but it did save money and coordination time with external voice acting agencies. However, the robotic and unnaturally pitched sound quality harmed the game&#8217;s ability to immerse the player. If the application of Voice AI were more mature and the workflow stable and tested, this would have left a far more favorable impression and converted into higher revenue through savings on voice acting expenditure. 
Although it would still be required to record the base voice lines, Voice AI could be utilized afterwards to shape the voice to the player&#8217;s preferences. The domain of voice alterations could then be treated the same way a player changes their hairstyle or body shape.<\/p>\n\n\n\n<p>Back in the day of car navigation assistance, the voices were often robotic and generic. Many modern video games also offer options to change the voice of the game announcer: so-called announcer packs and voice DLCs (downloadable content) were sold to give an existing game more variety and a more personalized user experience. The emergence of Voice AI may very well revolutionize the way games and media in general handle and treat voice resources. It does not, however, only have benefits to show for itself. Just as with deepfakes, the threat of identity theft and impersonation is no trifling matter, especially when Voice AI is capable of reproducing a voice along with its emotional character. How could someone protect the commodity that is their voice, given that a mere three-second sample is sufficient to establish a Voice AI model?<sup>[11]<\/sup><\/p>\n\n\n\n<p>Lastly, the voice acting industry can expect to take a hit once the workflow around Voice AI has stabilized. Small studio productions especially can barely compete with an ocean of synthesized options.<sup>[12]<\/sup> On the other hand, independent people can offer their own voice as something others have to pay for. Needless to say, this market would be yet another thing that emerged from Voice AI, and something that could spiral out of control.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>Voice AI has proven to be the next big thing in revolutionizing user experience and bridging the communicative gap in human-computer interaction. 
As with every technology, the AI&#8217;s capability to learn and adapt will lead to much better and more usable performance in the coming months and years, and it is only a matter of time before the industry fully shifts towards its undeniable benefits. The only thing we need to look out for is creating proper guidelines and regulations, and thinking about how to deal with the emerging push-and-pull effects that are inevitably introduced alongside Voice AI. <\/p>\n\n\n\n<p>In the end, Voice AI can only function as a benefactor of creative vision and a supporter of those who lack a voice when it is used without ill intent. Right now the floodgates are open, and everyone is encouraged to jump on the hype and play around with the new ways Voice AI can spice up their everyday lives.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<div class=\"wp-block-group is-vertical is-layout-flex wp-container-core-group-is-layout-fe9cc265 wp-block-group-is-layout-flex\">\n<p>[1] IBM. <em>Conversational AI<\/em>. published on 02.01.23. <a href=\"https:\/\/www.ibm.com\/topics\/conversational-ai\">https:\/\/www.ibm.com\/topics\/conversational-ai<\/a> (last accessed: 27.02.23)<\/p>\n\n\n\n<p>[2] LilChirp. <em>Voice AI: What is it and How Does it Work?<\/em> published on 22.11.22. <a href=\"https:\/\/lilchirp.io\/blog\/voice-ai\/\">https:\/\/lilchirp.io\/blog\/voice-ai\/<\/a> (last accessed: 27.02.23)<\/p>\n\n\n\n<p>[3] Oliver Skinner. <em>Text to Speech Technology: How Voice Computing is Building a More Accessible World<\/em>. published on 09.06.20. <a href=\"https:\/\/www.voices.com\/blog\/text-to-speech-technology\/\">https:\/\/www.voices.com\/blog\/text-to-speech-technology\/<\/a> (last accessed: 27.02.23)<\/p>\n\n\n\n<p>[4] Ethan Baker. <em>AI Evolution: The Future of Text-to-Speech Synthesis<\/em>. published on 16.02.23. 
<a href=\"https:\/\/www.veritonevoice.com\/blog\/future-of-text-to-speech-synthesis\/\">https:\/\/www.veritonevoice.com\/blog\/future-of-text-to-speech-synthesis\/<\/a> (last accessed: 27.02.23)<\/p>\n\n\n\n<p>[5] George Seif. <em>You can now speak using someone else\u2019s voice with Deep Learning<\/em>. published on 02.07.19. <a href=\"https:\/\/towardsdatascience.com\/you-can-now-speak-using-someone-elses-voice-with-deep-learning-8be24368fa2b\">https:\/\/towardsdatascience.com\/you-can-now-speak-using-someone-elses-voice-with-deep-learning-8be24368fa2b<\/a> (last accessed: 27.02.23)<\/p>\n\n\n\n<p>[6] Chengyi Wang, Sanyuan Chen, Yu Wu. <em>Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. published on<\/em> 05.01.2023. <a href=\"https:\/\/arxiv.org\/abs\/2301.02111\">https:\/\/arxiv.org\/abs\/2301.02111<\/a> (last accessed: 27.02.23)<\/p>\n\n\n\n<p>[7] Coqui TTS Team. <em>VITS Documentation<\/em>. published on 14.08.21. <a href=\"https:\/\/tts.readthedocs.io\/en\/latest\/models\/vits.html\">https:\/\/tts.readthedocs.io\/en\/latest\/models\/vits.html<\/a> (last accessed: 01.03.23)<\/p>\n\n\n\n<p>[8] Jaehyeon Kim, Jungil Kong, Juhee Son. <em>Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech<\/em>. published on 11.06.2021. <a href=\"https:\/\/arxiv.org\/abs\/2106.06103\">https:\/\/arxiv.org\/abs\/2106.06103<\/a> (last accessed: 01.03.23)<\/p>\n\n\n\n<p>[9] Rohan Badlani, Adrian \u0141a \u0301ncucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro. <em>One TTS Alignment To Rule Them All<\/em>. published on 23.08.21. <a href=\"https:\/\/arxiv.org\/abs\/2108.10447\">https:\/\/arxiv.org\/abs\/2108.10447<\/a> (last accessed: 01.03.23)<\/p>\n\n\n\n<p>[10] Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon. <em>Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search<\/em>. published on 22.05.20. 
<a href=\"https:\/\/arxiv.org\/abs\/2005.11129\">https:\/\/arxiv.org\/abs\/2005.11129<\/a> (last accessed: 01.03.23)<\/p>\n\n\n\n<p>[11] Justin Carter. <em>Voice Actors Are Having Their Voices Stolen by AI<\/em>. published on 12.02.23. <a href=\"https:\/\/gizmodo.com\/voice-actors-ai-voices-controversy-1850105561\">https:\/\/gizmodo.com\/voice-actors-ai-voices-controversy-1850105561<\/a> (last accessed: 03.03.23)<\/p>\n\n\n\n<p>[12] Joseph Cox. <em>\u2018Disrespectful to the Craft:\u2019 Actors Say They\u2019re Being Asked to Sign Away Their Voice to AI<\/em>. published on 07.02.23. <a href=\"https:\/\/www.vice.com\/en\/article\/5d37za\/voice-actors-sign-away-rights-to-artificial-intelligence\">https:\/\/www.vice.com\/en\/article\/5d37za\/voice-actors-sign-away-rights-to-artificial-intelligence<\/a> (last accessed: 03.03.23)<\/p>\n<\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Image and figure sources<\/h3>\n\n\n\n<div class=\"wp-block-group is-vertical is-layout-flex wp-container-core-group-is-layout-fe9cc265 wp-block-group-is-layout-flex\">\n<p>Banner image: https:\/\/www.intelligentliving.co\/ai-can-preserve-persons-voice-few-hours-recordings\/<\/p>\n\n\n\n<p>Figure 1: https:\/\/valle-demo.github.io\/<\/p>\n\n\n\n<p>Figure 2: https:\/\/arxiv.org\/abs\/2106.06103<\/p>\n\n\n\n<p> <\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>With the advancement of technology and the gradually increasing use of artificial intelligence, new markets are developed. One of such is the market of Voice AI which became a commercial success with voice bots such as Alexa or Siri. 
They were mainly used as digital assistants who could answer questions, set reminders and they could [&hellip;]<\/p>\n","protected":false},"author":1128,"featured_media":24339,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1,652,660],"tags":[355,57],"ppma_author":[899],"class_list":["post-24243","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-allgemein","category-artificial-intelligence","category-chatgpt-and-language-models","tag-ai","tag-machine-learning"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/03\/01.jpg","jetpack-related-posts":[{"id":4024,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2018\/08\/22\/why-ai-is-a-threat-for-our-digital-security\/","url_meta":{"origin":24243,"position":0},"title":"Why AI is a Threat for our Digital Security","author":"Katharina Strecker","date":"22. August 2018","format":false,"excerpt":"Artificial intelligence has a great potential to improve many areas of our lives in the future. But what happens when these AI technologies are used maliciously? Sure, a big topic may be autonomous weapons or so called \u201ckiller robots\u201d. 
But beside our physical security - what about our digital one?\u2026","rel":"","context":"In &quot;Artificial Intelligence&quot;","block_context":{"text":"Artificial Intelligence","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/artificial-intelligence\/"},"img":{"alt_text":"Computer image recognition has beaten human-level image recognition in 2015","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/08\/human-level-image-recongition-1024x717.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/08\/human-level-image-recongition-1024x717.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2018\/08\/human-level-image-recongition-1024x717.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":10415,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2020\/08\/19\/ai-cyberattacks-deepfakes\/","url_meta":{"origin":24243,"position":1},"title":"The Dark Side of AI &#8211; Part 1: Cyberattacks and Deepfakes","author":"Micha Christ","date":"19. August 2020","format":false,"excerpt":"Introduction Who hasn't seen a cinema production in which an AI-based robot threatens individual people or the entire human race? It is in the stars when or if such a technology can really be developed. 
With this series of blog entries we want to point out that AI does not\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2020\/08\/screen-shot-2018-08-03-at-10-34-32-1.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2020\/08\/screen-shot-2018-08-03-at-10-34-32-1.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2020\/08\/screen-shot-2018-08-03-at-10-34-32-1.jpg?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":2615,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/08\/27\/fooling-the-intelligence\/","url_meta":{"origin":24243,"position":2},"title":"FOOLING THE INTELLIGENCE","author":"Jonas Miederer","date":"27. August 2017","format":false,"excerpt":"Adversarial machine learning and its dangers The world is led by machines, humans are subjected to the robot\u2019s rule. Omniscient computer systems hold the control of the world. 
The newest technology has outpaced human knowledge, while the mankind is powerless in the face of the stronger, faster, better and almighty\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/08\/AAEAAQAAAAAAAAxmAAAAJDcyNzkyZjkzLTUzZTEtNGU1ZS04OWYxLWU4NDU5Y2QxOTRjYQ.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/08\/AAEAAQAAAAAAAAxmAAAAJDcyNzkyZjkzLTUzZTEtNGU1ZS04OWYxLWU4NDU5Y2QxOTRjYQ.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/08\/AAEAAQAAAAAAAAxmAAAAJDcyNzkyZjkzLTUzZTEtNGU1ZS04OWYxLWU4NDU5Y2QxOTRjYQ.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/08\/AAEAAQAAAAAAAAxmAAAAJDcyNzkyZjkzLTUzZTEtNGU1ZS04OWYxLWU4NDU5Y2QxOTRjYQ.png?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/08\/AAEAAQAAAAAAAAxmAAAAJDcyNzkyZjkzLTUzZTEtNGU1ZS04OWYxLWU4NDU5Y2QxOTRjYQ.png?resize=1050%2C600&ssl=1 3x"},"classes":[]},{"id":3140,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2017\/09\/07\/aira-voice-assistant-a-proof-of-concept-in-virtual-reality\/","url_meta":{"origin":24243,"position":3},"title":"AIRA Voice Assistant \u2013 A proof of Concept in virtual reality","author":"Dominic Kossinna","date":"7. September 2017","format":false,"excerpt":"Motivation As part of the lecture \u201cSoftware Development for Cloud Computing\u201d we were looking for a solution, how a user can get basic assistance within our existing virtual reality game AIRA. The primary objective was a maximum of user-friendliness, while avoiding an interruption of the immersive gaming experience. 
It is\u2026","rel":"","context":"In &quot;Allgemein&quot;","block_context":{"text":"Allgemein","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/allgemein\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2017\/09\/aira_error.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":27711,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2025\/07\/08\/open-source-in-ai-principles-pitfalls-and-practicalities-for-enterprise-adoption\/","url_meta":{"origin":24243,"position":4},"title":"Open Source in AI: Principles, Pitfalls, and Practicalities for Enterprise Adoption","author":"Tobias Metzger","date":"8. July 2025","format":false,"excerpt":"The rapid rise of open-source AI models is transforming enterprise innovation, but the true meaning of \u201copenness\u201d is often unclear. This article explores the principles behind open-source AI, examines key language models, and highlights the legal and operational challenges businesses face\u2014including the growing risk of \u201copen-washing.\u201d","rel":"","context":"In &quot;Artificial Intelligence&quot;","block_context":{"text":"Artificial Intelligence","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/artificial-intelligence\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/07\/llm_slm_param_comp.jpg?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/07\/llm_slm_param_comp.jpg?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/07\/llm_slm_param_comp.jpg?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/07\/llm_slm_param_comp.jpg?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/07\/llm_slm_param_comp.jpg?resize=1050%2C600&ssl=1 3x, 
https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2025\/07\/llm_slm_param_comp.jpg?resize=1400%2C800&ssl=1 4x"},"classes":[]},{"id":3803,"url":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/2018\/09\/12\/autonomous-war-what-dangers-are-associated-with-warfare-without-human-intervention\/","url_meta":{"origin":24243,"position":5},"title":"Autonomous War &#8211; Which dangers are associated with warfare without human intervention?","author":"Alexander Sch\u00fcbl","date":"12. September 2018","format":false,"excerpt":"The term autonomous war has been a controversial topic for years. But what exactly does the term actually mean? Autonomous war means the use of autonomous lethal weapons (short: LAWs) and machines or vehicles, which are primarily used by the military for modern warfare. Autonomous weapon systems can decide independently\u2026","rel":"","context":"In &quot;Secure Systems&quot;","block_context":{"text":"Secure Systems","link":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/category\/system-designs\/secure-systems\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/samsung.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/samsung.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.mi.hdm-stuttgart.de\/wp-content\/uploads\/2023\/08\/samsung.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]}],"jetpack_sharing_enabled":true,"authors":[{"term_id":899,"user_id":1128,"is_guest":0,"slug":"ngoc_ton","display_name":"Ngoc 
Ton","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/a051196441c601c26dc341dc77aa5fae3ff2e587dd3041acce91ac135fb04642?s=96&d=mm&r=g","0":null,"1":"","2":"","3":"","4":"","5":"","6":"","7":"","8":""}],"_links":{"self":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/24243","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/users\/1128"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/comments?post=24243"}],"version-history":[{"count":58,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/24243\/revisions"}],"predecessor-version":[{"id":24853,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/posts\/24243\/revisions\/24853"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/media\/24339"}],"wp:attachment":[{"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/media?parent=24243"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/categories?post=24243"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/tags?post=24243"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mi.hdm-stuttgart.de\/index.php\/wp-json\/wp\/v2\/ppma_author?post=24243"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}