September 26 – The popular chatbot known as ChatGPT can now “see, hear and speak,” or at least process spoken words and respond with a synthetic voice. It can also process images, according to parent company OpenAI.
ChatGPT’s feature push
The feature rollout comes as competition among chatbot developers heats up. Google has announced a variety of feature updates for its Bard chatbot, and Microsoft recently added visual search to Bing.
Cyber security experts have expressed concern about AI-generated synthetic voices, which could result in more convincing deepfakes. Experts and cyber criminals alike have already begun to explore how deepfakes can be used to circumvent cyber defenses.
Consumer voice inputs
OpenAI has acknowledged the deepfake concern and said that the synthetic voices were created with voice actors, not collected from strangers.
How OpenAI will use and store consumer voice inputs largely remains unknown. OpenAI’s formal written guidance on voice interactions states that OpenAI does not retain audio clips and that the recordings are not used to improve models.
However, the company also states that transcriptions are considered “inputs” and that they could be used for large language model training.