If a model is capable of generating nearly any image from text, can it be relied on to produce objective results?
By Johan Steyn, 11 January 2023
Towards the end of last year, OpenAI released its ChatGPT platform to much fanfare and media attention.
OpenAI claimed to have signed up more than a million users within the first few days of its release, making ChatGPT the fastest-adopted platform to date.
In my last article, “ChatGPT — Robots are not ready to take over the world”, I wrote about this incredible online text generator. I believe that it is a big stride towards “strong artificial intelligence (AI)” but that, as with all things created by humans, it is still fairly inaccurate and incomplete in many of the responses it generates.
Over the holidays I looked into another of OpenAI's interesting platforms: DALL-E. Able to create digital images from text descriptions, it uses natural language processing and deep learning techniques. I had a wonderful time “geeking out”, creating realistic art. “Generate an oil painting, in the style of Van Gogh, depicting a cat swimming in jelly.” I was amazed at what I saw on my screen.
These platforms need millions of images as training data to be able to perform their magic. It is estimated that humans now take as many as 1.81 trillion photos a year and that about 750 billion images exist online.
The challenge with using this prodigious number of pictures as training data is that it will inevitably reflect the biases of humans. If a model is capable of generating nearly any image from text, can it be relied upon to produce objective results? We still do not fully understand just how tilted and prejudiced internet content is. Would most doctors be shown as men, most flight attendants as women, and most people with fair skin? We risk perpetuating harmful stereotypes.
I decided to put DALL-E to the test: “Create an image of a business leader” and “Make a drawing of smart people”. I was impressed that most results were somewhat inclusive of gender and ethnic diversity, though in my view the bulk of them still centred on images of Caucasian men, especially for concepts such as intelligence, leadership and capability.
In July 2022, OpenAI released a statement announcing that it is taking the issue of bias and inclusivity seriously: “Today, we are implementing a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world’s population ... Based on our internal evaluation, users were 12 times more likely to say that DALL·E images included people of diverse backgrounds after the technique was applied.”
Google has also entered the text-to-image race with its diffusion model, Imagen. In a statement, Google revealed that the platform “relies on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models”. It further acknowledged that Imagen “has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place”.
The lives of future generations will be affected by AI technologies in ways we are yet to fathom. The future of work, education, health care and democracy will depend on AI being inclusive, responsible and democratised. AI should represent, and belong to, all people equally.