What is Alt-Text?
'Alt text' is a short description of an image used when the image cannot be viewed. It enhances website accessibility, provides NSFW warnings, and improves SEO. The Americans with Disabilities Act (ADA) legally mandates accessibility for disabled individuals, supported by tools such as screen readers, captions, audio descriptions, and alt text. While this represents progress toward more accessible online spaces, ADA regulations may overlook emotional or contextual nuance.
Production Process
Alt-Texting translates visual experiences for blind and visually impaired users using image-to-text APIs. Collaborating with Hashash, we selected well-known artworks featuring famous women. Two APIs, Google Vision and Imagga, analyzed the images for key features such as color palette, faces, objects, and labels. Although image-to-text APIs cannot generate alt-text directly, they return valuable information about image content, with each finding accompanied by a confidence score.
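As a rough illustration of this analysis step, the sketch below sends an image to the Google Cloud Vision API and reads back labels, dominant colors, and a safe-search check. It assumes the google-cloud-vision client library with valid credentials; the file name "artwork.jpg" and the 0.5 score cutoff are placeholders, not values from the project.

# Sketch: querying Google Cloud Vision for labels, colors, and safe-search.
# Assumes the google-cloud-vision library and credentials are set up;
# "artwork.jpg" and the 0.5 cutoff are illustrative placeholders.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("artwork.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Labels: each annotation carries a description and a confidence score (0-1).
labels = client.label_detection(image=image).label_annotations
for label in labels:
    if label.score >= 0.5:
        print(f"{label.description}: {label.score:.0%} confident")

# Dominant colors in the image's palette.
props = client.image_properties(image=image).image_properties_annotation
for color in props.dominant_colors.colors[:5]:
    rgb = (int(color.color.red), int(color.color.green), int(color.color.blue))
    print(f"color {rgb}: covers ~{color.pixel_fraction:.0%} of pixels")

# Safe-search flags possible risqué or adult imagery as a likelihood rating.
safe = client.safe_search_detection(image=image).safe_search_annotation
print("racy:", vision.Likelihood(safe.racy).name)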
Based on the API interpretations, I created collages that tie the API's confidence level to the frequency of elements in the artwork. For example, when Google Vision analyzed Botticelli's The Birth of Venus, it was 72% confident about "nature" elements and flagged possible risqué imagery. My collages incorporated the detected color palette, wood, water, and nudity. Multiple collages were made using found objects from the streets of New York City.
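The mapping from confidence to frequency can be made explicit. The sketch below is a hypothetical version of that rule: each element the API reports appears in the collage a number of times proportional to the confidence score, scaled by a maximum count of ten chosen purely for illustration.

# Sketch: turning API confidence scores into element counts for a collage plan.
# The findings dict, the max of 10 copies, and the 0.3 cutoff are placeholders.
findings = {"nature": 0.72, "water": 0.65, "wood": 0.58, "nudity": 0.41}

MAX_COPIES = 10  # most copies any single element may appear in the collage

def copies_for(confidence: float) -> int:
    """More confident findings appear more often; very weak ones are dropped."""
    return round(confidence * MAX_COPIES) if confidence >= 0.3 else 0

collage_plan = {element: copies_for(score) for element, score in findings.items()}
print(collage_plan)  # e.g. {'nature': 7, 'water': 6, 'wood': 6, 'nudity': 4}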
Interpretation
The most effective alt-text must still be written by humans. While AI is rapidly evolving, no system can yet interpret contextual nuance or convey meaning at a level comparable to human intelligence. Since the inception of this project, models like DALLE have opened new opportunities to explore how various semantics might be translated into imagery. Working in the opposite direction, from image to text, remains difficult.
Future Considerations
As a current expansion of this project, I instructed ChatGPT4 to write a prompt for DALLE that would interpret the API findings as a collage. Whereas before I was personally deciding how to turn API confidence levels into the frequency of an identified subject, I could now instruct the AI to display visual elements according to the API's confidence that each element was present, using that percentage to represent its frequency in the collage.
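A minimal sketch of that pipeline is shown below, assuming the OpenAI Python client; the model names, the sample findings, and the instruction wording are placeholders rather than the exact prompt used in the project.

# Sketch: asking a chat model to compose a DALLE prompt from API findings,
# then generating the collage image. Assumes the OpenAI Python client (>=1.0)
# with an API key in the environment; models and findings are placeholders.
from openai import OpenAI

client = OpenAI()

findings = {"nature": 0.72, "water": 0.65, "wood": 0.58}

instruction = (
    "Write a prompt for an image model that describes a collage. "
    "Each element below should appear with a frequency proportional to the "
    f"confidence score next to it: {findings}"
)

# Step 1: have the chat model draft the image prompt.
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": instruction}],
)
collage_prompt = chat.choices[0].message.content

# Step 2: pass that prompt to the image model and fetch the result URL.
image = client.images.generate(model="dall-e-3", prompt=collage_prompt, n=1)
print(image.data[0].url)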

DALLE-created collages based on Google Vision API interpretation and ChatGPT prompt