Alt-Texting

Accessibility, Guided AI, Collage
Project Overview
Alt-Texting is a multimedia project that considers how different approaches to accessibility can change perception. Exploring image-to-text APIs before the arrival of DALLE, this work examines how people who engage with visual content solely through screen readers might interpret its meaning.

This project was created in collaboration with my friend Sammi Hashash, who identifies as a blind man and is frustrated with the lack of interpretation of online imagery.
Hardware & Software
Google Vision API, Imagga API, ChatGPT4, DALLE, Analog Collage
A woman in a brown head wrap gazes at the camera, wearing a blue earring

Collage imagery sourced from API % confidence levels using only up-cycled materials from around NYC
What is Alt-Text?
'Alt text' is a short description of an image that is used when the image can't be viewed. It enhances website accessibility, provides NSFW warnings, and improves SEO. The Americans with Disabilities Act (ADA) legally mandates accessibility for disabled individuals through screen readers, captions, audio descriptions, and alt text. While this represents progress toward more accessible online spaces, ADA regulations may overlook emotional or contextual nuances.
Production Process
Alt-Texting translates visual experiences for blind and visually impaired users using image-to-text APIs. Collaborating with Hashash, I selected well-known artworks featuring famous women. Two APIs, Google Vision and Imagga, analyzed the images for key features such as color palette, faces, objects, and labels. Although image-to-text APIs cannot generate alt text directly, they provide valuable information about the image content, each item reported with a level of confidence.
Botticelli's work, Birth of Venus, depicting a nude Venus on a half shell emerging from the foam surrounded by angels
The nude color palette pulled by Google Vision API from "Birth of Venus"
An image depicting the likelihood of the presence of certain objects in "Birth of Venus" pulled by Google Vision API: Natural Material 72%, Wood 68%, Mythology 67%, Drawing 66%, Fictional Character 65%, Tree 65%
An NSFW warning pulled by Google Vision API for "Birth of Venus." It shows that the image is "very likely" to be "racy."
Based on the API interpretations, I created collages in which the API's confidence level for each element determines how frequently that element appears. For example, when Google Vision analyzed Botticelli's Birth of Venus, it was 72% confident about "Natural Material" and flagged the image as "very likely" to be "racy." My collages incorporated the color palette, wood, water, and nudity. Multiple collages were made using found objects from the streets of New York City.
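The confidence-to-frequency idea described above can be sketched in a few lines of Python. This is only an illustration, not the process used for the physical collages: the function name allocate_pieces and the total of 40 pieces are assumptions made for the example, and the label scores are the ones listed for "Birth of Venus" on this page. The sketch splits a fixed number of collage pieces across labels in proportion to their confidence, using a largest-remainder rounding step so the counts add up to the total.

```python
from math import floor

# Label scores listed on this page for "Birth of Venus" (percent confidence).
BIRTH_OF_VENUS_LABELS = {
    "Natural Material": 72,
    "Wood": 68,
    "Mythology": 67,
    "Drawing": 66,
    "Fictional Character": 65,
    "Tree": 65,
}


def allocate_pieces(labels, total_pieces):
    """Split total_pieces across labels in proportion to confidence.

    Hypothetical helper for illustration. Uses largest-remainder rounding
    so the returned counts always sum to total_pieces.
    """
    if total_pieces < 0:
        raise ValueError("total_pieces must be non-negative")
    total_conf = sum(labels.values())
    if total_conf <= 0:
        raise ValueError("labels need a positive total confidence")

    # Exact (fractional) share of pieces for each label.
    exact = {name: total_pieces * conf / total_conf for name, conf in labels.items()}
    # Start from the rounded-down share.
    counts = {name: floor(share) for name, share in exact.items()}
    # Hand the leftover pieces to the labels with the largest fractional parts.
    leftover = total_pieces - sum(counts.values())
    by_remainder = sorted(labels, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # 40 is an arbitrary collage size chosen for this example.
    for name, count in allocate_pieces(BIRTH_OF_VENUS_LABELS, 40).items():
        print(f"{name}: {count} pieces")
```

Because the scores for this image are so close together, the resulting counts are nearly even, which mirrors how hard it is to tell from the labels alone what the painting is actually about.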
Portrait collage of a woman in a red robe and head wrap against a green background
Collage of high fashion models in various positions surrounded by animals such as horses and dogs in a green and blue color palette
Portrait of a woman gazing at the camera in a brown head wrap, wearing a single blue dangling earring
A naked woman covers her breasts with her arms and is collaged on top of a series of men oriented towards her against blue water backgrounds
A Jesus figure is collaged with arms open and palms up on a table with a glass of red wine. His face is partially obscured, and a gold cross shape is layered on top
Interpretation
The most effective alt text must still be written by humans. AI is evolving rapidly, but as of this writing no single system can interpret contextual nuance or convey meaning at a level comparable to human intelligence. Since the inception of this project, AI tools like DALLE have opened new opportunities to train machine learning systems to translate semantics into imagery. Working in the other direction, from images to text, remains difficult.
Future Considerations
As a current expansion of this project, I trained ChatGPT4 to write a prompt for DALLE that would interpret API findings as a collage. Where before I was personally deciding how to map API confidence levels to the frequency of each identified subject, I can now instruct the AI to display visual elements according to the API's confidence that each element is present, and to use that percentage to set the element's frequency in the collage.
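To make the shape of such a prompt concrete, here is a minimal sketch that assembles one from label scores with a plain text template. It is a stand-in for illustration only: in the project the prompt was written by ChatGPT4, and the function name build_collage_prompt, its parameters, and the wording of the template are all assumptions, not the prompt actually used.

```python
def build_collage_prompt(title, labels, content_note=None):
    """Return a text prompt asking for a collage weighted by label confidence.

    Hypothetical helper: a fixed template stands in for the ChatGPT-written
    prompt described in the text. `labels` maps label name to percent confidence.
    """
    # List the most confident labels first.
    ranked = sorted(labels.items(), key=lambda item: item[1], reverse=True)
    lines = [
        f'Create an analog-style paper collage interpreting "{title}".',
        "Let each element fill a share of the collage proportional to its confidence score:",
    ]
    lines += [f"- {name}: {conf}%" for name, conf in ranked]
    if content_note:
        # Optional note, e.g. a safe-search style flag reported by the API.
        lines.append(f"Content note from the image analysis: {content_note}.")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_collage_prompt(
        "Birth of Venus",
        {"Wood": 68, "Natural Material": 72, "Mythology": 67},
        content_note="very likely racy",
    )
    print(prompt)
```

A fixed template like this makes the mapping from percentages to instructions easy to inspect, whereas a language model can phrase the same information more freely.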

DALLE-created collages based on Google Vision API interpretation and ChatGPT prompt