Artificial Intelligence Rates My Style

April 24, 2022

Video Transcript

Have you ever wondered how your style affects the way people perceive you? Sometimes, something as simple as a new hairstyle can make you look more intelligent, attractive, and trustworthy. But it's not always easy to find that style that suits you the best. I've always been a fan of trying new styles and experimenting with different looks, so I decided to do an experiment and use artificial intelligence to rate my pictures so that I could find out what styles make me look my best.

Over the years, I've had all sorts of hairstyles. Long hair, short hair, straight hair, curly hair, you name it. Some of them were good, and some of them were, well, pretty bad. It was not always obvious to me which was which until I looked back at old photos, because I didn't have an objective way to judge my own style, but now I do! Well, sort of…

Today, we will ask OpenAI's vision and language model, CLIP, to judge my pictures and rate them on eight different dimensions: perceived intelligence, trustworthiness, attractiveness, competence, likability, confidence, authenticity, and creativeness.

CLIP was trained on hundreds of millions of image and text pairs collected from a variety of publicly available sources on the Internet. So, in a way, asking CLIP for its opinion is like asking the Internet's opinion.

To get the Internet’s opinion on all the styles I have ever had, I scanned my entire photo album for pictures of myself. I used face detection and recognition models to detect and crop my face in each picture. To my surprise, out of the more than 35,000 photos I had, only about 3,000 contained my face. I guess I'm not really a selfie person. Or, maybe I looked so different that the face recognition model missed many of my pictures. Either way, 3,000 pictures is more than enough to analyze.

I then had CLIP judge each one of those photos on the eight perceived qualities listed above.

The way CLIP works is that it maps images and text into a shared space where we can associate them with each other. For example, we can input two text prompts, "a photo of a man with a great hairstyle" and "a photo of a man with a bad hairstyle", and then run the model on a set of images to see which ones are closer to the "great hairstyle" prompt. CLIP can be used for many things, from image classification to guiding text-to-image generation, but for our purposes, we will use it to score my pictures for different qualities, by looking at how close each image is to a positive prompt compared to a baseline prompt.
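To make the scoring idea concrete, here is a minimal sketch of how one prompt pair could be turned into a single number. It assumes the image and the two prompts have already been embedded as vectors, and it is not necessarily the exact computation used in the video. The function names and the `scale` value are illustrative assumptions; CLIP applies its own learned temperature before the softmax.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pair_score(image_vec, positive_vec, baseline_vec, scale=100.0):
    """Softmax over the two scaled similarities.

    Returns the weight on the positive prompt, a value between 0 and 1.
    `scale` is an assumed temperature, not a value taken from the video.
    """
    pos = scale * cosine(image_vec, positive_vec)
    base = scale * cosine(image_vec, baseline_vec)
    top = max(pos, base)  # subtract the max for numerical stability
    e_pos = math.exp(pos - top)
    e_base = math.exp(base - top)
    return e_pos / (e_pos + e_base)
```

An image that sits equally close to both prompts gets a score of 0.5, and the score moves toward 1 as the image gets closer to the positive prompt than to the baseline.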

I wrote pairs of prompts like the example I showed before for intelligence, trustworthiness, attractiveness, and all the other categories. Obviously, these qualities cannot be inferred from a single image, but it's still interesting to see what CLIP thinks.

Alright, let's fast forward to the results!

Here are some of the best and worst rated pictures I had for the perceived intelligence category. It looks like the context matters. The top picture is from my graduation. In the second one, I have a company logo in the background. In the third one, I have Stanford University in the background.

The next one is trustworthiness, and the results are not surprising! Who wouldn’t trust a bicyclist? The lowest rated pictures are from my rapper years. I guess the model is a bit biased against rappers, and it apparently doesn’t like mirror selfies, especially shirtless ones.

Attractiveness is a bit tricky. This picture had one of the lowest trustworthiness scores yet it seems to be rated among the highest for attractiveness. The model seems to like beards but strongly dislikes soul patches.

Competence seems to be mostly about my age and the context.

Likability is next. Presentation and biking pictures again rank among the best. Smiling also seems to make a difference.

Confidence seems to be highly correlated with intelligence and competence except that the model is now ok with bathroom selfies.

As for authenticity, it seems like the model can tell the difference between a genuine smile and a fake one. And as always, it hates soul patches. Growing a soul patch was a terrible idea!

Finally, let’s take a look at creativeness. Looks like the type of glasses I wear and my hairstyle made the difference.

Since I have the date and time information for each photo, I was also able to analyze how these qualities have changed over time. After a bit of temporal smoothing, this is what I got.
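The video does not say which kind of temporal smoothing was used. One simple option is a centered moving average over the scores, sorted by photo timestamp. The sketch below assumes that option; the function name and the default window size are illustrative.

```python
def moving_average(values, window=5):
    """Centered moving average over a list of scores ordered by time.

    Near both ends the window shrinks so that only existing
    neighbors are averaged.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

A larger window gives a smoother curve but hides short-lived changes, such as a single summer of unusual photos.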

It seems like my perceived intelligence has increased over the years, except for the summer of 2016, when I took a bunch of shirtless photos to keep track of my workout progress. The trustworthiness score follows a similar pattern. Looks like the model thinks that guys who take shirtless pictures are not very trustworthy. On the other hand, the model also thinks that shirtless photos are actually attractive.

Competence and intelligence also seem to be highly correlated.

Likability seems to have lower variance: it doesn’t change much between pictures taken around the same time.

Looks like my confidence dipped when I was interviewing for my first full-time job and has been stable since I finished school.

My creativity seems to be on a roller coaster. This was when I sold my first NFT art by the way.

Some of these qualities seem to be negatively correlated with each other, and others positively correlated, so it's hard to pick the absolute best picture. So I computed an overall score for each picture by taking the geometric mean of its scores across all the qualities, and these seem to be the overall best pictures according to CLIP.
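As a rough sketch of that overall score, assuming each per-quality score is a positive number such as a probability between 0 and 1 (the function name is illustrative):

```python
import math

def overall_score(scores):
    """Geometric mean of positive per-quality scores."""
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("scores must be a non-empty list of positive numbers")
    # Averaging the logs and exponentiating equals the n-th root of the product.
    return math.exp(sum(math.log(s) for s in scores) / len(scores))
```

Compared with an ordinary average, the geometric mean penalizes a picture more heavily for scoring very low on any single quality, so the top pictures tend to be the ones that do reasonably well across the board.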

Now I know what haircut to get next time.

I hope you found this fun and interesting. I would love to hear your thoughts on this experiment in the comments below. I think the results more or less made sense, but I would still not take it too seriously, since the model surely has its own biases and preferences.

Alright. Thanks for watching and see you next time.

Here's how to try it on your own photos:

1. Clone and install CLIP from this repo: https://github.com/openai/CLIP
2. Replace ["a diagram", "a dog", "a cat"] in the example in the readme file with your own queries.
3. Detect and crop faces in your photo album using a face detector of your choice (e.g. DLib's face detector http://dlib.net/). Make sure the crops are not too tight.
4. Use a face recognition model (e.g. https://github.com/serengil/deepface) to separate your photos from those of other people in your album.

If you don't have many photos, you can do steps 3 and 4 manually too.
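One way to keep the crops from being too tight is to pad the detector's bounding box by a fraction of its size before cropping. The sketch below assumes the detector returns pixel coordinates as left, top, right, bottom. The function name and the default margin are illustrative choices, not values from the video.

```python
def pad_box(left, top, right, bottom, img_w, img_h, margin=0.3):
    """Expand a face box by `margin` of its width and height on each side,
    clamped to the image bounds. Returns (left, top, right, bottom)."""
    pad_x = round((right - left) * margin)
    pad_y = round((bottom - top) * margin)
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(img_w, right + pad_x),
        min(img_h, bottom + pad_y),
    )
```

The padded box can then be passed to whatever image library you use for cropping, so that the hairstyle and some of the background stay in the picture.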