Making an AI to make beautiful Art using GPT-3, CLIP and the CC12M Dataset.
So, my best friend is very talented at drawing and making art in general. Last time, we were talking about a competition he had: he had to make two drawings on the topics utopia and dystopia. Then I suddenly thought, let's make an AI for this, and as a result here we are.
I started doing some research and found out OpenAI had a model named DALL·E that can generate images from text. So, I read the research paper and found out it works with something called GPT-3 and CLIP.
GPT-3 is one of the most advanced language models ever made.
CLIP is one of the most advanced models ever made for connecting language and images: it scores how well an image matches a text description, which is what lets a text prompt guide image generation.
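If you want to get a feel for what CLIP does before we build anything, here is a tiny sketch of mine (not part of the project) that scores one image against a few captions. It assumes the OpenAI CLIP package is installed and that cat.jpg is just some local image you have:
import torch
import clip  # the OpenAI CLIP package (also bundled in the repo we clone below)
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, preprocess = clip.load('ViT-B/32', device=device)

# Embed one image and a few candidate captions into the same space and compare them.
image = preprocess(Image.open('cat.jpg')).unsqueeze(0).to(device)
text = clip.tokenize(['a photo of a cat', 'a photo of a dog', 'a tower made of circuits']).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # the highest probability marks the caption that best matches the image
The caption with the highest probability is the one CLIP thinks matches the image best, and that same scoring idea is what steers the diffusion model towards our prompt.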
Now I had the basics of how I would design my AI, but I needed more references. While researching I found "RiversHaveWings" (Katherine Crowson), a very talented coder who makes amazing models for art generation. I saw her using the CC12M dataset, and with that I had everything ready and my research was a success.
So, let's get started.
This project was made on Google Colab, so first enable the GPU in your runtime.
Then check your GPU:
# Check the GPU
!mkdir steps
!nvidia-smi
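If you prefer checking from Python instead of the shell, a quick sanity check (my own addition, not in the original notebook) is:
import torch
print(torch.cuda.is_available())      # should print True when the GPU runtime is enabled
print(torch.cuda.get_device_name(0))  # e.g. 'Tesla T4' on a free Colab instance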
Now let's install the dependencies:
# Install dependencies
!pip install ftfy regex requests tqdm
!git clone --recursive https://github.com/crowsonkb/v-diffusion-pytorch
Let's download the diffusion model:
# Download the diffusion model
# SHA-256: 4fc95ee1b3205a3f7422a07746383776e1dbc367eaf06a5b658ad351e77b7bda
!mkdir v-diffusion-pytorch/checkpoints
!curl -L https://v-diffusion.s3.us-west-2.amazonaws.com/cc12m_1_cfg.pth > v-diffusion-pytorch/checkpoints/cc12m_1_cfg.pth
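Since the expected SHA-256 is listed in the comment above, you can optionally verify the download before moving on (a small extra check of mine, not part of the original notebook):
import hashlib

# Compare the downloaded checkpoint's hash against the published SHA-256.
expected = '4fc95ee1b3205a3f7422a07746383776e1dbc367eaf06a5b658ad351e77b7bda'
with open('v-diffusion-pytorch/checkpoints/cc12m_1_cfg.pth', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print('Checkpoint OK' if digest == expected else 'Checksum mismatch, re-download the file')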
Now for the imports:
# Imports
import gc
import math
import sys
from IPython import display
import torch
from torchvision import utils as tv_utils
from torchvision.transforms import functional as TF
from tqdm.notebook import trange, tqdm
sys.path.append('/content/v-diffusion-pytorch')
from CLIP import clip
from diffusion import get_model, sampling, utils
You can also do the whole setup from the command line and run a quick test sample with the repo's cfg_sample.py script:
%pip install -q requests tqdm ftfy
%cd /content
!git clone --recursive https://github.com/crowsonkb/v-diffusion-pytorch.git
%cd /content/v-diffusion-pytorch
%mkdir -p checkpoints
!curl https://v-diffusion.s3.us-west-2.amazonaws.com/cc12m_1_cfg.pth -o checkpoints/cc12m_1_cfg.pth
!./cfg_sample.py "the rise of consciousness":5 -n 4 -bs 4 --seed 0
Let's load the models now:
# Load the models
model = get_model('cc12m_1_cfg')()
_, side_y, side_x = model.shape
model.load_state_dict(torch.load('/content/v-diffusion-pytorch/checkpoints/cc12m_1_cfg.pth', map_location='cpu'))
model = model.half().cuda().eval().requires_grad_(False)
clip_model = clip.load(model.clip_model, jit=False, device='cpu')[0]
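As a quick optional sanity check (again, my addition), you can confirm the checkpoint loaded, see the model's native resolution, and check which CLIP variant it expects:
# Optional sanity check: parameter count, native resolution, and the CLIP variant used.
n_params = sum(p.numel() for p in model.parameters())
print(f'Diffusion model parameters: {n_params:,}')
print(f'Native resolution: {side_x}x{side_y}')
print(f'CLIP text encoder: {model.clip_model}')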
Now let's take the inputs from the user:
# Only these heights and widths work: 4, 8, 16, 32, 64, 128, 256, 512
#@title Settings
import random
prompt = 'Tower of AI, Digital Art' #@param {type:"string"}
height = 512 #@param {type:"integer"}
width = 512 #@param {type:"integer"}
side_x = width
side_y = height
#@markdown Specify the number of diffusion timesteps (default is 50, can lower for faster but lower quality sampling).
steps = 1000 #@param {type:"integer"}
#@markdown Sample this many images.
n_images = 1 #@param {type:"integer"}
weight = 5
#@markdown Set to 0 for deterministic (DDIM) sampling, 1 (the default) for stochastic (DDPM) sampling, and in between to interpolate between the two. 0 is preferred for low numbers of timesteps.
eta = 1 #@param {type:"number"}
#@markdown The random seed. Change this to sample different images. -1 means completely random seed!
seed = -1 #@param {type:"integer"}
#@markdown Display progress every this many timesteps.
display_every = 5 #@param {type:"integer"}
save_progress_video = True #@param {type:"boolean"}
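Before sampling, here is a small check I like to add (not in the original notebook) so unsupported sizes fail early instead of halfway through a run:
# Catch unsupported settings before sampling starts (my own addition).
valid_sizes = {4, 8, 16, 32, 64, 128, 256, 512}
assert height in valid_sizes and width in valid_sizes, \
    f'height and width must each be one of {sorted(valid_sizes)}'
assert steps > 0 and n_images > 0, 'steps and n_images must be positive'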

Now the full setup is ready, so let's run the generator:
#@title Run the Generator
from PIL import Image
import numpy as np
import os

# Encode the text prompt with CLIP; this embedding is what conditions the diffusion model.
target_embed = clip_model.encode_text(clip.tokenize(prompt)).float().cuda()

def cfg_model_fn(x, t):
    """The CFG (classifier-free guidance) wrapper function.

    The model runs twice per step: once without the CLIP embedding (unconditional)
    and once with it (conditional), and the two predictions are blended using `weight`.
    """
    n = x.shape[0]
    x_in = x.repeat([2, 1, 1, 1])
    t_in = t.repeat([2])
    clip_embed_repeat = target_embed.repeat([n, 1])
    clip_embed_in = torch.cat([torch.zeros_like(clip_embed_repeat), clip_embed_repeat])
    v_uncond, v_cond = model(x_in, t_in, clip_embed_in).chunk(2, dim=0)
    v = v_uncond + (v_cond - v_uncond) * weight
    return v

save_name = 0.00000000

def display_callback(info):
    """Save every intermediate prediction to /content/steps and display progress periodically."""
    global save_name
    save_name += 0.00000001
    nrow = math.ceil(info['pred'].shape[0]**0.5)
    grid = tv_utils.make_grid(info['pred'], nrow, padding=0)
    utils.to_pil_image(grid).save("/content/steps/%.8f.png" % save_name)
    if info['i'] % display_every == 0:
        tqdm.write(f'Step {info["i"]} of {steps}:')
        display.display(utils.to_pil_image(grid))
        tqdm.write('')

def run(seed):
    print("Prompt is: " + prompt)
    if seed == -1:
        seed = random.randint(0, 2**32)
    print("Seed is: " + str(seed))
    gc.collect()
    torch.cuda.empty_cache()
    torch.manual_seed(seed)
    # Start from pure noise and denoise it over the chosen number of timesteps.
    x = torch.randn([n_images, 3, side_y, side_x], device='cuda')
    t = torch.linspace(1, 0, steps + 1, device='cuda')[:-1]
    step_list = utils.get_spliced_ddpm_cosine_schedule(t)
    outs = sampling.sample(cfg_model_fn, x, step_list, eta, {}, callback=display_callback)
    tqdm.write('Done!')
    for i, out in enumerate(outs):
        filename = f'out_{i}.png'
        utils.to_pil_image(out).save(filename)
        display.display(display.Image(filename))

run(seed)

# Collect the saved progress frames and keep the last one as the final grid.
init_frame = 1  # This is the frame the video will start from.
last_frame = steps  # You can change this to the last frame you want to use. It will raise an error if that many frames do not exist.
directory = '/content/steps/'
files = []
for filename in os.listdir(directory):
    files.append(os.path.join(directory, filename))
files.sort()  # os.listdir gives no guaranteed order, so sort to keep frames in step order
frames = []
for i in range(init_frame, last_frame):
    frames.append(Image.open(files[i]))
frames[-1].save("finalgrid.png")

if save_progress_video:
    min_fps = 10
    max_fps = 30
    total_frames = last_frame - init_frame
    length = 15  # Desired video length in seconds.
    fps = np.clip(total_frames / length, min_fps, max_fps)
    tqdm.write('Generating video...')
    # Pipe the PNG frames straight into ffmpeg to encode an H.264 video.
    from subprocess import Popen, PIPE
    p = Popen(['ffmpeg', '-y', '-f', 'image2pipe', '-vcodec', 'png', '-r', str(fps), '-i', '-', '-vcodec', 'libx264', '-r', str(fps), '-pix_fmt', 'yuv420p', '-crf', '17', '-preset', 'veryslow', 'video.mp4'], stdin=PIPE)
    for im in tqdm(frames):
        im.save(p.stdin, 'PNG')
    p.stdin.close()
    p.wait()
    print("Video finished rendering")
Let's add some markdown to show the generated video in Colab:
#@markdown Show Generated Video in Colab
from base64 import b64encode
mp4 = open('video.mp4', 'rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
display.HTML("""
<video width=400 controls>
  <source src="%s" type="video/mp4">
</video>
""" % data_url)
Now the code is done!!!
Let's generate some art!
First enter the parameters:

Now let's wait for the generation to complete:



Ladies and Gentlemen, I present to you “The Tower Of AI”

It's so beautiful 🤩
Let's see the video of it generating:


Let's see it:

Let's see more art:



Well, that settles it: my AI is better at drawing than I am, and I'm super happy about it. Hehe 😁
If you like this project, a follow on Medium is appreciated 😁.
If you wish to buy any of the four featured artworks, their NFT links are below each photo; just click their names. They are available for just $25.74 (0.01 ETH). It will help me out with my college funds and help me make better articles.
Check out my best friend's art (now available as an NFT) that inspired this whole article: Here
I made them into NFTs, go check them out: Here
I also made an AI to make artistic animations, check it out too:
For the Colab notebook, go here:
For more stuff, check out my GitHub:
For my day-to-day AI/ML research updates, follow me on Twitter:
Thanks for reading😁, See ya guys next week 👋🏼.