Making an AI to make beautiful Art using GPT-3, CLIP and the CC12M Dataset.

So, my best friend is very talented at drawing and making art overall. Last time we talked, he told me about a competition he had entered: he had to make two drawings on the themes of utopia and dystopia. I suddenly thought, let's make an AI for this, and as a result, here we are.

I started doing some research and found out OpenAI had a model named DALL·E that can generate images from text. So I read the research paper and found out it builds on a GPT-3-style transformer together with a model called CLIP.

GPT-3 is one of the most advanced language models ever made.

CLIP connects language and images: given a caption and an image, it scores how well they match, which makes it perfect for steering image generation from a text prompt.
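To make that concrete, here is a minimal sketch of what CLIP does, assuming the openai/CLIP package is installed; 'tower.png' is a hypothetical placeholder image, not a file from this project:

# A minimal sketch of CLIP scoring captions against an image.
# Assumes: pip install git+https://github.com/openai/CLIP
import torch
import clip
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, preprocess = clip.load('ViT-B/32', device=device)

image = preprocess(Image.open('tower.png')).unsqueeze(0).to(device)
text = clip.tokenize(['a utopia', 'a dystopia']).to(device)

with torch.no_grad():
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # the higher the probability, the better the caption fits

This matching score is exactly what lets a diffusion model be pushed toward a text prompt.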

Now I had the basics of how I would design my AI, but I needed more references. While researching, I found out about “RiversHaveWings” (Katherine Crowson), a very talented coder who makes amazing models for art generation. I saw her using the CC12M dataset, and with that I had everything ready; my research was a success.

So, let's get started.

This project was made on Google Colab, so first enable the GPU in your runtime.

Then check your GPU:

# Check the GPU
!mkdir steps  # also create the folder where intermediate step images will be saved
!nvidia-smi
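If you'd rather check programmatically (a small sketch of my own, not part of the original notebook), PyTorch can confirm that the runtime actually exposes a CUDA device:

# Sanity check that Colab gave us a CUDA-capable GPU
import torch
assert torch.cuda.is_available(), 'Enable a GPU: Runtime > Change runtime type'
print(torch.cuda.get_device_name(0))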

Now let's install the dependencies:

# Install dependencies
!pip install ftfy regex requests tqdm
!git clone --recursive https://github.com/crowsonkb/v-diffusion-pytorch

Let's download the diffusion model:

# Download the diffusion model
# SHA-256: 4fc95ee1b3205a3f7422a07746383776e1dbc367eaf06a5b658ad351e77b7bda
!mkdir v-diffusion-pytorch/checkpoints
!curl -L https://v-diffusion.s3.us-west-2.amazonaws.com/cc12m_1_cfg.pth > v-diffusion-pytorch/checkpoints/cc12m_1_cfg.pth
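Since the SHA-256 hash is published alongside the checkpoint, you can optionally verify the download before using it (my own addition, using plain hashlib):

# Verify the checkpoint against the published SHA-256
import hashlib

sha256 = hashlib.sha256()
with open('v-diffusion-pytorch/checkpoints/cc12m_1_cfg.pth', 'rb') as f:
    for chunk in iter(lambda: f.read(1 << 20), b''):
        sha256.update(chunk)
assert sha256.hexdigest() == '4fc95ee1b3205a3f7422a07746383776e1dbc367eaf06a5b658ad351e77b7bda', 'checksum mismatch, re-download the file'
print('Checksum OK')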

Now for the imports:

# Imports
import gc
import math
import sys
from IPython import display
import torch
from torchvision import utils as tv_utils
from torchvision.transforms import functional as TF
from tqdm.notebook import trange, tqdm
sys.path.append('/content/v-diffusion-pytorch')  # make the cloned repo importable
from CLIP import clip
from diffusion import get_model, sampling, utils

The repo also ships a command-line sampler, cfg_sample.py. We already installed the dependencies and downloaded the checkpoint above, so you can optionally smoke-test the model straight from the shell:

%cd /content/v-diffusion-pytorch
!./cfg_sample.py "the rise of consciousness":5 -n 4 -bs 4 --seed 0

Let's load the models now:

# Load the models
model = get_model('cc12m_1_cfg')()  # instantiate the diffusion model
_, side_y, side_x = model.shape     # the checkpoint's native output size
model.load_state_dict(torch.load('/content/v-diffusion-pytorch/checkpoints/cc12m_1_cfg.pth', map_location='cpu'))
model = model.half().cuda().eval().requires_grad_(False)  # fp16, on GPU, inference only
clip_model = clip.load(model.clip_model, jit=False, device='cpu')[0]  # matching CLIP text encoder
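As a quick sanity check (my own addition), you can print what was just loaded; model.shape holds the checkpoint's native (channels, height, width), and model.clip_model names the CLIP variant used for conditioning:

# Quick sanity check of what we loaded
print('Native size:', side_x, 'x', side_y)
print('Conditioning CLIP model:', model.clip_model)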

Now let's take the inputs from the user:

#@title Settings
# Only these heights and widths work: 4, 8, 16, 32, 64, 128, 256, 512
import random
prompt = 'Tower of AI, Digital Art' #@param {type:"string"}
height = 512 #@param {type:"integer"}
width = 512 #@param {type:"integer"}
side_x = width
side_y = height
#@markdown Specify the number of diffusion timesteps (default is 50; lower is faster but gives lower-quality samples).
steps = 1000 #@param {type:"integer"}
#@markdown Sample this many images.
n_images = 1 #@param {type:"integer"}
weight = 5  # classifier-free guidance weight
#@markdown Set to 0 for deterministic (DDIM) sampling, 1 (the default) for stochastic (DDPM) sampling, and in between to interpolate between the two. 0 is preferred for low numbers of timesteps.
eta = 1 #@param {type:"number"}
#@markdown The random seed. Change this to sample different images. -1 means a completely random seed.
seed = -1 #@param {type:"integer"}
#@markdown Display progress every this many timesteps.
display_every = 5 #@param {type:"integer"}
save_progress_video = True #@param {type:"boolean"}
In Colab, this cell renders as a settings form.
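Since the checkpoint only accepts those side lengths, a tiny guard (my own addition, not part of the settings form) catches a bad value before you waste a long sampling run:

# Guard against unsupported resolutions (powers of two from 4 to 512)
valid_sides = {4, 8, 16, 32, 64, 128, 256, 512}
assert side_x in valid_sides and side_y in valid_sides, \
    f'height/width must be one of {sorted(valid_sides)}'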

Now the full setup is ready, so let's run the generator:

#@title Run the Generator
from PIL import ImageFile, Image
from IPython.display import clear_output
import numpy as np
import os

# Encode the text prompt into a CLIP embedding to condition the model on
target_embed = clip_model.encode_text(clip.tokenize(prompt)).float().cuda()

def cfg_model_fn(x, t):
    """The CFG wrapper function."""
    n = x.shape[0]
    # Run the batch twice: once unconditioned, once conditioned on the prompt
    x_in = x.repeat([2, 1, 1, 1])
    t_in = t.repeat([2])
    clip_embed_repeat = target_embed.repeat([n, 1])
    clip_embed_in = torch.cat([torch.zeros_like(clip_embed_repeat), clip_embed_repeat])
    v_uncond, v_cond = model(x_in, t_in, clip_embed_in).chunk(2, dim=0)
    # Classifier-free guidance: push the prediction toward the prompt
    v = v_uncond + (v_cond - v_uncond) * weight
    return v

save_name = 0.00000000

def display_callback(info):
    """Save every intermediate step and show progress every display_every steps."""
    global save_name
    save_name += 0.00000001  # incrementing float used as a sortable filename
    nrow = math.ceil(info['pred'].shape[0]**0.5)
    grid = tv_utils.make_grid(info['pred'], nrow, padding=0)
    utils.to_pil_image(grid).save('/content/steps/%.8f.png' % save_name)
    if info['i'] % display_every == 0:
        tqdm.write(f'Step {info["i"]} of {steps}:')
        display.display(utils.to_pil_image(grid))
        tqdm.write('')

def run(seed):
    print("Prompt is: " + prompt)
    if seed == -1:
        seed = random.randint(0, 2**32)
    print("Seed is: " + str(seed))
    gc.collect()
    torch.cuda.empty_cache()
    torch.manual_seed(seed)
    # Start from pure noise and denoise along the spliced DDPM/cosine schedule
    x = torch.randn([n_images, 3, side_y, side_x], device='cuda')
    t = torch.linspace(1, 0, steps + 1, device='cuda')[:-1]
    step_list = utils.get_spliced_ddpm_cosine_schedule(t)
    outs = sampling.sample(cfg_model_fn, x, step_list, eta, {}, callback=display_callback)
    tqdm.write('Done!')
    for i, out in enumerate(outs):
        filename = f'out_{i}.png'
        utils.to_pil_image(out).save(filename)
        display.display(display.Image(filename))

run(seed)

# Collect the saved step images, sorted (os.listdir order is arbitrary)
directory = '/content/steps/'
files = sorted(os.path.join(directory, filename) for filename in os.listdir(directory))

# Save the last step's grid as the final image
Image.open(files[-1]).save('finalgrid.png')

if save_progress_video:
    init_frame = 1      # the frame the video will start from
    last_frame = steps  # change this to the last frame you want; it will raise an error if that many frames do not exist
    min_fps = 10
    max_fps = 30
    total_frames = last_frame - init_frame
    length = 15  # desired video length in seconds

    tqdm.write('Generating video...')
    frames = [Image.open(files[i]) for i in range(init_frame, last_frame)]

    # Pick an fps that hits the target length, clamped to a sane range
    fps = np.clip(total_frames / length, min_fps, max_fps)

    # Pipe the frames into ffmpeg to encode an H.264 video
    from subprocess import Popen, PIPE
    p = Popen(['ffmpeg', '-y', '-f', 'image2pipe', '-vcodec', 'png', '-r', str(fps),
               '-i', '-', '-vcodec', 'libx264', '-r', str(fps), '-pix_fmt', 'yuv420p',
               '-crf', '17', '-preset', 'veryslow', 'video.mp4'], stdin=PIPE)
    for im in tqdm(frames):
        im.save(p.stdin, 'PNG')
    p.stdin.close()
    p.wait()
    print('Video finished rendering')
Running the program will show you a progress bar and make a video of the whole art-making process.
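To make the classifier-free guidance line in cfg_model_fn concrete: the model predicts once with the text conditioning and once without, and weight scales how far the result moves toward the prompt. A toy example with made-up numbers:

# Toy numbers only: how the guidance weight scales the text signal
import torch

v_uncond = torch.tensor([0.10])  # hypothetical unconditional prediction
v_cond = torch.tensor([0.30])    # hypothetical prompt-conditioned prediction
for weight in (0, 1, 5):
    v = v_uncond + (v_cond - v_uncond) * weight
    print(f'weight={weight}: v={v.item():.2f}')
# weight=0 ignores the prompt, weight=1 is plain conditional sampling,
# and weight=5 (the value used above) exaggerates the prompt's influence.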

Let's add a cell to preview the generated video right in Colab:

#@markdown Show Generated Video in Colab
from base64 import b64encode
mp4 = open('video.mp4','rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
display.HTML("""
<video width=400 controls>
<source src="%s" type="video/mp4">
</video>
""" % data_url)

Now the code is done!!!

Let's generate some art!

First enter the parameters:

You can turn down the steps according to the resolution you want.

Now let's wait for the generation to complete:

Gotta wait 1 hr 50 minutes, I guess.
Already liking the generation, it's soooo good!!
It's done!!!!!

Ladies and gentlemen, I present to you “The Tower of AI”

“The Tower of AI”

It's so beautiful 🤩

Let's see the video of it generating:

The video is rendering…
The video is rendered!!!

Let's see it:

The great artist at work

Let's see more art:

“The Physical Form Of Imagination”
“The Castle of Seas”
“Cyber Dystopia”

Well, the conclusion is that my AI is better at drawing than I am, and I am super happy about it. Hehe 😁

If you liked this project, a follow on Medium is appreciated 😁.

If you wish to buy any of the four featured artworks, their NFT links are below each photo; just click their names. They are available for just $25.74 or 0.01 ETH. It will help with my college funds and help me make better articles.

Check out my best friend's art (now available as an NFT) that inspired this whole article: Here

I made them into NFTs; go check them out: Here

I also made an AI that creates artistic animations; check it out too:

For the Colab notebook, go here:


For more stuff, check out my GitHub:

For my day-to-day AI/ML research updates, follow me on Twitter:

Thanks for reading 😁, see ya guys next week 👋🏼.