Generative AI Art Tools Explained: Dall-E 2 and Stable Diffusion Comparison

Generative AI tools like Dall-E, Stable Diffusion, and ChatGPT have taken the internet by storm. From technocrats to amateurs, everyone has jumped in to play with these tools, and everyone has their fair share of opinions about them.

As AI tool producers ourselves, we want to share our thoughts on text-to-image generative AI art tools and discuss the technology in more detail.

What is generative AI art?

Generative AI art refers to artwork that is created or composed by a computer program using artificial intelligence techniques. This can include a variety of different forms, such as digital images, videos, 3D models, and even music.

The key characteristic of generative AI art is that the computer program uses algorithms and statistical models to generate the artwork rather than being solely the product of human creativity. In this way, generative AI art can be thought of as a collaboration between the artist and the machine, as the artist provides the inputs and parameters for the algorithm, and the computer generates the final output.

How does generative AI art work?

Generative AI art tools work by using algorithms and statistical models to create artwork that is similar to a dataset of training examples. These tools typically involve several different steps:

Data Collection

The first step is to collect a dataset of training examples that will be used to train the algorithm. This dataset can be composed of images, videos, audio recordings, or other forms of media that are relevant to the desired type of artwork.

Model Training

Once the dataset has been collected, a machine-learning model is trained on it. The model learns patterns and structures from the training data that it can later use to generate new, similar examples. The model can be supervised or unsupervised.

Algorithm Design

Based on the model, an algorithm is designed that can generate new data from the learned patterns and structures. The algorithm takes input parameters, such as the size and shape of the output image, and uses the trained model to generate new artwork.

Input

Once the algorithm is set, the artist can provide input parameters such as the style, color, or theme of the output image, and the algorithm generates new images according to those parameters.

Output

The algorithm generates new images, music, text, or other forms of media as the final output, which the artist can then use as a starting point for further editing or manipulation.

It is important to note that generative AI art tools also require a solid understanding of the algorithms and models being used, as well as the creative aspect of art direction, since the result is a combination of technology and art.
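To make the workflow above concrete, here is a deliberately tiny sketch in Python. It is not how Dall-E or Stable Diffusion work internally; it only mirrors the collect-data, train-model, generate-with-parameters loop using simple per-pixel statistics, with all names and numbers invented for illustration.

```python
# A toy "generative model": learn per-pixel mean/std from a tiny dataset of
# grayscale images, then sample new images from that learned distribution.
import numpy as np

# 1. Data collection: a stand-in dataset of 100 8x8 grayscale "images"
rng = np.random.default_rng(0)
dataset = rng.normal(loc=0.5, scale=0.1, size=(100, 8, 8))

# 2. Model training: fit a simple statistical model (per-pixel mean and std)
pixel_mean = dataset.mean(axis=0)
pixel_std = dataset.std(axis=0)

# 3. + 4. Algorithm design + input parameters: sample new images on request
def generate(num_images: int, brightness: float = 1.0) -> np.ndarray:
    """Sample new images from the learned pixel statistics."""
    samples = rng.normal(pixel_mean, pixel_std, size=(num_images, 8, 8))
    return np.clip(samples * brightness, 0.0, 1.0)

# 5. Output: two new images the "artist" could refine further
new_images = generate(num_images=2, brightness=1.2)
print(new_images.shape)  # (2, 8, 8)
```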

What is the diffusion model in generative AI art?

The technology behind popular text-to-image generative AI art systems, such as Dall-E and Stable Diffusion, is called the diffusion model.

In generative AI art, diffusion models refer to a class of algorithms that generate new artwork by iteratively refining an initial random input. The idea behind these models is to start with a random input, such as an image filled with random noise, and then repeatedly apply a learned denoising transformation that removes a little of the noise at a time, until the result resembles the kinds of images seen in the training data.

These models are trained by reversing a noising process: training images are gradually corrupted with random noise, and a neural network learns to predict and remove that noise step by step. For example, the network might be shown a slightly noisy photo of a cat and asked to estimate the noise that was added, so that subtracting its estimate recovers a cleaner image.

The key idea behind diffusion models is that by applying this learned denoising step many times, an initial random input is gradually transformed into a final output that looks like it could have come from the training data. Settings such as the number of denoising iterations and the strength of the guidance can be adjusted to control the generated images.

The output of diffusion models can be highly dependent on the initial random input, meaning that different runs of the same algorithm can produce vastly different results. This makes them interesting for generative art use cases, as it allows the exploration of a wide range of variations of the same theme or style.
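The sketch below illustrates the iterative refinement idea with plain NumPy. Real diffusion models use a trained neural network to predict the noise at each step; here the "denoiser" simply nudges the image toward a fixed target, so treat it as a cartoon of the process rather than an actual model.

```python
# A minimal, illustrative denoising loop: start from pure noise and refine it
# step by step until the "image" matches a target pattern.
import numpy as np

rng = np.random.default_rng(0)
target = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))  # stand-in "training example"

x = rng.normal(size=(32, 32))  # start from pure random noise
num_steps = 50
for t in range(num_steps):
    predicted_clean = target           # a real model would predict this from x and t
    weight = (t + 1) / num_steps       # trust the prediction more as steps progress
    noise = rng.normal(size=x.shape) * (1.0 - weight) * 0.1
    x = (1.0 - weight) * x + weight * predicted_clean + noise

print(np.abs(x - target).mean())  # close to 0: the noise has been removed
```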

What is Dall-E and Stable Diffusion?

Let’s start with Dall-E or Dall-E 2.

DALL-E is an artificial intelligence program developed by OpenAI that generates pictures from words. It does this by applying a 12-billion-parameter version of the GPT-3 transformer model to the user’s natural-language input.

DALL·E 2 is the latest version of Dall-E that can create realistic images and art from a description in natural language.

Stable Diffusion is a text-to-image generative AI art tool that uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. At generation time, it runs a “diffusion” process: it begins with pure noise and gradually refines the image until it is free of noise, steadily approaching the text description that was provided.
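As a concrete illustration of the text-encoding step, the snippet below loads a CLIP ViT-L/14 text encoder with the Hugging Face transformers library and turns a prompt into the embeddings that condition image generation. The checkpoint name and shapes are assumptions based on the publicly released CLIP model, not something taken from this article.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Publicly released CLIP ViT-L/14 checkpoint (assumed name)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
text_encoder.requires_grad_(False)  # "frozen": its weights are never updated

prompt = "a futuristic architecture building"
tokens = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt")
with torch.no_grad():
    text_embeddings = text_encoder(**tokens).last_hidden_state

print(text_embeddings.shape)  # roughly (1, 77, 768): one vector per token position
```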

How does the diffusion model work for Dall-E and Stable Diffusion?

DALL-E 2 uses a diffusion-based method to generate images from text. The method starts with a random noise image and then iteratively applies a set of learned transformations to refine the image until it matches the text description or the example image.

In DALL-E, the diffusion process can be thought of as a series of refinements to the random noise input, where each refinement is guided by the learned image-text representations from the training data. The model generates new images by iteratively refining the random noise input until it reaches an image that is similar to the given text prompt or example image.

Stable Diffusion is a latent diffusion model, a variation of the diffusion approach. Instead of running the denoising process directly on full-resolution pixels, it first compresses images into a much smaller latent representation using a variational autoencoder, and the diffusion process operates in that latent space.

A U-Net denoiser repeatedly refines the noisy latent, guided at each step by the embeddings produced by the frozen CLIP text encoder. When the denoising loop finishes, the autoencoder’s decoder converts the final latent back into a full-resolution image.

Because the heavy iterative work happens on a small latent tensor rather than on millions of pixels, Stable Diffusion is far cheaper to run than pixel-space diffusion and can generate images on a single consumer GPU.

As with other diffusion models, the output still depends on the initial random noise: the same prompt with the same random seed reproduces the same image, while different seeds produce different variations. In practice, this lets the artist explore many versions of the same theme and return to any of them reliably.
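To see the latent-space compression in numbers, the sketch below loads the VAE used by a Stable Diffusion v1 checkpoint via Hugging Face diffusers and encodes a stand-in image. The checkpoint name and the exact shapes are assumptions about the public v1.5 release.

```python
import torch
from diffusers import AutoencoderKL

# The variational autoencoder shipped with a Stable Diffusion v1 checkpoint (assumed name)
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")

image = torch.randn(1, 3, 512, 512)  # a stand-in 512x512 RGB image
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()

print(image.shape)    # torch.Size([1, 3, 512, 512])  -> pixel space
print(latents.shape)  # torch.Size([1, 4, 64, 64])    -> latent space, ~48x fewer values
```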

How to use Dall-E 2 and Stable Diffusion?

Let’s take a practical look at how these platforms work.

Dall-E 2

First things first, visit their website and log in or register using your email.

The UI window is pretty simple: a text field to type your prompt and an additional upload button for adding your own image.
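If you prefer to call Dall-E 2 programmatically rather than through the web UI, a minimal sketch using the OpenAI Python library looks roughly like this. The exact client interface has changed across library versions, so treat the call below as indicative rather than definitive.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; use your own key

# Request one 1024x1024 image for a text prompt
response = openai.Image.create(
    prompt="a futuristic architecture building",
    n=1,                # number of images to generate
    size="1024x1024",   # supported sizes: 256x256, 512x512, 1024x1024
)

print(response["data"][0]["url"])  # URL of the generated image
```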

Stable Diffusion

Since it is free and open source, Stable Diffusion can be deployed in a number of ways. You can find the official resource files here, and users can integrate the tool into their own systems.
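For example, one common integration path is the Hugging Face diffusers library. The sketch below assumes the publicly hosted v1.5 checkpoint and a CUDA GPU; adjust both to your setup.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (name assumed) in half precision for GPU memory savings
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a high-tech solarpunk utopia in the Amazon rainforest"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("solarpunk.png")
```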

You can also test Stable Diffusion for free on a number of websites online.

It is worth mentioning that the results Stable Diffusion returns depend highly on the prompts. Therefore, you need to practice and fine-tune your prompts to achieve better results from the tool.

How to use prompts to get better results in Stable Diffusion?

Prompts are the main way a user steers Stable Diffusion, and careful prompting can noticeably improve the results. Here are a few examples of how a user can use prompts to achieve better results:

Use specific, descriptive prompts

For example, instead of asking for “a building,” describe the subject and its setting in detail: the architectural style, the materials, the time of day, and the surrounding environment. The more concrete the description, the closer the output tends to be to the image you have in mind.

Use style and medium keywords

For example, adding terms such as “digital art,” “oil painting,” “concept art,” or “photorealistic,” or referencing a particular artist or art movement, steers the overall look and feel of the generated image.

Use quality and composition modifiers

For example, phrases such as “highly detailed,” “sharp focus,” “studio lighting,” or “close up” influence the level of detail, the lighting, and the framing of the result.

Iterate on the prompt

Generate an image, look at what the model got wrong, and adjust the prompt accordingly. Keeping the random seed fixed between runs makes it easier to see what effect each change to the wording actually has.

Use negative prompts

Most Stable Diffusion interfaces also accept a negative prompt that lists things you do not want in the image, such as “blurry,” “low quality,” or “extra fingers.” This is often the quickest way to clean up recurring artifacts.

It is worth noting that the right prompt will vary depending on the specific task and the model you’re using. Therefore, it’s important for the user to experiment with different prompts and analyze the results in order to find the best approach for the task at hand.
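Putting several of these tips together, here is a hedged sketch of a prompt-iteration loop using the diffusers pipeline from earlier: a detailed prompt with style and quality modifiers, a negative prompt, and a fixed seed so each change to the wording can be compared fairly. The parameter values are illustrative defaults, not recommendations from this article.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # checkpoint name assumed
).to("cuda")

prompt = (
    "a girl reading a book in a coffee shop, steamed coffee on the table, "
    "digital art, highly detailed, sharp focus, warm lighting"
)
negative_prompt = "blurry, low quality, deformed hands, extra fingers"

generator = torch.Generator("cuda").manual_seed(42)  # fixed seed = reproducible runs
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    guidance_scale=7.5,        # how strongly the image should follow the prompt
    num_inference_steps=50,
    generator=generator,
).images[0]
image.save("coffee_shop.png")
```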

You can get prompt ideas and inspiration from sites like Open Art or Public Prompts.

Dall-E 2 vs Stable Diffusion: Image result comparison

We will give both engines the same text prompts and compare what they return.

Text input A futuristic architecture building.

Text input A high-tech solarpunk utopia in the Amazon rainforest.

Text input Highly detailed futuristic Apple iGlass computer glasses on face of human, cyberpunk, hand tracking, concept art, character art, studio lightning, bright colors, intricate, masterpiece, photorealistic, hyperrealistic, sharp focus, high contrast, Artstation HQ, DeviantArt trending, 8k UHD, Unreal Engine 5.

Text input Ukrainian girl with blue and yellow clothes near big ruined plane, concept art, trending on artstation, highly detailed, intricate, sharp focus, digital art, 8 k.

Text input A girl reading a book in a coffee shop, steamed coffee is in the table, high quality, realistic.

Text input A realistic photo of a beautiful landscape.

Text input pizza party at a theme park, light dust, magnificent, close up, details, sharp focus, elegant, highly detailed, illustration, by Jordan Grimmer and greg rutkowski and PiNe(パイネ) and 薯子Imoko and 香川悠作 and wlop and maya takamura, intricate, beautiful, Trending artstation, pixiv, digital Art.

Text input Photograph of a gorilla in the street.

Text input A concept art of a cyberpunk car.

Are generative AI art tools a threat to creatives?

Generative AI art tools are not a direct threat to human artists, but they do have the potential to change the way art is created and consumed.

In some ways, the use of generative AI art tools can be seen as an aid to human creativity, allowing artists to explore new possibilities and create new forms of art more efficiently. However, it is important to keep in mind that the use of generative AI art tools also requires a deep understanding of the underlying technology, which can be a barrier for some artists.

Additionally, some argue that generative AI art tools may reduce the number of art jobs, as automating certain tasks and generating large numbers of similar images or videos might require fewer human artists.

It’s also worth noting that AI art tools are not going to replace human artists, as creating art also requires human intuition, a personal touch, and emotional connection, which AI is still not capable of. There are use cases for AI-generated art, but it will remain a supplement to human-made art rather than a replacement.

Overall, generative AI art tools are a powerful new tool for artists, but it’s important to keep in mind that they are just a tool, and the quality of the final output will still depend on the creativity and skills of the artist using them. The best way to approach generative AI art tools is as a way to complement and enhance human creativity, not as a replacement.