
Hardcore AI Hurdles: Creating my First Stable Diffusion LoRA

January 4, 2024

Magik Chance


Since the last week of the year, I have been learning my way not only around Stable Diffusion, but around training and applying LoRAs (because apparently I can't do anything the easy way). The good news is that you get to learn from my experience, so hopefully you'll have a smoother ride.


Pinokio


Most open source AI apps are on GitHub, but to be honest, GitHub has always intimidated me. Sure, I’ve used it in the past, but I never seem to be able to figure out where to download what I need. It’s confusing! And Command Prompt makes me nervous.


Luckily I have Pinokio to do most of that for me.


I wrote previously about Pinokio, but here's the gist: it's a browser designed to manage your apps, in particular your AI apps. You install it, click the button for whatever you want to download, and it handles the rest.


Kohya_SS


I love my style GPTs but ChatGPT is a pain with its rate limits, so I wanted to figure out if I could take the images I’ve generated in a style and create a style LoRA that I could use any time.


Kohya_SS specializes in training LoRAs and is the app I decided to try. I found a great tutorial on YouTube by CreatixAi that walked me through the process. This cheat sheet for understanding the different parameters was also very helpful.


I should emphasize that if you don’t have the hardware, you don’t have to run Stable Diffusion or train LoRAs locally, but you will likely have to pay for the GPU hours. If that’s the route you want to go, you might start by checking out Google Colab, ThinkDiffusion, or Civit AI.


My computer has an RTX 3060 (12 GB) – far from the best, and probably the minimum you want for training LoRAs. Be ready to let it sit and work overnight.


I collected my training images for the Dark Fantasy style, ran them through a captioner (then quickly edited the captions), and tried a test run of 1 epoch – a single pass over the training images that produces one version of the LoRA.


…and promptly told it to stop.


The good news was my graphics card was performing fine, but I had put together 50 images and it would have taken over 2 hours to train a single epoch. This was a little much for me.


My basic understanding of how long training will take is this:


(Number of Images × Number of Repeats / Training Batch Size) × Number of Epochs


There are other factors but that seems to be the main one.
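That formula can be sketched as a quick bit of Python. The repeat and batch values below are just illustrative example numbers, not my actual Kohya_SS settings:

```python
def total_training_steps(num_images, num_repeats, batch_size, num_epochs):
    """Rough step count for a LoRA training run: each epoch processes
    every image num_repeats times, grouped into batches."""
    steps_per_epoch = (num_images * num_repeats) // batch_size
    return steps_per_epoch * num_epochs

# Example: 30 images, 10 repeats each, batch size of 2, 10 epochs
print(total_training_steps(30, 10, 2, 10))  # 1500 steps
```

You can see why trimming the image set from 50 to 30 cut my per-epoch time so much: the step count scales linearly with the number of images.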


I refined my images down to 30 and gave it another go. This time, the epoch took about 1 hour. 


I set my computer up to train 10 epochs overnight…


ComfyUI & Automatic1111


After my first epoch trial, I needed somewhere to actually test the LoRA. There are two main UIs I came across: ComfyUI and Automatic1111. Now, after trying them both, I would recommend Automatic1111 because it's much simpler, but I happened to try ComfyUI first.


ComfyUI works using nodes. This setup is confusing if you’ve never worked with nodes before, but it’s great if you want to set up a long, complex process. I originally chose this because I thought it might make using more than one LoRA at a time for an image easier, but it turns out that is easily done with Automatic1111 too.


I did all my LoRA testing on ComfyUI, but have now switched to Automatic1111 because of its much simpler design.


Results



Here are the results of testing my Dark Fantasy LoRAs. I rated them on prompt accuracy, correctness (correct faces, fingers, etc.), how closely each matched the style I wanted, and how much I liked the image overall, then averaged the four scores.


Am I happy with any of the LoRAs? Not quite yet. I think it would have worked better if I had focused on a certain type of subject as well as style – for instance, people, animals, or architecture.


Here are some other things I learned:


  • With my hardware, generating an image in Stable Diffusion takes about the same amount of time as Dall-e 3 through ChatGPT, or less.

  • Write down settings! This is important for both learning what they do and being able to replicate results later on.

  • Make use of file names. If it’s short enough, you can record the prompt, seed, and perhaps some other info within a file name.

  • Make use of txt files. If you use a long prompt or many complicated settings, consider creating a txt file with that information in the same place and with the same name as the image.
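The txt-file tip above can be automated with a few lines of Python. This is a minimal sketch of the idea; the file name, prompt, and settings here are made-up examples:

```python
from pathlib import Path

def save_prompt_sidecar(image_path, prompt, seed, settings):
    """Write a .txt file next to the image, with the same base name,
    recording the prompt, seed, and any other settings."""
    sidecar = Path(image_path).with_suffix(".txt")
    lines = [f"prompt: {prompt}", f"seed: {seed}"]
    lines += [f"{key}: {value}" for key, value in settings.items()]
    sidecar.write_text("\n".join(lines))
    return sidecar

# Example with hypothetical values
path = save_prompt_sidecar(
    "dark_fantasy_castle_seed1234.png",
    "a castle in dark fantasy style",
    1234,
    {"steps": 30, "cfg_scale": 7, "sampler": "Euler a"},
)
```

Because the sidecar shares the image's base name, sorting a folder by name keeps each image and its settings file together.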


Conclusion


If you are thinking about using Stable Diffusion and/or training your own LoRAs, then I hope you were able to glean something from my experience.


Do you use Stable Diffusion? Please head over to this X thread and leave a comment!


