Deploying a Pretrained Stable Diffusion Model in AWS Lambda
A setup guide to deploy your AI-art generator models
In this article, I’ll describe how to get a stable diffusion (neural network) model deployed on AWS Lambda, using a pretrained model as the base, specifically one with weights and the inference code available beforehand.
The base code uses OpenVINO to produce a highly CPU-optimized version of Stable Diffusion.
This framework is particularly useful for Edge and Internet of Things use cases, but here we'll do something completely different: we'll deploy a really big model (~3 GB) and execute it successfully on AWS Lambda.
I won't teach anything about OpenVINO itself because I'm not familiar with it; I pretty much just used an open-source repository as the base.
Instead, I'll focus on the glue needed to get it deployed on AWS Lambda.
The described approach should work regardless of the underlying framework (Hugging Face, PyTorch, etc.), so it's useful to know if you want to perform inference over a serverless HTTP endpoint.
This small story is divided into:
- A brief background
- The step-by-step guide to deploying it on AWS Lambda
- My silly mistakes before getting it right
Some Background About Stable Diffusion
But before we dive in, some context about what we’re working with. You’ve probably heard of AI-generated images and their recent boom with Stable Diffusion. If not, and if you’re interested in the topic, you might want to check this excellent blog on the subject from Hugging Face: https://huggingface.co/blog/stable_diffusion.
Here’s a sample picture of an astronaut cat that I’ve generated with Stable Diffusion:
Executing this neural network to generate images demands a lot of GPU power, although the model is being optimized over time and should become more tractable for any end user with a modern desktop or laptop.
Because the Stable Diffusion model is open source, different people have also been working on optimized alternatives: versions tuned for MacBook M1 chips, for Intel chips, and so on.
Typically, the time it takes depends a lot on the actual hardware. I extracted the following approximate compute times for different types of processors (tested locally with an RTX 3090 and an i9; the M1 numbers are only from what I read online):
As you can see, when executing on an i9, the OpenVINO solution is painfully slow compared to the alternatives.
An alternative is to use the ONNX version provided by Hugging Face, which has a similar compute time after optimizing it with the ONNX simplifier (https://github.com/daquexian/onnx-simplifier).
Note: the simplifier requires about 27 GB of RAM to simplify this model. I suspect the final results would have been similar if I had used the ONNX version.
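For reference, using the simplifier from Python looks roughly like this (a minimal sketch assuming the standard onnxsim API; the file names are placeholders, and a model this large may also need ONNX's external-data options when saving):

import onnx
from onnxsim import simplify

# Load the exported ONNX graph (placeholder file name)
model = onnx.load("unet.onnx")

# Simplify the graph; `check` reports whether the simplified model
# still matches the original on random inputs
model_simplified, check = simplify(model)
assert check, "simplified ONNX model failed validation"

onnx.save(model_simplified, "unet_simplified.onnx")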
Despite the slowness of CPU inference, it's interesting to see that it can run on AWS Lambda at all, which makes it usable for free trials/demos thanks to the generous AWS Lambda free tier. I'm building something along those lines myself: a free toy product I published at https://app.openimagegenius.com.
Update 22.07.2023: As of today, I’m taking the demo website offline, as I’m not renewing the registered domain that I used for this project.
If you'd like to experiment with it, be patient and allow up to five minutes for an image generation. The Lambda only uses 12 inference steps to speed things up (it takes about 60 seconds of execution once the Lambda is warm).
The source code for this example can be found here. Feel free to use it however you like without asking my permission (it’s MIT License). Just please keep in mind the license of the models.
Enough chat. Let’s jump into the solution.
The Serverless Glue
(Working version: container-based Lambda with EFS🎉)
The Manual Part
Unfortunately, there are a lot of manual steps here. While one could automate a good part of this, I didn’t think it made sense for me to invest that much time. Someone skilled in CloudFormation or Terraform could probably automate most (if not all) of these steps.
If you try to follow these steps and get stuck, don't hesitate to reach out to me; I'd be happy to help.
Let’s start with the VPC and EC2
- Create a VPC or use the default one. Either way is fine.
- Create an EC2 instance connected to this VPC and preferably deploy it to a public subnet. You’ll need to connect to the instance through SSH, which is much easier when you use a public subnet.
- Create a new security group or modify the one you're using. You'll need ports 22 (SSH) and 2049 (NFS) open for ingress (a minimal boto3 sketch of these rules follows this list).
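If you prefer to script this instead of clicking through the console, here's a minimal boto3 sketch of those two ingress rules (the group name and VPC ID are placeholders, and 0.0.0.0/0 is wide open on purpose for this demo; restrict it to your own IP if you can):

import boto3

ec2 = boto3.client("ec2", region_name="eu-central-1")

# Placeholder VPC ID -- use your own
sg = ec2.create_security_group(
    GroupName="sd-lambda-efs-sg",
    Description="SSH + NFS access for the Stable Diffusion EFS setup",
    VpcId="vpc-xxxxxxxx",
)

# Open ports 22 (SSH) and 2049 (NFS) for ingress
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 2049, "ToPort": 2049,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
)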
Now the EFS
- Create an EFS. Make sure it's available in the same subnet as the EC2 instance from the setup above and that it uses the same security group you defined.
- SSH into your EC2 instance.
- Follow the AWS guide on how to mount the EFS in your EC2: https://docs.aws.amazon.com/efs/latest/ug/wt1-test.html
- In summary, you execute the following commands (modify the parameters appropriately, with the actual mount folder and mount-target-DNS extracted from the AWS Console / CLI — you can find the DNS in the EFS File System UI screen)
mkdir mnt-folder
sudo mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport mount-target-DNS:/ ~/mnt-folder
- Save the OpenVINO model files to the EFS. In my case, I manually downloaded them using this code (https://github.com/bes-dev/stable_diffusion.openvino/blob/master/stable_diffusion_engine.py) and had previously uploaded them to an S3 bucket. Then, from my EC2 instance, I downloaded them from the S3 bucket into the EFS.
(Note: to do this, you might need to assign a role to your EC2, as shown in the screenshot below.)
Once you’ve configured the role correctly, the AWS CLI should work, e.g., you can execute commands like aws s3 sync s3://your-bucket.
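If you'd rather script the copy than use the CLI, a boto3 sketch along these lines should work too (the bucket name, key prefix, and the ~/mnt-folder destination are placeholders matching the mount command above):

import os
import boto3

s3 = boto3.client("s3")
bucket = "your-bucket"                     # placeholder bucket name
prefix = "models/"                         # placeholder key prefix
target = os.path.expanduser("~/mnt-folder/models")

# Copy every object under the prefix into the mounted EFS folder
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):              # skip "directory" placeholder keys
            continue
        dest = os.path.join(target, os.path.relpath(key, prefix))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        s3.download_file(bucket, key, dest)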
Then create an access point for the elastic file system
Create an EFS access point for the Elastic File System you’ve configured.
It’s important to pay attention to a few things here:
The filesystem user permissions: if they are too restrictive, you’ll get a PermissionError when accessing the EFS files from your Lambda. In my case, this EFS was dedicated to this Lambda, so I didn’t care about granularity and just gave wide-open access (I’ll do the same in the serverless file later on):
Also, avoid assigning / as the root directory path you define for the Access Point; it may cause problems when mounting. Make sure to note down the value you pick, since you’ll need it inside your Lambda function. I’ve personally used /mnt/fs, as described in another guide.
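For reference, creating the access point with boto3 would look roughly like this (the file system ID is a placeholder; the 777 permissions mirror the wide-open choice above, so tighten them if the EFS is shared with anything else):

import boto3

efs = boto3.client("efs", region_name="eu-central-1")

response = efs.create_access_point(
    FileSystemId="fs-xxxxxxxx",            # placeholder
    PosixUser={"Uid": 1000, "Gid": 1000},
    RootDirectory={
        "Path": "/mnt/fs",                 # avoid "/" here
        "CreationInfo": {
            "OwnerUid": 1000,
            "OwnerGid": 1000,
            "Permissions": "777",          # wide open, demo only
        },
    },
)
print(response["AccessPointId"])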
OK, we’re pretty much done creating resources manually.
Serverless Framework
Most of the heavy lifting in the serverless template, especially the EFS parts, I gathered from https://medium.com/swlh/mount-your-aws-efs-volume-into-aws-lambda-with-the-serverless-framework-470b1c6b1b2d.
Here’s the full template. You pretty much just have to replace the resource IDs with your own.
service: stable-diffusion-open-vino
frameworkVersion: "3"

provider:
  name: aws
  runtime: python3.8
  stage: ${opt:stage}
  region: eu-central-1
  memorySize: 10240
  ecr:
    images:
      appimage:
        path: ./
  iam:
    role:
      statements:
        - Effect: Allow
          Action:
            - "elasticfilesystem:*"
          Resource:
            - "arn:aws:elasticfilesystem:${aws:region}:${aws:accountId}:file-system/${self:custom.fileSystemId}"
            - "arn:aws:elasticfilesystem:${aws:region}:${aws:accountId}:access-point/${self:custom.efsAccessPoint}"

functions:
  textToImg:
    url: true
    image:
      name: appimage
    timeout: 300
    environment:
      MNT_DIR: ${self:custom.LocalMountPath}
    vpc:
      securityGroupIds:
        - ${self:custom.securityGroup}
      subnetIds:
        - ${self:custom.subnetsId.subnet0}

custom:
  efsAccessPoint: YOUR_ACCESS_POINT_ID
  fileSystemId: YOUR_FS_ID
  LocalMountPath: /mnt/fs
  subnetsId:
    subnet0: YOUR_SUBNET_ID
  securityGroup: YOUR_SECURITY_GROUP

resources:
  extensions:
    TextToImgLambdaFunction:
      Properties:
        FileSystemConfigs:
          - Arn: "arn:aws:elasticfilesystem:${self:provider.region}:${aws:accountId}:access-point/${self:custom.efsAccessPoint}"
            LocalMountPath: "${self:custom.LocalMountPath}"
A few parts worth mentioning:
The memory size: you won’t have access to 10 GB of memory by default; you need to open a ticket with AWS to support this use case. Note that you won't find a specific case for this kind of request. I requested an increase in Lambda storage and explained that I needed more memory. It took a couple of days for AWS to accept it.
memorySize: 10240
Function URL: the line url: true enables a public URL to invoke your function, mostly just for development/debugging purposes.
The docker container build mode
provider:
  ...
  ecr:
    images:
      appimage:
        path: ./

functions:
  textToImg:
    url: true
    image:
      name: appimage
    timeout: 300
The serverless framework does a lot for you here: these blocks alone will:
- create a private ECR repository
- use a local Dockerfile to build your container
- tag the image
- push it to the private ECR repository
- create a Lambda function that uses the docker image you’ve just created
That being said, be prepared: the build/deployment will take considerably longer than a native (zip-packaged) AWS Lambda deployment.
Here’s the Dockerfile that I’ve used:
FROM python:3.9.9-bullseye

WORKDIR /src

RUN apt-get update && \
    apt-get install -y \
    libgl1 libglib2.0-0 \
    g++ \
    make \
    cmake \
    unzip \
    libcurl4-openssl-dev

COPY requirements.txt /src/
RUN pip3 install -r requirements.txt --target /src/

COPY handler.py stable_diffusion_engine.py /src/

ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
CMD [ "handler.handler" ]
It installs the AWS Lambda Runtime Interface Client (awslambdaric) and the dependencies we need to execute Stable Diffusion (OpenVINO version).
Again, acknowledgments are owed to the original author of the OpenVINO solution: https://github.com/bes-dev/stable_diffusion.openvino.
Before you build the docker image, you’ll need to adapt the stable_diffusion_engine module and get the Lambda handler. You can either pull them from my repository or adapt the originals from the repository above.
The main changes required to the engine are regarding how to load the models:
self.tokenizer = CLIPTokenizer.from_pretrained("/mnt/fs/models/clip")
(...)
self._text_encoder = self.core.read_model("/mnt/fs/models/text_encoder/text_encoder.xml")
(...)
self._unet = self.core.read_model("/mnt/fs/models/unet/unet.xml")
(...)
self._vae_decoder = self.core.read_model("/mnt/fs/models/vae_decoder/vae_decoder.xml")
(...)
self._vae_encoder = self.core.read_model("/mnt/fs/models/vae_encoder/vae_encoder.xml")
You can then use the module in my handler (which is just an adaptation of the demo.py file from https://github.com/bes-dev/stable_diffusion.openvino):
# -*- coding: utf-8 -*-
print("Starting container code...")

from dataclasses import dataclass

import numpy as np
import cv2
from diffusers import LMSDiscreteScheduler, PNDMScheduler
from stable_diffusion_engine import StableDiffusionEngine
import json
import os


@dataclass
class StableDiffusionArguments:
    prompt: str
    num_inference_steps: int
    guidance_scale: float
    models_dir: str
    seed: int = None
    init_image: str = None
    beta_start: float = 0.00085
    beta_end: float = 0.012
    beta_schedule: str = "scaled_linear"
    model: str = "bes-dev/stable-diffusion-v1-4-openvino"
    mask: str = None
    strength: float = 0.5
    eta: float = 0.0
    tokenizer: str = "openai/clip-vit-large-patch14"


def run_sd(args: StableDiffusionArguments):
    if args.seed is not None:
        np.random.seed(args.seed)

    if args.init_image is None:
        scheduler = LMSDiscreteScheduler(
            beta_start=args.beta_start,
            beta_end=args.beta_end,
            beta_schedule=args.beta_schedule,
            tensor_format="np",
        )
    else:
        scheduler = PNDMScheduler(
            beta_start=args.beta_start,
            beta_end=args.beta_end,
            beta_schedule=args.beta_schedule,
            skip_prk_steps=True,
            tensor_format="np",
        )

    engine = StableDiffusionEngine(
        model=args.model, scheduler=scheduler, tokenizer=args.tokenizer, models_dir=args.models_dir
    )
    image = engine(
        prompt=args.prompt,
        init_image=None if args.init_image is None else cv2.imread(args.init_image),
        mask=None if args.mask is None else cv2.imread(args.mask, 0),
        strength=args.strength,
        num_inference_steps=args.num_inference_steps,
        guidance_scale=args.guidance_scale,
        eta=args.eta,
    )

    is_success, im_buf_arr = cv2.imencode(".jpg", image)
    if not is_success:
        raise ValueError("Failed to encode image as JPG")
    byte_im = im_buf_arr.tobytes()
    return byte_im


def handler(event, context, models_dir=None):
    print("Getting into handler, event: ", event)
    print("Working dir at handler...")
    current_dir = os.getcwd()
    print(current_dir)
    print(os.listdir(current_dir))
    print("Listing root")
    print(os.listdir("/"))

    # Get args
    # randomizer params
    body = json.loads(event.get("body"))
    prompt = body["prompt"]
    seed = body.get("seed")
    num_inference_steps: int = int(body.get("num_inference_steps", 32))
    guidance_scale: float = float(body.get("guidance_scale", 7.5))

    args = StableDiffusionArguments(
        prompt=prompt,
        seed=seed,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        models_dir=models_dir,
    )
    print("Parsed args:", args)
    image = run_sd(args)
    print("Image generated")

    body = json.dumps(
        {"message": "wow, no way", "image": image.decode("latin1")})
    return {"statusCode": 200, "body": body}
Testing the Deployment
When you finish deploying, the Serverless Framework should give you a URL, which you can call like this:
curl -X POST \
  https://your_lambda_url_id.lambda-url.eu-central-1.on.aws/ \
  -H 'content-type: application/json' \
  -d '{"prompt": "tree"}'
If everything works, you should see in the CloudWatch logs that it’s generating the following image:
When I tested, it spent almost three minutes in the main loop and 238 seconds (about four minutes) to execute the full Lambda.
The curl above will give you an unreadable string with an image encoded as latin1. If you plan to actually use your Lambda, you might want something like this instead (I used this to test my container locally; replace the URL accordingly):
import requests
import json

headers = {"content-type": "application/json"}
url = "http://localhost:9000/2015-03-31/functions/function/invocations"

body = json.dumps({"prompt": "beautiful tree", "num_inference_steps": 1})
response = requests.post(url, json={"body": body}, headers=headers)
response.raise_for_status()

j = response.json()
body = json.loads(j["body"])
bytes_img = body["image"].encode("latin1")

with open("test_result.png", "w+b") as fp:
    fp.write(bytes_img)
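If you call the deployed Function URL instead of the local emulator, the request shape changes slightly: you post the JSON payload directly, and (assuming the default buffered response handling) the Lambda's body comes back as the HTTP response body. A sketch, with the URL as a placeholder:

import requests

url = "https://your_lambda_url_id.lambda-url.eu-central-1.on.aws/"
payload = {"prompt": "beautiful tree", "num_inference_steps": 12}

# Cold starts can take minutes, hence the generous timeout
response = requests.post(url, json=payload, timeout=300)
response.raise_for_status()

body = response.json()
bytes_img = body["image"].encode("latin1")

# The handler encodes the image as JPG
with open("test_result.jpg", "w+b") as fp:
    fp.write(bytes_img)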
Phew! That was a lot of steps! Now you can deploy your stable diffusion model to AWS Lambda. I hope you enjoyed reading this short tutorial. I’ll leave you with some things I tried — but didn’t quite work out — so maybe I can convince you not to try them.
The History of Mistakes
Are you wondering how bad the trial and error for me was? Well, I don’t mind sharing — failing is learning.
First Attempt: Full EFS, Native AWS Lambda
So, I had read many times that the way to deploy large models on Lambda is to use AWS Elastic File System. And that’s what I did.
I configured a regular AWS Lambda and connected it to an EFS. However, when the code executed, I ran into an error while importing the openvino runtime: libm.so.6 not found.
After some head-scratching and research, I learned that AWS Lambda runs on Amazon Linux and that maybe I should build my library dependencies directly inside the EC2 instance.
Except that, when I tried, I found out that openvino doesn’t have the 2022 runtime version available for Amazon Linux (https://pypi.org/project/openvino/). Uh-oh, dead end.
Second Attempt: Full Container Mode
A few days later, after discussing with a friend how much I did not miss using Docker compared to serverless technologies, a light bulb went off: what if I used a docker image to deploy the Lambda?
It turns out that the container image size limit is 10GB, which is pretty generous (https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/) — that has to work!
Well, not so fast. Just when things seemed close to working, I ran into some problems.
[ERROR] RuntimeError: Model file /src/models/unet/unet.xml cannot be opened!
Traceback (most recent call last):
  File "/src/handler.py", line 129, in handler
    image = run_sd(args)
  File "/src/handler.py", line 76, in run_sd
    engine = StableDiffusionEngine(
  File "/src/stable_diffusion_engine.py", line 59, in __init__
    self._unet = self.core.read_model(unet_xml, unet_bin)
Huh? I stared at this error for a couple of hours, debugging my environment, making sure the file was available, and so on. Since, according to the OpenVINO API reference, core.read_model can also accept binary data directly, I changed my code a little bit and tried to load the models into a dictionary of binary buffers ahead of time.
models = {}
for model in ["text_encoder", "unet", "vae_decoder", "vae_encoder"]:
    with open(f"./models/{model}/{model}.xml", "r+b") as fp:
        models[f"{model}-xml"] = fp.read()
    with open(f"./models/{model}/{model}.bin", "r+b") as fp:
        models[f"{model}-bin"] = fp.read()
Except I still ran into errors, though they were more meaningful this time.
[ERROR] OSError: [Errno 30] Read-only file system: './models/text_encoder/text_encoder.xml'
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/src/handler.py", line 23, in <module>
    from stable_diffusion_engine import StableDiffusionEngine
  File "/src/stable_diffusion_engine.py", line 30, in <module>
    with open(f"./models/{model}/{model}.xml", "r+b") as fp:
I double-checked the Python documentation and realized that r+b actually means “open for updating (reading and writing).” Maybe the file system is read-only? Let’s try again, using just rb instead:
[ERROR] PermissionError: [Errno 13] Permission denied: './models/text_encoder/text_encoder.xml'
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/src/handler.py", line 23, in <module>
    from stable_diffusion_engine import StableDiffusionEngine
  File "/src/stable_diffusion_engine.py", line 30, in <module>
    with open(f"./models/{model}/{model}.xml", "rb") as fp:
OK, maybe I need to copy the files to /tmp first? No, that just gave me the same error. I couldn’t make sense of it: the same code worked perfectly locally, and I even tested it with the Lambda Runtime Interface Emulator. It had to be something with the environment.
As far as I could tell, AWS blocks binary reads from the container image filesystem, perhaps for security reasons; I never figured out exactly why. Switching to the hybrid approach, where the models are stored on EFS and the code dependencies/libraries live in the Docker image, worked smoothly.
All right, that’s all for the day!
I hope you enjoyed reading it. Cheers!