Deploy a machine learning model in seconds
Don’t want to deal with Docker, Flask, and web hosting? Try this instead
Several years ago, when I built my first machine learning model to classify handwritten digits, I immediately wanted to show it off to my friends and my siblings so that they could see what I had built. I remember searching “how to deploy a machine learning model” and becoming frustrated with all of the steps required, from containerization with Docker to purchasing an Amazon web server to host the model. What I thought would be a fast process took me several days of development and a lot of debugging.
Now, several years into my PhD, I’ve trained hundreds of machine learning models to do all sorts of things, like classifying ultrasound images and figuring out which genes are responsible for cancer. But I still dread deploying my machine learning models because, as this xkcd suggests, it’s just so time-consuming and often requires ongoing maintenance:
But in this article, I want to share a simple trick to deploy a machine learning model that requires only Python and 3 lines of code, and is completely free!
Enter… the Gradio library
The trick is to use the open-source Gradio Python library, which was developed to create web demos from machine learning models. If you visit the Gradio website, you’ll see examples of demos that you can create, like this one below for a model that counts the number of people in a crowd.
But we won’t be focusing on the demos in this article. What I’ll instead be talking about is a (relatively) secret feature of the library called share.
This feature allows you to deploy a model for a certain time period, allowing anyone to use it and try it out. And you can do it directly from a Python terminal or notebook running locally or on a server. Here’s what it looks like:
Let me break down what’s happening in the gif into 3 steps…
Step 1: Load the model & define your inference function
Gradio is not restricted to any specific machine learning framework. Instead, it works with general Python functions. So the first step is to create a function that wraps around your model inference: the function should take an input and return the prediction. Depending on the type of your model, the prediction can be a number, some text, an image, a dictionary of labels/confidences, or something else.
Here’s a quick example for an image classification model in TensorFlow (though you can also use PyTorch, scikit-learn, or any other framework):
Notice that we’ve loaded the model and created a wrapper function called classify_image(), which takes in an image (in the form of an array) and passes it through the model to get a prediction.
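A wrapper of this kind might look like the minimal sketch below. It is not the TensorFlow code from the article: TinyModel and LABELS are hypothetical stand-ins (a toy model with a Keras-style predict() method and made-up class names), chosen so the snippet runs with only NumPy and without downloading any weights. With a real model, you would load it once at the top and keep the same classify_image() shape.

```python
import numpy as np

# Hypothetical label set; a real model would come with its own class names.
LABELS = ["cat", "dog", "bird"]


class TinyModel:
    """Hypothetical stand-in for a trained model with a Keras-style predict()."""

    def __init__(self, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        # One weight per (RGB channel, class); a real model has learned weights.
        self.weights = rng.normal(size=(3, num_classes))

    def predict(self, batch):
        # batch has shape (n, height, width, 3); average the colour per image.
        features = batch.mean(axis=(1, 2))
        logits = features @ self.weights
        # Softmax turns raw scores into confidences that sum to 1.
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        return exp / exp.sum(axis=1, keepdims=True)


# Load (here: construct) the model once, outside the inference function.
model = TinyModel(num_classes=len(LABELS))


def classify_image(image):
    """Take an (H, W, 3) image array and return a {label: confidence} dict."""
    batch = np.asarray(image, dtype="float32")[np.newaxis, ...] / 255.0
    confidences = model.predict(batch)[0]
    return {label: float(c) for label, c in zip(LABELS, confidences)}
```

The important part is the last function: it accepts a plain array and returns a dictionary mapping each label to a confidence, which is one of the prediction formats mentioned above.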
Step 2: Create a Gradio interface object
To use Gradio, you have to create a simple GUI around your model, consisting of an input component and an output component. There are many options to choose from (https://gradio.app/docs), so you can pick whichever ones make sense for your model. In this case, we’ll use the Image input and the Label output. Then, we pass the function we defined and the input and output components we chose into the Interface() class, like so:
Step 3: Call the launch() method with share=True
The final step is to call launch(share=True). That’s just:
…And that’s it! When you run this, you’ll see that a public link to your model has been magically created:
Awesome! You can now send the link to your friends, collaborators, or colleagues for them to use and run your model directly from their browsers.
To summarize:
- We’ve shown how to use the Gradio library to magically deploy a model in seconds (full code in this Colab notebook)
- The model will be deployed for a certain time period, although you can just rerun the code to deploy your model again!
- Get started with Gradio here: https://github.com/gradio-app/gradio