Getting Every Microsecond Out of uWSGI
In recent articles, I covered performance tuning for both HAProxy and NGINX. Today’s article will be similar; however, we’re going to go further down the stack and explore tuning a Python application running via uWSGI.
What Is uWSGI
In order to deploy a web application written in Python, you would typically need two supporting components.
The first is a traditional web server such as NGINX to perform basic web server tasks such as caching, serving static content, and handling inbound connections.
The second is an application server such as uWSGI.
In this context, an application server is a service that acts as a middleware between the application and the traditional web server. The role of an application server typically includes starting the application, managing the application, as well as handling incoming connections to the application itself.
With a web-based application, this means accepting HTTP requests from the web server and routing those requests to the underlying application.
uWSGI is an application server commonly used for Python applications. However, uWSGI supports more than just Python; it supports many other types of applications, such as ones written in Ruby, Perl, PHP, or even Go. Even with all of these other options, uWSGI is mostly known for its use with Python applications, partly because Python was the first supported language for uWSGI.
Another thing uWSGI is known for is being performant; today, we’ll explore how to make it even more so by adjusting some of its many configuration options to increase throughput for a simple Python web application.
Our Simple REST API
In order to properly tune uWSGI, we first must understand the application we are tuning. In this article, that will be a simple REST API designed to return a Fibonacci sequence to those who perform an HTTP GET request.
The application itself is written in Python using the Flask web framework. This application is extremely small, and meant as a quick and dirty sample for our tuning exercise.
Let’s take a look at how it works before moving into tuning uWSGI.
app.py:
''' Quick Fibonacci API '''
import json

from flask import Flask

import fib

app = Flask(__name__)


@app.route("/<number>", methods=['GET'])
def get_fib(number):
    ''' Return Fibonacci JSON '''
    return json.dumps(fib.get(int(number))), 200


if __name__ == '__main__':
    # Pass the port as an integer rather than a string
    app.run(host="0.0.0.0", port=8080)
This application consists of two files. The first is app.py, which is the main web application that handles accepting HTTP GET requests and determines what to do with them.
In the above code, we can see that app.py is calling the fib library to perform the actual Fibonacci calculations. This is the second file of our application. Let’s take a look at this library to get a quick understanding of how it works.
fib.py:
''' Fibonacci calculator '''


def get(number):
    ''' Generate fib sequence until specified number is reached or exceeded '''
    # Seed the sequence with 0 and 1
    sequence = [0, 1]
    # Keep appending the sum of the last two values until we reach the limit
    while sequence[-1] < number:
        sequence.append(sequence[-2] + sequence[-1])
    return sequence
From the above code, we can see that this function simply takes an argument of number and generates a Fibonacci sequence up to the specified number. As previously mentioned, this application is a pretty simple REST API that does some basic calculations based on user input and returns the result.
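To make the stopping condition concrete, here is a small self-contained sketch that repeats the logic of fib.get and calls it with the same value used in the tests later in this article. The function body mirrors the listing above; the sample inputs are just illustrations.

```python
def get(number):
    """Same logic as fib.get: extend the sequence until its last value
    reaches or exceeds number."""
    sequence = [0, 1]
    while sequence[-1] < number:
        sequence.append(sequence[-2] + sequence[-1])
    return sequence


# 9000 is the value requested throughout the benchmarks in this article.
result = get(9000)
print(len(result), result[-1])

# Small inputs show the edge cases: the seed values are always returned.
print(get(0))
print(get(2))
```

Note that the last element can overshoot the requested number, because the loop only stops once the newest value is no longer below it.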
With the application now in mind, let’s go ahead and start our tuning exercise.
Setting Up uWSGI
As with all performance-tuning exercises, it’s best to first establish a baseline performance measurement. For this article, we will be using a bare-bones setup of uWSGI as our baseline. To get started, let’s go ahead and set up that environment now.
Installing Python’s package manager
Since we are starting from scratch, we’ll need to install several packages. We’ll do this with a combination of pip, the Python package manager, and Apt, the system package manager for Ubuntu.
In order to install pip, we will need to install the python-pip system package. We can do so with the apt-get command.
# apt-get install python-pip
With pip installed, let’s go ahead and start installing our other dependencies.
Installing Flask and uWSGI
To support our minimal application, we only need to install two packages with pip: flask (the web framework we use) and uwsgi. To install these packages, we can simply call the pip command with the install option.
# pip install flask uwsgi
At this point, we have finished installing everything we need for a bare-bones application. Our next step is to configure uWSGI to launch our application.
Bare-bones uWSGI configuration
uWSGI has many configuration parameters. For our baseline tests, we will first set up a very basic uWSGI configuration. We’ll do this by adding the following to a new uwsgi.ini file:
[uwsgi]
http = :80
chdir = /root/fib
wsgi-file = app.py
callable = app
The above is essentially just enough configuration to start our web application and nothing more. Before we move into performance testing, let’s first take a second to understand what the above options mean and how they change uWSGI behaviors.
http – HTTP bind address
The first parameter to explore is the http option. This option is used to tell uWSGI which IP and port to bind for incoming HTTP connections. In the example above, we gave the value of :80; this means listen on all IPs for connections to port 80.
The http option tells uWSGI one more thing: that this application is a web application and will be receiving requests via HTTP methods. uWSGI also supports non-HTTP-based applications by replacing the http option with options such as socket, ssl-socket, and raw-socket.
chdir – Change running directory
The second parameter is the chdir option, which tells uWSGI to change its current directory to /root/fib before launching the application. This option may not be required for all applications but is extremely useful if your application must run from a specified directory.
wsgi-file – Application executable
The wsgi-file option is used to specify the Python file that uWSGI loads to find the application. In our case, this is the app.py file.
callable – Internal application object
Flask-based applications have an internal application object used to start the running web application. For our application, it is the app object. When running a Flask application within uWSGI, it’s necessary to provide this object name to the callable parameter, as uWSGI will use this object to start the application.
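To illustrate what an "application object" is, here is a minimal sketch of a plain WSGI callable, invoked by hand with the standard library's wsgiref helpers. It is not the Flask app from this article; the function body and the call_once helper are hypothetical stand-ins. Flask's app object exposes this same calling convention, which is what an application server relies on.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A bare WSGI callable; a stand-in for the Flask `app` object."""
    body = b"hello from a WSGI callable\n"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def call_once(application, path="/"):
    """Invoke a WSGI callable roughly the way a server would, with no network."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = application(environ, start_response)
    return captured["status"], b"".join(chunks)


status, body = call_once(app, "/9000")
print(status, body)
```

In the configuration above, wsgi-file names the file to load and callable names the object inside it (app here) that plays this role.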
With our basic configuration defined, let’s test whether or not we are able to start our application.
Starting our web application
In order to start our application, we can simply execute the uwsgi command followed by the configuration file we just created: uwsgi.ini.
# uwsgi ./uwsgi.ini
With the above executing successfully, we should now have a running application. Let’s go ahead and test making an HTTP request to the application using the following curl command:
$ curl http://example.com/9000
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946]
In the above example, we can see the output of the curl command is a JSON list of numbers in a Fibonacci sequence. From this result, we can see that our application is running and responding to HTTP requests appropriately.
Measuring Baseline Performance
With our application up and running, we can now go ahead and run our performance test case to measure the application’s base performance.
# ab -c 500 -n 5000 -s 90 http://example.com/9000
Requests per second:    347.28 [#/sec] (mean)
Time per request:       1439.748 [ms] (mean)
Time per request:       2.879 [ms] (mean, across all concurrent requests)
In the above, we once again used the ab command to send multiple web requests to our web application. Specifically, the above command sends 5000 HTTP GET requests to our web application, keeping 500 requests in flight at a time. The results of this test show that ab was able to complete a little over 347 HTTP requests per second.
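The three numbers ab reports are related, which helps when reading the later results. The sketch below reproduces the two "Time per request" figures from the requests-per-second value and the concurrency level; the function names are my own, and small differences from ab's output come from working with the rounded 347.28 figure.

```python
def time_per_request_ms(requests_per_second, concurrency):
    """Mean time one request spends in flight, given the concurrency level."""
    return concurrency / requests_per_second * 1000.0


def time_per_request_across_all_ms(requests_per_second):
    """Mean time between completed requests across all concurrent clients."""
    return 1000.0 / requests_per_second


# Baseline figures from the ab run above: 347.28 requests/sec at -c 500.
mean_ms = time_per_request_ms(347.28, 500)
across_ms = time_per_request_across_all_ms(347.28)
print(round(mean_ms, 1), round(across_ms, 3))
```

In other words, the 1.4-second figure is largely a consequence of 500 requests waiting on the server at once, not the time a single Fibonacci calculation takes.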
For a basic out-of-the-box configuration, this level of performance is pretty decent. We can, however, achieve better with just a little bit of tweaking.
Multithreading
One of the first things we can adjust is the number of processes that uWSGI is running. Much like our earlier exercise with HAProxy, the default configuration of uWSGI starts only one instance of our web application.
With our current application, this basically means each HTTP request must be handled by a single process. If we distribute this across multiple processes, we may see a performance gain.
Luckily, we can do just that by adding the processes option for uWSGI to the uwsgi.ini file.
processes = 4
The above code will tell uWSGI to start four instances of our web application, but this alone isn’t the only thing we can do to increase our possible throughput.
While tuning HAProxy, I talked a bit about CPU Affinity. By default, uWSGI processes have the same CPU Affinity as the master process. What this means is that even though we will now have four instances of our application, all four processes may end up using the same CPU.
If our system has more than one CPU available, we are neglecting to leverage all of our processing capabilities. Once again we can check the number of available CPUs by executing the lshw command as shown below:
# lshw -short -class cpu
H/W path      Device      Class      Description
============================================
/0/401                    processor  Intel(R) Xeon(R) CPU E5-2650L v3 @ 1.80GHz
/0/402                    processor  Intel(R) Xeon(R) CPU E5-2650L v3 @ 1.80GHz
From the output above, our test system has two CPUs available. This means even with four processes, we are only using about half of our processing capability. We can fix this by adding two more uWSGI options, threads and enable-threads, into the uwsgi.ini configuration file.
processes = 4
threads = 2
enable-threads = True
The threads option is used to tell uWSGI to start our application in prethreaded mode. That essentially means it is launching the application across multiple threads, so our four processes now provide eight worker threads in total (four processes with two threads each).
This also has the effect of distributing the CPU Affinity across both of our available CPUs.
The enable-threads option is used to enable threading within uWSGI. This option is required whether you use uWSGI to create threads or you use threading within the application itself. If you have a multithreaded application and performance is not what you expect, it’s a good idea to make sure enable-threads is set to True.
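When choosing values for processes and threads, it can help to check from Python how many CPUs a process is actually allowed to run on. The following is a rough sketch, not part of the original setup: the helper names are hypothetical, and os.sched_getaffinity is only available on some platforms (Linux among them), so the code falls back to os.cpu_count elsewhere.

```python
import os


def usable_cpus():
    """Number of CPUs this process may run on, falling back to the total count."""
    if hasattr(os, "sched_getaffinity"):  # not available on every platform
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def worker_slots(processes, threads):
    """Requests that can be in progress at once with the given uWSGI settings."""
    return processes * threads


cpus = usable_cpus()
slots = worker_slots(processes=4, threads=2)  # values from uwsgi.ini above
print(f"usable CPUs: {cpus}, concurrent request slots: {slots}")
```

Running this on the test system alongside lshw is one way to confirm that the affinity seen by a process matches the hardware before tuning further.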
Retesting for performance changes
With these three options now set, let’s go ahead and restart our uWSGI processes and rerun the same ab test we ran earlier.
# ab -c 500 -n 5000 -s 90 http://example.com/9000
Requests per second:    1068.63 [#/sec] (mean)
Time per request:       467.888 [ms] (mean)
Time per request:       0.936 [ms] (mean, across all concurrent requests)
The results of this test are quite a bit different than the original baseline. In the above test, we can see that our Requests per second is now 1068. This is a 207% improvement by simply enabling multiple threads and processes.
As we have seen in previous tuning exercises, adding multiple workers, in this case uWSGI workers, can drastically improve performance.
Disable Logging
While the most common option, multithreading is not the only performance tuning option available for uWSGI. Another trick we have available is to disable logging.
While it might not be immediately obvious, logging levels often have a drastic effect on the overall performance of an application. Let’s see how much of an impact this change has on our performance before we dig into why and how it improves performance.
disable-logging = True
In order to disable logging within uWSGI, we can simply add the disable-logging option into the uwsgi.ini configuration file as shown above.
While this option may sound like it disables all logging, in reality uWSGI will still provide some logging output. However, the number of log messages is drastically decreased, as only critical events are shown.
Let’s go ahead and see what the impact is by restarting uWSGI and rerunning our test.
# ab -c 500 -n 5000 -s 90 http://example.com/9000
Requests per second:    1483.35 [#/sec] (mean)
Time per request:       337.076 [ms] (mean)
Time per request:       0.674 [ms] (mean, across all concurrent requests)
From the above example, we can see that we are now able to send 1483 requests per second. This is an improvement of over 400 requests per second; quite an increase for such a small change.
By default, uWSGI will log each and every HTTP request to the system console. This activity not only takes resources to present the log message to the screen, but also within the code there is logic performing the logging and formatting of the log message. By disabling this, we are able to avoid these activities and dedicate those same resources to performing our application tasks.
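The cost of per-request logging can be illustrated with a toy experiment in plain Python. This sketch does not measure uWSGI's own logger; it only shows that formatting and writing one line per request is extra work on top of handling the request. The log line format and the handle_requests helper are made up for the illustration.

```python
import io
import logging
import time


def handle_requests(count, logger=None):
    """Pretend to handle `count` requests, optionally logging each one."""
    handled = 0
    for i in range(count):
        handled += 1
        if logger is not None:
            # Hypothetical log line; uWSGI's real request log looks different.
            logger.info("GET /9000 => request %d handled", i)
    return handled


def timed(count, logger=None):
    """Return (requests handled, elapsed seconds) for one run."""
    start = time.perf_counter()
    handled = handle_requests(count, logger)
    return handled, time.perf_counter() - start


# Log into an in-memory buffer so the example has no side effects.
stream = io.StringIO()
logger = logging.getLogger("request-log-demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

quiet_handled, quiet_seconds = timed(5000)
logged_handled, logged_seconds = timed(5000, logger)
log_lines = stream.getvalue().count("\n")

print(f"without logging: {quiet_seconds:.4f}s")
print(f"with logging:    {logged_seconds:.4f}s ({log_lines} lines written)")
```

The absolute timings will differ from machine to machine, and writing to a real console or file is typically slower than the in-memory buffer used here.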
The next option is an interesting one; on the surface, it does not seem like it should improve performance but rather degrade it. Our next option is max-worker-lifetime.
Max Worker Lifetime
The max-worker-lifetime option tells uWSGI to restart worker processes after the specified time (in seconds). Let’s go ahead and add the following to our uwsgi.ini file:
max-worker-lifetime = 30
This will tell uWSGI to restart worker processes every 30 seconds. Let’s see what effect this has after restarting our uWSGI processes and rerunning the ab command.
# ab -c 500 -n 5000 -s 90 http://example.com/9000
Requests per second:    1606.62 [#/sec] (mean)
Time per request:       311.212 [ms] (mean)
Time per request:       0.622 [ms] (mean, across all concurrent requests)
What is interesting is that one would expect uWSGI to lose some capacity while restarting worker processes. The test, however, shows our throughput increasing by more than another 100 requests per second.
This works because this web application does not need to maintain anything in memory across multiple requests. This specific application actually works faster the newer the process is.
The reason for this is simple: A newer process has fewer memory management tasks to perform, as each HTTP request creates objects in memory for the web application. Eventually the application has to clean up these objects.
By restarting the processes periodically, we are able to forcefully create a clean instance for the next request.
When leveraging a middleware component such as uWSGI, this process can be very effective. This option can also be a bit of a double-edged sword; a value that is too low may cause more overhead from restarting processes than the benefit it brings. As with anything, it’s best to try multiple values and see which fits the application at hand.
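One way to try multiple values is to generate a uwsgi.ini variant per candidate and benchmark each in turn. The sketch below renders such variants with Python's configparser, using the options accumulated so far in this article. The candidate lifetimes and the render_config helper are hypothetical choices for illustration, not recommendations.

```python
import configparser
import io

# Options taken from the uwsgi.ini built up in this article.
BASE_OPTIONS = {
    "http": ":80",
    "chdir": "/root/fib",
    "wsgi-file": "app.py",
    "callable": "app",
    "processes": "4",
    "threads": "2",
    "enable-threads": "True",
    "disable-logging": "True",
}


def render_config(max_worker_lifetime):
    """Return the text of a uwsgi.ini using the given worker lifetime (seconds)."""
    parser = configparser.ConfigParser()
    parser["uwsgi"] = dict(BASE_OPTIONS)
    parser["uwsgi"]["max-worker-lifetime"] = str(max_worker_lifetime)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical candidates to benchmark; 30 is the value used in the article.
CANDIDATE_LIFETIMES = [15, 30, 60, 120]
configs = {seconds: render_config(seconds) for seconds in CANDIDATE_LIFETIMES}

print(configs[30])
```

Each rendered string can be written to its own file, uWSGI restarted against it, and the same ab command rerun so that the results are comparable.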
Compiling Our Python Library to C
Now that we’ve tuned uWSGI, we can start looking at other options for greater performance, such as modifying the application itself and how it works.
If we look at the application above, all of the Fibonacci sequence generation is contained within the library fib. If we were able to speed up that library, we may see even more performance gains.
A somewhat simple way of speeding up that library is to convert the Python code to C code and tell our application to use the C library instead of a Python library. While this might sound like a hefty task, it is actually fairly simple using Cython.
Cython is a static compiler that is used for creating C extensions for Python. What this means is we can take our fib.py and convert it into a C extension.
Let’s go ahead and do just that.
Install Cython
Before we can use Cython, we are going to need to install it as well as another system package. The system package in question is the python-dev package. This package includes various libraries used during the compilation of Cython-generated C code.
To install this system package, we will once again use the Apt package manager.
# apt-get install python-dev
With the python-dev package installed, we can now install the Cython package using pip.
# pip install Cython
Once complete, we can start to convert our fib library to a C extension.
Converting our library
In order to facilitate the conversion, we will go ahead and create a setup.py file. Within this file, we’ll add the following Python code:
# On Python 3.12 and later, where distutils was removed,
# import setup from setuptools instead.
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("fib.py"),
)
When executed, the above code will “Cythonize” the fib.py file, creating generated C code. Let’s go ahead and execute setup.py to get started.
# python setup.py build_ext --inplace
Once the above execution is completed, we should see a total of three files for the fib library.
$ ls -la
total 196
drwxr-xr-x 1 root root    272 Dec  5 21:52 .
drwxr-xr-x 1 root root    136 Dec  3 21:05 ..
-rw-r--r-- 1 root root    317 Dec  4 03:22 app.py
drwxr-xr-x 1 root root    102 Dec  3 21:03 build
-rw-r--r-- 1 root root 105135 Dec  5 21:52 fib.c
-rw-r--r-- 1 root root    281 Dec  3 21:03 fib.py
-rwxr-xr-x 1 root root  80844 Dec  5 21:52 fib.so
-rw-r--r-- 1 root root    115 Dec  5 21:51 setup.py
The fib.c file is the C source file that was created by Cython, and the fib.so file is the compiled version of this file that our application can import at run time.
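Since fib.py and the compiled file now sit side by side, it can be worth confirming which one the interpreter actually imported. The sketch below classifies a module by the suffix of its __file__ attribute. The module_kind helper is hypothetical, and the two stand-in modules are fabricated with types.ModuleType purely so the example runs without the real fib library; in the application directory you would call module_kind(fib) after import fib.

```python
import importlib.machinery
import types


def module_kind(module):
    """Classify a module by where it was loaded from."""
    path = getattr(module, "__file__", None)
    if path is None:
        return "built-in"
    if path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "compiled extension"
    return "pure Python"


# Stand-in module objects with made-up paths, for demonstration only.
source_module = types.ModuleType("fib")
source_module.__file__ = "/root/fib/fib.py"

extension_module = types.ModuleType("fib")
if importlib.machinery.EXTENSION_SUFFIXES:
    # Use a suffix this interpreter recognizes for compiled extensions.
    extension_module.__file__ = (
        "/root/fib/fib" + importlib.machinery.EXTENSION_SUFFIXES[-1]
    )

print(module_kind(source_module))
print(module_kind(extension_module))
```

If the real fib module reports "pure Python" after the build, the benchmark that follows would be measuring the original library rather than the Cython version.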
Let’s go ahead and restart our application and rerun our test again to see the results.
# ab -c 500 -n 5000 -s 90 http://example.com/9000
Requests per second:    1744.61 [#/sec] (mean)
Time per request:       286.598 [ms] (mean)
Time per request:       0.573 [ms] (mean, across all concurrent requests)
While the results do not show as much of an increase (about 138 requests per second), there is an increase in throughput nonetheless. As with most things, the results with Cython will vary from application to application.
Summary
In this article, with just a few tweaks to uWSGI and our application, we were not only able to increase performance, we were able to do so significantly.
When we started, our app was only able to accept 347 requests per second. After changing simple parameters, such as the number of worker processes and disabling logging mechanisms, we were able to push this application to 1744 requests per second.
The number of requests is not the only thing that increased. We were also able to reduce the time our application takes to respond to each request. If we go back to the beginning, the “mean” application request took 1.4 seconds to execute. After our changes, this same “mean” is 286 milliseconds. This means overall we were able to shave about 1.1 seconds per request; a respectable difference.
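The progression can be recomputed from the ab figures quoted in each section. The following sketch tabulates the gain of every step over the previous one and over the baseline; the stage labels and the summarize helper are my own, while the numbers are copied from the test runs above.

```python
# (stage, requests per second, mean time per request in ms) from the ab runs.
STAGES = [
    ("baseline", 347.28, 1439.748),
    ("processes and threads", 1068.63, 467.888),
    ("disable-logging", 1483.35, 337.076),
    ("max-worker-lifetime", 1606.62, 311.212),
    ("Cython fib library", 1744.61, 286.598),
]


def summarize(stages):
    """Return (stage, rps, gain over previous stage, % gain over baseline)."""
    rows = []
    baseline_rps = stages[0][1]
    previous_rps = None
    for name, rps, _mean_ms in stages:
        step_gain = 0.0 if previous_rps is None else rps - previous_rps
        total_pct = (rps - baseline_rps) / baseline_rps * 100.0
        rows.append((name, rps, round(step_gain, 2), round(total_pct, 1)))
        previous_rps = rps
    return rows


rows = summarize(STAGES)
for name, rps, step_gain, total_pct in rows:
    print(f"{name:24s} {rps:9.2f} req/s  +{step_gain:7.2f}  {total_pct:6.1f}% over baseline")

# Mean time saved per request between the first and last runs, in milliseconds.
latency_saved_ms = STAGES[0][2] - STAGES[-1][2]
print(f"mean time per request reduced by {latency_saved_ms:.2f} ms")
```

Laid out this way, it is easy to see that the switch to multiple processes and threads accounts for the largest single step, with the remaining changes each adding a smaller increment.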
While this article covered several of the performance-tuning options available within uWSGI, there are still quite a few that we haven’t touched. If you have a parameter that you feel we should have explored, feel free to drop it into the comments section.
Reference: Getting Every Microsecond Out of uWSGI, from our WCG partner Ben Cane at the Codeship Blog.