How to Convert a String to Lowercase in Python
As this series grows, I’ve started poking at seemingly simple problems to expose their complexity. This time, I thought it would be interesting to look at how to convert a string to lowercase.
As it turns out, there’s a really straightforward solution (lower()), but I think it’s worth looking at a few homemade solutions first. For instance, we could try building up a string by looping over every character. If that sounds interesting, check out the rest of this article.
Problem Description
If you’ve ever tried to write code that manipulates strings, you know how painful a process it can be. For instance, try writing some code to reverse a string. Pro tip: it’s not as easy as you think. I know this because I added string reversal as one of the challenges in our Sample Programs repository.
When I was building up that repo, I found out that you can’t just start at the end of the string and print out the characters in reverse. That’ll work for simple strings like most of the text in this article. However, it can fail for more complex characters, such as emojis built from several code points.
All that said, Python 3 does a great job of abstracting characters, so you may not run into issues. For example, the following code seems to work fine:
```python
>>> hero = "😊"
>>> hero[::-1]
'😊'
```
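The smiley above is a single code point, which is why slicing reverses it cleanly. Many emojis are built from several code points, though. Here’s a minimal sketch, using a thumbs-up emoji with a skin-tone modifier as an assumed example, of how slicing can pull such a character apart:

```python
# A thumbs-up emoji with a medium skin-tone modifier is two code points
thumbs_up = "\U0001F44D\U0001F3FD"
print(len(thumbs_up))  # 2

# Slicing reverses code points, so the modifier now comes first
reversed_thumbs = thumbs_up[::-1]
print(reversed_thumbs == "\U0001F3FD\U0001F44D")  # True
print(reversed_thumbs == thumbs_up)               # False
```

On screen, the reversed string shows up as a stray skin-tone swatch followed by a plain yellow thumbs-up rather than the original character.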
Now, I bring this up because today we want to talk about converting a string to lowercase. If you’ve been around Python a while, you know there’s a quick way to do this. However, if you haven’t, there’s a chance you might try to do it yourself (or you have to do it yourself for a course). As a result, I’ll be setting a constraint for this entire article: assume ASCII.
This constraint can save us a lot of pain and suffering. ASCII defines just 128 characters (or 256 if you count the various extended versions), so we don’t have to worry about dealing with characters from other languages or emojis.
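If you want to check whether a string actually satisfies this constraint, here’s a quick sketch. It assumes Python 3.7 or later, which is when str.isascii() was added:

```python
hero = "All Might"
smiley = "\U0001F60A"  # the 😊 emoji from the example above

# str.isascii() returns True only if every character is in the 0-127 range
print(hero.isascii())    # True
print(smiley.isascii())  # False

# The same check by hand, using code points
print(all(ord(char) < 128 for char in hero))  # True
```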
Assuming ASCII, we should be able to convert a string like “All Might” to “all might” fairly easily. In the sections below, we’ll look at a few solutions that will be able to do just this.
Solutions
In this section, we’ll take a look at each solution I could come up with. Since this problem has been trivially solved by the lower() method, most of these solutions are essentially brute force. In other words, each solution takes a different strategy for converting a string to lowercase by hand. If that’s not your thing, feel free to jump to the last solution. For everyone else, let’s take a look at our first brute force solution!
Convert a String to Lowercase by Brute Force
Since we’re assuming ASCII, we can try to convert our string to lowercase by looking at the ordinal value of each character. In other words, each character is assigned some number (its code point, which ord() returns). If a character’s number falls within the range of capital letters, we should be able to find its corresponding lowercase number and replace it. That’s exactly what we do below:
```python
hero = "All Might"
output = ""
for char in hero:
    if "A" <= char <= "Z":
        output += chr(ord(char) - ord('A') + ord('a'))
    else:
        output += char
```
Here, we create a string called hero, which stores the name “All Might”. Then, we create an empty output string. After that, we loop over every character in the string, checking whether the current character falls in the range of capital letters. If it does, we convert it to lowercase with this clever little expression:
```python
chr(ord(char) - ord('A') + ord('a'))
```
By subtracting ord('A'), we get the index of the character in the alphabet. For example, if char were “C”, the expression ord(char) - ord('A') would be 2. Then, all we need to know is the ordinal value of ‘a’ to shift our index into the range of lowercase letters. In other words, this expression converts any uppercase ASCII letter to lowercase.
One thing I don’t love about this algorithm is the concatenation. In general, it’s considered a bad idea to concatenate strings in a loop like this: strings are immutable, so each += can end up copying the entire string built so far. Instead, we could use a list:
```python
hero = "All Might"
output = []
for char in hero:
    if "A" <= char <= "Z":
        output.append(chr(ord(char) - ord('A') + ord('a')))
    else:
        output.append(char)
output = "".join(output)
```
In the performance section, we’ll take a look to see if this matters at all. For now, though, let’s dig into some better options.
Convert a String to Lowercase Using ASCII Collections
In the previous solution, we computed lowercase values mathematically. However, what if we just happened to have the lowercase and uppercase letters available to us as a collection? As it turns out, the string module has us covered:
```python
from string import ascii_lowercase, ascii_uppercase
```
If you’re curious to know what these values look like, I checked for us:
```python
>>> ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'
>>> ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
```
As we can see, each value is a string containing the alphabet. Now, it’s just a matter of mapping from one set to the other given an index:
```python
from string import ascii_lowercase, ascii_uppercase

hero = "All Might"
output = []
for char in hero:
    if char in ascii_uppercase:
        output.append(ascii_lowercase[ascii_uppercase.index(char)])
    else:
        output.append(char)
output = "".join(output)
```
Again, we loop over every character in our string. Of course, this time we check if that character is in the uppercase set. If it is, we look for the corresponding lowercase character and add it to our final string. Otherwise, we append the original character.
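One possible variation, not from the original: ascii_uppercase.index(char) scans the alphabet on every lookup, so a dictionary built once could replace both the membership check and the index search. The name upper_to_lower is made up for this sketch:

```python
from string import ascii_lowercase, ascii_uppercase

# Pair each uppercase letter with its lowercase partner: {"A": "a", "B": "b", ...}
upper_to_lower = dict(zip(ascii_uppercase, ascii_lowercase))

hero = "All Might"
# dict.get falls back to the original character when it isn't an uppercase letter
output = "".join(upper_to_lower.get(char, char) for char in hero)
print(output)  # all might
```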
Personally, I like this solution a little bit better because we’re more explicitly dealing with certain sets of characters. That said, there is still a better solution ahead.
Convert a String to Lowercase Using a List Comprehension
Looking at the solutions above, I thought it might be fun to try to use a list comprehension. It’s not pretty, but it gets the job done:
```python
from string import ascii_uppercase, ascii_lowercase

hero = "All Might"
output = [ascii_lowercase[ascii_uppercase.index(char)] if char in ascii_uppercase else char for char in hero]
output = "".join(output)
```
If you’d prefer something a little more readable, here’s the same list comprehension with the expression separate from the loop:
```python
[
    ascii_lowercase[ascii_uppercase.index(char)]
    if char in ascii_uppercase
    else char
    for char in hero
]
```
Basically, we say that for each character in hero, if it’s an uppercase letter, we convert it to lowercase. Otherwise, we leave the character unchanged.
Honestly, this might be a bit cleaner if we pulled the expression out into a function:
```python
def to_lowercase(char: str):
    if char in ascii_uppercase:
        return ascii_lowercase[ascii_uppercase.index(char)]
    else:
        return char
```
Then, we could call this function in place of that mess:
```python
[to_lowercase(char) for char in hero]
```
Now, that’s a lot cleaner! Of course, there is definitely a better solution to follow. That said, if you like list comprehensions, and you want to learn more about them, check out my article on how to write list comprehensions.
Convert a String to Lowercase Using the lower() Method
Up to this point, we tried rolling our own lowercase function. Due to the complexity of strings, it turned out to be a nontrivial matter. Luckily, the Python developers knew this would be a popular request, so they wrote a method for us:
```python
hero = "All Might"
hero.lower()
```
And, that’s it! In one line, we can convert a string to lowercase.
Since we assumed ASCII up to this point, there’s not much to say in terms of the benefits of this solution. Sure, lower() is likely more convenient and faster than our previous solutions, but our assumption has stopped us from talking about the real benefit: it works beyond ASCII.
Unlike our previous solutions, this solution will work for basically any writing system where the concepts of uppercase and lowercase make sense. In other words, lower() should work in contexts beyond ASCII. If you’re interested in how it works under the hood, check out section 3.13 of the Unicode standard.
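To see that difference in action, here’s a small sketch comparing our ASCII-only approach (wrapped in a helper called brute_force_lower, a name made up for this example) with lower() on a word containing an accented capital. As an aside, it also shows casefold(), a more aggressive variant of lower() intended for caseless comparisons:

```python
def brute_force_lower(text: str) -> str:
    # The ASCII-only approach from earlier: only A-Z are shifted
    return "".join(
        chr(ord(char) - ord("A") + ord("a")) if "A" <= char <= "Z" else char
        for char in text
    )

word = "ÉCOLE"
print(brute_force_lower(word))  # École (É is outside A-Z, so it is left alone)
print(word.lower())             # école

# casefold() goes further than lower(), e.g. mapping ß to ss
print("Straße".lower())     # straße
print("Straße".casefold())  # strasse
```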
Performance
At this point, let’s take a look at how each solution compares in terms of performance. If you’ve been around a while, you know we start off testing by storing each solution in a string. If this is your first time seeing one of these tutorials, you can get up to speed on performance testing with this article. Otherwise, here are the strings:
```python
setup = """
hero = "All Might"
from string import ascii_lowercase, ascii_uppercase
"""

brute_force_concat = """
output = ""
for char in hero:
    if "A" <= char <= "Z":
        output += chr(ord(char) - ord('A') + ord('a'))
    else:
        output += char
"""

brute_force_list = """
output = []
for char in hero:
    if "A" <= char <= "Z":
        output.append(chr(ord(char) - ord('A') + ord('a')))
    else:
        output.append(char)
output = "".join(output)
"""

ascii_collection = """
output = []
for char in hero:
    if char in ascii_uppercase:
        output.append(ascii_lowercase[ascii_uppercase.index(char)])
    else:
        output.append(char)
output = "".join(output)
"""

list_comp = """
output = [ascii_lowercase[ascii_uppercase.index(char)] if char in ascii_uppercase else char for char in hero]
output = "".join(output)
"""

lower_method = """
output = hero.lower()
"""
```
Then, if we want to performance test these solutions, we can import the timeit module and run its repeat() function:
```python
>>> import timeit
>>> min(timeit.repeat(setup=setup, stmt=brute_force_concat))
1.702892600000041
>>> min(timeit.repeat(setup=setup, stmt=brute_force_list))
1.9661427000000913
>>> min(timeit.repeat(setup=setup, stmt=ascii_collection))
1.5348989000001438
>>> min(timeit.repeat(setup=setup, stmt=list_comp))
1.4514239000000089
>>> min(timeit.repeat(setup=setup, stmt=lower_method))
0.07294070000011743
```
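For reference, timeit.repeat() runs the statement 1,000,000 times per timing by default (and, as of Python 3.7, collects 5 such timings), so each number above is the total time in seconds for a million runs. Here’s a minimal sketch that makes those parameters explicit and converts the result to a per-call time; the smaller number of 100,000 is an arbitrary choice to keep it quick:

```python
import timeit

setup = 'hero = "All Might"'
number = 100_000  # executions per timing run (timeit's default is 1,000,000)

# Take the fastest of 5 timing runs, as in the tests above
best = min(timeit.repeat(stmt="hero.lower()", setup=setup, number=number, repeat=5))
per_call_ns = best / number * 1e9
print(f"{best:.4f} s for {number} calls, about {per_call_ns:.0f} ns per call")
```

By that measure, the 0.0729 seconds for lower() above works out to roughly 73 nanoseconds per call.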
Unsurprisingly, the lower() method is incredibly fast. We’re talking more than 20 times faster than our brute force solutions. That said, I was actually surprised by the minor improvement in speed that concatenation has over using a list in our example. As a result, I decided to use a larger string for testing:
```python
>>> setup = """
hero = "If you feel yourself hitting up against your limit remember for what cause you clench your fists... remember why you started down this path, and let that memory carry you beyond your limit."
from string import ascii_lowercase, ascii_uppercase
"""
>>> min(timeit.repeat(setup=setup, stmt=brute_force_concat))
22.304970499999996
>>> min(timeit.repeat(setup=setup, stmt=brute_force_list))
24.565209700000025
>>> min(timeit.repeat(setup=setup, stmt=ascii_collection))
19.60345490000003
>>> min(timeit.repeat(setup=setup, stmt=list_comp))
13.309821600000078
>>> min(timeit.repeat(setup=setup, stmt=lower_method))
0.16421549999995477
```
Somehow, concatenation is still a little bit faster than using a list. This surprised me a lot. After all, pretty much everything I’ve read says concatenation is a bad idea, so I was a bit stumped. As a result, I actually went as far as to duplicate the test code from one of those articles to see if I was doing something wrong in my testing:
```python
>>> setup = """
hero = "All Might"
loop_count = 500
from string import ascii_lowercase, ascii_uppercase

def method1():
    out_str = ''
    for num in range(loop_count):
        out_str += str(num)
    return out_str

def method4():
    str_list = []
    for num in range(loop_count):
        str_list.append(str(num))
    return ''.join(str_list)
"""
>>> min(timeit.repeat(setup=setup, stmt="method1()"))
156.1076584
>>> min(timeit.repeat(setup=setup, stmt="method4()"))
124.92521890000012
```
To me, there’s one of two things going on:
- Either my test is bad
- Or, there is some crossover point where the join() method is better
As a result, I decided to test the same code for various values of loop_count:
```python
# Loop count = 10
>>> min(timeit.repeat(setup=setup, stmt="method1()"))
2.665588600000774
>>> min(timeit.repeat(setup=setup, stmt="method4()"))
3.069867900000645

# Loop count = 25
>>> min(timeit.repeat(setup=setup, stmt="method1()"))
6.647211299999981
>>> min(timeit.repeat(setup=setup, stmt="method4()"))
6.649540800000068

# Loop count = 50
>>> min(timeit.repeat(setup=setup, stmt="method1()"))
12.666602099999182
>>> min(timeit.repeat(setup=setup, stmt="method4()"))
12.962779500000579

# Loop count = 100
>>> min(timeit.repeat(setup=setup, stmt="method1()"))
25.012076299999535
>>> min(timeit.repeat(setup=setup, stmt="method4()"))
29.01509150000038
```
As I was running these tests, I had a sudden epiphany: you shouldn’t run other programs while benchmarking code. In this case, the tests were taking so long that I decided to play Overwatch while waiting. Bad idea! It skewed all my tests. As a result, I decided to retest all of our solutions under the exact same conditions. Here are the results in seconds, where the number in parentheses indicates the length of the string under test:
| Solution | Time (10) | Time (25) | Time (50) | Time (100) |
|---|---|---|---|---|
| Brute Force Concatenation | 0.94944 | 3.72814 | 8.33579 | 17.56751 |
| Brute Force List | 1.27567 | 4.45463 | 9.33258 | 20.43046 |
| ASCII Collection | 1.23441 | 4.26218 | 9.26588 | 19.34155 |
| List Comprehension | 1.03274 | 2.99414 | 6.13634 | 12.71114 |
| Lower Method | 0.07121 | 0.08575 | 0.11029 | 0.163998 |
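To be honest, I wasn’t able to isolate the discrepancy. My guess is that at some point concatenation gets bad; I just haven’t been able to prove it. One possible factor is that CPython can often perform s += t on strings in place, an implementation detail the Python documentation mentions, which makes the quadratic worst case much less likely. That said, I haven’t found myself building up massive strings, so I don’t imagine it actually matters. Of course, there’s probably some application where it does.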
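If you want to hunt for that crossover point yourself, here’s a minimal sketch that times the two styles from method1() and method4() across growing sizes. The function names, sizes, and run counts are arbitrary choices for this example, and results will vary by machine and Python implementation, so no expected output is shown:

```python
import timeit

def concat(n: int) -> str:
    # Build the string with repeated += (the method1 style)
    out = ""
    for num in range(n):
        out += str(num)
    return out

def join_list(n: int) -> str:
    # Collect the pieces in a list and join once at the end (the method4 style)
    parts = []
    for num in range(n):
        parts.append(str(num))
    return "".join(parts)

for n in (10, 100, 1_000, 10_000):
    # timeit accepts a callable as the statement; take the best of 3 runs
    t_concat = min(timeit.repeat(lambda: concat(n), number=20, repeat=3))
    t_join = min(timeit.repeat(lambda: join_list(n), number=20, repeat=3))
    print(f"n={n:>6}: concat {t_concat:.5f} s, join {t_join:.5f} s")
```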
At any rate, the lower() method is clearly the way to go (unless you have some sort of class assignment that says otherwise). Of course, take these measurements with a grain of salt. For context, I’m on a Windows 10 system running Python 3.8.2.
Challenge
Since we spent the whole article talking about converting strings to lowercase, I figured we could try something a little different for the challenge. To make things more interesting, I thought it might even be fun to specify a couple of challenges:
- Convert a string to uppercase (e.g. “all might” -> “ALL MIGHT”)
- Convert a string to sarcasm case (e.g. “All Might” -> “AlL miGhT”)
  - For this one, I wasn’t sure if it made more sense to alternate or just randomly case each letter. You can decide!
- Convert a string to title case (e.g. “all might” -> “All Might”)
Each one of these challenges comes with a unique set of problems. Feel free to share a solution to any of them down below in the comments. As always, I’ll drop one down there as well to get y’all started.
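If you want a starting point for the first challenge, here’s one possible sketch that simply mirrors the brute force solution from earlier, still assuming ASCII. The name to_uppercase is made up for this example, and of course the built-in upper() method does the same job:

```python
def to_uppercase(text: str) -> str:
    # Mirror of the brute force lowercase approach, assuming ASCII:
    # shift a-z into the A-Z range and leave everything else alone
    output = []
    for char in text:
        if "a" <= char <= "z":
            output.append(chr(ord(char) - ord("a") + ord("A")))
        else:
            output.append(char)
    return "".join(output)

print(to_uppercase("all might"))  # ALL MIGHT
```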
A Little Recap
With all that said, I think we’re done for the day. Here are all the solutions from this article in one convenient place:
```python
from string import ascii_lowercase, ascii_uppercase

hero = "All Might"

# Brute force using concatenation
output = ""
for char in hero:
    if "A" <= char <= "Z":
        output += chr(ord(char) - ord('A') + ord('a'))
    else:
        output += char

# Brute force using join
output = []
for char in hero:
    if "A" <= char <= "Z":
        output.append(chr(ord(char) - ord('A') + ord('a')))
    else:
        output.append(char)
output = "".join(output)

# Brute force using ASCII collections
output = []
for char in hero:
    if char in ascii_uppercase:
        output.append(ascii_lowercase[ascii_uppercase.index(char)])
    else:
        output.append(char)
output = "".join(output)

# Brute force using a list comprehension
output = [ascii_lowercase[ascii_uppercase.index(char)] if char in ascii_uppercase else char for char in hero]
output = "".join(output)

# Built-in Python solution
output = hero.lower()
```
In addition, you’re welcome to keep browsing. Here are some related articles:
- How to Compare Strings in Python: Equality and Identity
- How to Check if a String Contains a Substring in Python: In, Index, and More
Published on Web Code Geeks with permission by Jeremy Grifski, partner at our WCG program. See the original article here: How to Convert a String to Lowercase in Python.
Opinions expressed by Web Code Geeks contributors are their own.