
How to Obfuscate Code in Python: A Thought Experiment

As with most articles in this series, I was doing some browsing on Google, and I found that some folks had an interest in learning how to obfuscate code in Python. Naturally, I thought that would be a fun topic. By no means am I an expert, but I’m familiar with the idea. As a result, treat this like a fun thought experiment.

Problem Description

Unlike most articles in this series, I’m not looking for a quick answer to code obfuscation—the process of making code unreadable. Instead, I want to look at various obfuscation methods. To do that, we’ll need some piece of nicely formatted source code:

def read_solution(solution_path: str) -> list:
    """
    Reads the solution and returns it as a list of lines.
    :param solution_path: path to the solution
    :return: the solution as a list of lines
    """
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data

Cool! Here’s a standalone function that I pulled from my auto-grader project. It’s not the best code in the world, but I figured it would serve as a nice example. After all, it’s a short snippet that performs a simple function: reads a file and dumps the results as a list of lines.

In this article, we’ll take a look at a few ways of making this code snippet as unintelligible as possible. Keep in mind that I’m not an expert at this. Rather, I thought this would be a fun exercise where we could all learn something.

Solutions

In this section, we’ll take a look at several ways to obfuscate code. In particular, we’ll be taking the original solution and gradually manipulating it throughout this article. As a result, each solution will not be a standalone solution. Instead, it will be an addition to all previous solutions.

Obfuscate Code by Removing Comments

One surefire way to make code hard to read is to begin by avoiding best practices. For instance, we could start by removing any comments and docstrings:

def read_solution(solution_path: str) -> list:
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data

In this case, the solution is self-documenting, so it’s fairly easy to read. That said, removing the docstring does make it slightly harder to see exactly what this function accomplishes.

Obfuscate Code by Removing Type Hints

With the comments out of the way, we can begin removing other helpful pieces of syntax. For example, we have a few bits of syntax which help people track variable types throughout the code. In particular, we indicated that the input parameter solution_path should be a string. Likewise, we also indicated that the function returns a list. Why not remove those type hints?

def read_solution(solution_path):
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data

Again, this function is still fairly manageable, so it wouldn’t be too hard to figure out what it does. In fact, almost all Python code looked like this at one point, so I wouldn’t say we’ve reached any level of obfuscation yet.
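If you'd rather not strip docstrings and type hints by hand, here is a minimal sketch of how the two steps above could be automated with the standard `ast` module. The `StripHints` class name is made up for illustration, and `ast.unparse` assumes Python 3.9 or newer. As a side effect, round-tripping through the AST also drops ordinary `#` comments.

```python
import ast

SOURCE = '''
def read_solution(solution_path: str) -> list:
    """Reads the solution and returns it as a list of lines."""
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data
'''

class StripHints(ast.NodeTransformer):
    """Hypothetical helper: removes docstrings and annotations from functions."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        # Drop the return annotation and every parameter annotation.
        node.returns = None
        params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        params += [a for a in (node.args.vararg, node.args.kwarg) if a is not None]
        for param in params:
            param.annotation = None
        # A docstring is a bare string expression in the first body position.
        first = node.body[0] if node.body else None
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            node.body = node.body[1:] or [ast.Pass()]
        return node

tree = StripHints().visit(ast.parse(SOURCE))
stripped = ast.unparse(ast.fix_missing_locations(tree))
print(stripped)
```

The printed function should match the hand-edited version above, apart from the quote style that `ast.unparse` picks.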

Obfuscate Code by Removing Whitespace

Another option for visual obfuscation is removing all extraneous whitespace. Unfortunately, in Python, whitespace is significant. In fact, indentation is how we mark blocks. That said, there’s still some work we can do:

def read_solution(solution_path):
    with open(solution_path,encoding="utf8") as solution:
        data=solution.readlines()
    return data

Here, we were only able to remove three spaces: one between solution_path and encoding, one between data and =, and one between = and solution.readlines(). As a result, the code is still fairly readable. That said, as we begin to obfuscate our code a bit more, we’ll see this solution pay dividends.
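To show that this step is purely visual, here is a small sketch that parses both versions and compares their syntax trees. By default, `ast.dump` leaves out line and column information, so two sources that differ only in spacing should produce identical dumps. The `SPACED` and `SQUEEZED` names are just labels for this example.

```python
import ast

SPACED = '''
def read_solution(solution_path):
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data
'''

SQUEEZED = '''
def read_solution(solution_path):
    with open(solution_path,encoding="utf8") as solution:
        data=solution.readlines()
    return data
'''

# Position attributes are excluded from the dump unless include_attributes=True.
same_tree = ast.dump(ast.parse(SPACED)) == ast.dump(ast.parse(SQUEEZED))
print(same_tree)
```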

Obfuscate Code by Abandoning Naming Conventions

One thing we have full control over in code is naming conventions. In other words, we decide what we name our functions and variables. As a result, it’s possible to come up with names that completely obfuscate the intent of a variable or function:

def x(a):
    with open(a,encoding="utf8") as z:
        p=z.readlines()
    return p

Here, we’ve lost all semantic value that we typically get from variable and function names. As a result, it’s hard to even figure out what this function does.

Personally, I don’t think this goes far enough. If we were particularly sinister, we’d generate long sequences of text for each name, so it’s even more difficult to understand:

def IdDG0v5lX42t(hjqk4WN0WwxM):
    with open(hjqk4WN0WwxM,encoding="utf8") as ltZH4QOxmGy8:
        QVsxkg07bMCs=ltZH4QOxmGy8.readlines()
    return QVsxkg07bMCs

Hell, I might even use a single random string of characters and only modify bits of it. For example, we could try using the function name repeatedly with slight alterations (e.g. 1 for l, O for 0, etc.):

def IdDG0v5lX42t(IdDG0v51X42t):
    with open(IdDG0v51X42t,encoding="utf8") as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt

Of course, while this looks harder to read, nothing is really stopping the user from using an IDE to follow each reference. Likewise, compiling and decompiling this function (i.e. .py -> .pyc -> .py) would probably undo our whitespace tricks, though the mangled names would survive since the bytecode keeps them. As a result, we’ll have to go deeper.
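For the curious, names like the ones above don't have to be typed by hand. The following is only a sketch: `random_name`, `lookalikes`, and the `CONFUSABLE` table are hypothetical helpers, and the table covers just the two swaps mentioned earlier (l/1 and O/0).

```python
import keyword
import random
import string

def random_name(length=12, rng=random):
    """Return a random string that is a valid Python identifier."""
    first = rng.choice(string.ascii_letters)  # identifiers can't start with a digit
    rest = "".join(rng.choices(string.ascii_letters + string.digits, k=length - 1))
    name = first + rest
    # Very unlikely for long names, but a keyword can't be used as a name.
    return random_name(length, rng) if keyword.iskeyword(name) else name

# Characters that are easy to mistake for each other in many fonts.
CONFUSABLE = {"l": "1", "1": "l", "O": "0", "0": "O"}

def lookalikes(name):
    """Yield variants of name with one confusable character swapped."""
    for index, char in enumerate(name):
        # Skip the first character so a variant never starts with a digit.
        if index > 0 and char in CONFUSABLE:
            yield name[:index] + CONFUSABLE[char] + name[index + 1:]

print(random_name())
print(list(lookalikes("IdDG0v5lX42t")))  # ['IdDGOv5lX42t', 'IdDG0v51X42t']
```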

Obfuscate Code by Manipulating Strings

Another way to make code unintelligible is to find hardcoded strings like “utf8” in our example and add an unnecessary layer of abstraction to them:

def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt=chr(117)+chr(116)+chr(102)+chr(56)
    with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt

Here, we’ve constructed the string “utf8” from its ordinal values. In other words, ‘u’ corresponds to 117, ‘t’ corresponds to 116, ‘f’ corresponds to 102, and ‘8’ corresponds to 56. This additional complexity is still pretty easy to map. As a result, it might be worthwhile to introduce even more complexity:

def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join([chr(117),chr(116),chr(102),chr(56)])
    with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt

Instead of direct concatenation, we’ve introduced the join method. Now, we have a list of characters as numbers. Let’s reverse the list just to add a bit of entropy to the system:

def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join(reversed([chr(56),chr(102),chr(116),chr(117)]))
    with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt

How about that? Now, we have even more code we can begin modifying.
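The string trick above is mechanical enough to generate. Here is a rough sketch, where `obfuscate_string` is a made-up helper that turns any string literal into the reversed `chr` form used in this section:

```python
def obfuscate_string(text):
    """Build source code for an expression that evaluates back to text."""
    # List the characters back to front so that reversed() restores the order.
    parts = ",".join(f"chr({ord(char)})" for char in reversed(text))
    return f'"".join(reversed([{parts}]))'

expression = obfuscate_string("utf8")
print(expression)        # "".join(reversed([chr(56),chr(102),chr(116),chr(117)]))
print(eval(expression))  # utf8
```

Using `eval` here is only a quick way to confirm that the generated expression round-trips.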

Obfuscate Code by Manipulating Numbers

With our “utf8” string represented as a reversed list of numbers, we can begin changing their numeric representation. For example, 56 is really 28 * 2 or 14 * 2 * 2 or 7 * 2 * 2 * 2. Likewise, Python supports various bases, so why not introduce hexadecimal, octal, and binary to the mix?

def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101)]))
    with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt

Suddenly, it’s unclear what numbers we’re even working with. To add a bit of chaos, I thought it would be fun to insert a whitespace character:

def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt

Then, we can call the strip method to remove that extra space.
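If you want to double check the arithmetic, or generate disguises for other numbers, something like the following sketch would do. `factorize` and `disguises` are hypothetical helpers, and the sketch assumes integers of 2 or more.

```python
def factorize(number):
    """Return the prime factors of number (assumed >= 2) in ascending order."""
    factors = []
    divisor = 2
    while divisor * divisor <= number:
        while number % divisor == 0:
            factors.append(divisor)
            number //= divisor
        divisor += 1
    if number > 1:
        factors.append(number)
    return factors

def disguises(number):
    """Return several source-code spellings of the same integer."""
    product = "*".join(str(factor) for factor in factorize(number))
    return [hex(number), oct(number), bin(number), product]

print(disguises(56))  # ['0x38', '0o70', '0b111000', '2*2*2*7']

# Every spelling of each "utf8" character code evaluates to the original number.
for code in (56, 102, 116, 117):
    assert all(eval(spelling) == code for spelling in disguises(code))
```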

Obfuscate Code by Introducing Dead Code

In the previous example, we added a whitespace character to our string to make it slightly more difficult to decode. We can now take that idea and begin to add code that doesn’t really do anything:

def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    else:
        return list()

Here, I’ve introduced a dead branch. In other words, we’re operating under the assumption that the input is a valid string. As a result, we can add a silly case where we check if the string has a length greater than -1, which is always true. Then, on the dead branch, we return some generic value.

At this point, what is stopping us from writing a completely ridiculous dead block? In other words, instead of returning a simple junk value, we could construct a complex junk value:

def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]

Honestly, I could have put anything in the dead block. For fun, I decided to play with the input string. For instance, I constructed a substring and repeated it. Then, I constructed a list from the characters in that new string.
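The condition guarding the live branch is sometimes called an opaque predicate: it looks like a real check but always evaluates the same way. As a quick sketch, here is the length check from above next to a second predicate, which is my own addition rather than anything from the snippets in this article:

```python
def length_check(path):
    # len() never returns a negative number, so this can't be False.
    return len(path * 3) > -1

def parity_check(number):
    # n*n + n equals n*(n + 1), a product of two consecutive integers,
    # so one factor is always even and the remainder is always 0.
    return (number * number + number) % 2 == 0

always_true = (
    all(length_check(path) for path in ["", "a", "solution.txt", "x" * 1000])
    and all(parity_check(number) for number in range(-50, 51))
)
print(always_true)
```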

Obfuscate Code by Adding Dead Parameters

If we can introduce dead branches, we can absolutely introduce dead parameters. However, we don’t want to alter the behavior of the underlying function, so we’ll want to introduce default parameters:

def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]

Of course, this parameter is of no use currently. So, let’s try doing something with it:

def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if LdDG0v51X42t%2!=0 or len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]

Now, there is something beautiful about the expression LdDG0v51X42t%2!=0. To me, it looks like a password—not a test for odd numbers.

Of course, why stop there? Another cool thing we can do with parameters is take advantage of variable length arguments:

def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173,*LdDG0v51X42tf):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if LdDG0v51X42t%2!=0 or len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]

Now, we’ve opened the door to an unlimited number of arguments. Let’s add some code to make this interesting:

def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173,*LdDG0v51X42tf):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if LdDG0v51X42t%2!=0 or len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    elif LdDG0v51X42tf:
        return list()
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]

Again, we’ll never hit this branch because the first condition is always true. Of course, the casual reader doesn’t know that. At any rate, let’s have some fun with it:

def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173,*LdDG0v51X42tf):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if LdDG0v51X42t%2!=0 or len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    elif LdDG0v51X42tf:
        LdDG0v51X42tf=list(LdDG0v51X42tf)
        while LdDG0v51X42tf:
            LdDG0v51X42tx=LdDG0v51X42tf.pop()
            LdDG0v51X42tf.append(LdDG0v51X42tx)
        return LdDG0v51X42tf
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]

Yep, that’s an infinite loop! Unfortunately, it’s sort of obvious. That said, I suspect that the variable names will obscure the intent for a little while.

Other Ways to Obfuscate Code

Once again, I’ll mention that this article was more of a thought experiment for me. I had seen obfuscated code in the past, and I thought it would be fun to give it a try myself. As a result, here’s the original snippet and the final snippet for comparison:

def read_solution(solution_path: str) -> list:
    """
    Reads the solution and returns it as a list of lines.
    :param solution_path: path to the solution
    :return: the solution as a list of lines
    """
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data

def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173,*LdDG0v51X42tf):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if LdDG0v51X42t%2!=0 or len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    elif LdDG0v51X42tf:
        LdDG0v51X42tf=list(LdDG0v51X42tf)
        while LdDG0v51X42tf:
            LdDG0v51X42tx=LdDG0v51X42tf.pop()
            LdDG0v51X42tf.append(LdDG0v51X42tx)
        return LdDG0v51X42tf
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]
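Since the whole point is that the behavior never changes, it might be worth checking that claim. Below is a rough sketch that runs both versions against a throwaway file. The file name and contents are made up, and the dead branches of the obfuscated function are left out to keep the sketch short, since they never run.

```python
import os
import tempfile

def read_solution(solution_path: str) -> list:
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data

def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173,*LdDG0v51X42tf):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if LdDG0v51X42t%2!=0 or len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    return list()  # stand-in for the dead branches, which are omitted here

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "solution.txt")  # hypothetical sample file
    with open(path, "w", encoding="utf8") as handle:
        handle.write("first line\nsecond line\n")
    expected = read_solution(path)
    # The dead parameters shouldn't matter, so try a few combinations.
    results = [
        IdDG0v5lX42t(path),
        IdDG0v5lX42t(path, 2),
        IdDG0v5lX42t(path, 7, "junk", 42),
    ]

print(all(result == expected for result in results))
```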

At this point, I suppose we could continue to iterate, but I’m not sure that would be the best use of my time. That said, there were a few things I considered trying. For instance, I thought about compressing lines of code such as:

with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
    IdDGOv51X4Rt=IdDGOv51X42t.readlines()
return IdDGOv51X4Rt

Into something like:

with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
    return IdDGOv51X42t.readlines()

However, part of me felt like this would actually make the code easier to read since we wouldn’t have to map variable names.

In addition, I thought about making some functions just to pollute the namespace a little bit. For example, we could create functions that shadow some of Python’s built-ins and give them totally different behavior. In our case, we might redefine reversed to confuse the reader into thinking it has its typical behavior:

def reversed(x):
    return "utf8"

Then, we could pass whatever we wanted into it as bait. Wouldn’t that be sinister?
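Here is a small sketch of how that bait might look in practice. The `bait` list is invented for this example, and the real built-in is still reachable through the `builtins` module, which is handy for comparison:

```python
import builtins

def reversed(x):
    # Shadows the built-in for the rest of this module and ignores its argument.
    return "utf8"

# The argument looks meaningful, but it only spells out "bait".
bait = [chr(0x62), chr(0x61), chr(0x69), chr(0x74)]
encoding = "".join(reversed(bait))

print(encoding)                          # utf8
print("".join(builtins.reversed(bait)))  # tiab
```

A reader who assumes `reversed` does its usual job would waste time decoding the list, only to end up with the wrong string.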

Beyond that, I’m aware that there are obfuscation tools out there, but I’m not sure how widely used they are. Here are a few examples:

  • pyarmor: “A tool used to obfuscate python scripts, bind obfuscated scripts to fixed machine or expire obfuscated scripts.”
  • pyminifier: “Minify, obfuscate, and compress Python code”
  • Opy: “Obfuscator for Python”
  • Oxyry: “the power to protect your python source code”

I haven’t tried many of these tools, but Oxyry is definitely the most convenient. When I plug our function into it, it generates the following code:

def read_solution (OOOO0OO0OO00OOOOO :str )->list :#line:1
    ""#line:6
    with open (OOOO0OO0OO00OOOOO ,encoding ="utf8")as OO0O00OO0O0O0OO0O :#line:7
        OO0000O00O0OO0O0O =OO0O00OO0O0O0OO0O .readlines ()#line:8
    return OO0000O00O0OO0O0O

Clearly, that’s not great, but I suppose it’s effective. If you know of any other tools or cool techniques, feel free to share them in the comments.

Challenge

For today’s challenge, pick a piece of code and try to obfuscate it. Feel free to use all of the ideas leveraged in this article. However, the challenge will be to come up with your own ideas. What other ways can we obfuscate Python code?

If you’re looking for some ideas, I mentioned a couple in the previous section. Of course, there are other things you could try. For instance, you could always add a logger which prints erroneous messages to the console. Something like this would have no effect on your program’s behavior, but it could confuse a reader.
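As a sketch of the logger idea, here is the original function with a decoy logger bolted on. The logger name and the messages are invented for this example and have nothing to do with what the function actually does:

```python
import logging
import os
import tempfile

logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s:%(message)s")
decoy = logging.getLogger("payments")  # hypothetical, deliberately unrelated name

def read_solution(solution_path):
    decoy.info("Connecting to billing server...")  # nothing of the sort happens
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    decoy.warning("Retrying transaction %d of %d", 2, 5)  # pure noise
    return data

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "solution.txt")  # throwaway sample file
    with open(path, "w", encoding="utf8") as handle:
        handle.write("print('hi')\n")
    lines = read_solution(path)

print(lines)
```

The return value is untouched, but anyone watching the console would see a story about billing and transactions.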

If you want to go the extra mile, try writing a program which performs your favorite obfuscation technique. For instance, could you write a program which could identify Python variables? If so, you could generate your own symbol table which would track all variables. Then, you could generate new names without any worries about clashes.
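To get you started on that extra mile, here is a minimal sketch of a renamer built on the standard `ast` module (again assuming Python 3.9+ for `ast.unparse`). The `Renamer` class and its naming scheme are hypothetical, and it is nowhere near complete: it ignores scoping, imports, attributes, and callers that pass parameters by keyword, and it doesn't check that the generated names are unused.

```python
import ast
import itertools

SOURCE = '''
def read_solution(solution_path):
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data
'''

class Renamer(ast.NodeTransformer):
    """Hypothetical renamer that keeps a symbol table of old -> new names."""

    def __init__(self):
        self.table = {}
        self.counter = itertools.count()

    def new_name(self, old):
        # Hand out each replacement once so every reference stays consistent.
        if old not in self.table:
            self.table[old] = f"_{next(self.counter):04x}"
        return self.table[old]

    def visit_FunctionDef(self, node):
        node.name = self.new_name(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self.new_name(node.arg)
        return node

    def visit_Name(self, node):
        # Rename assignments, plus reads of names we have already renamed.
        # Names we never saw defined (such as open) are left alone.
        if isinstance(node.ctx, ast.Store) or node.id in self.table:
            node.id = self.new_name(node.id)
        return node

renamer = Renamer()
renamed = ast.unparse(renamer.visit(ast.parse(SOURCE)))
print(renamer.table)
print(renamed)
```

Swapping the counter-based names for random or look-alike strings would bring the output closer to the snippets in this article.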

At the end of the day, however, treat this challenge like a fun thought experiment. I don’t expect any of these methods to be all that practical. After all, if a machine can run the code even in an obfuscated state, a human can (eventually) work out what it does.

A Little Recap

Typically, in this section, I would list off all the solutions. However, the code snippets are quite long, and I don’t think it makes a lot of sense for me to dump them here. As a result, I’ll just share the options as a list:

  • Remove comments, type hints, and whitespace
  • Abandon naming conventions
  • Manipulate strings and numbers
  • Introduce dead code and parameters
  • Try something else

With that, I think we’re done for the day. If you like this sort of content, I’d appreciate it if you checked out an article on the different ways you can support the site.

Once again, thanks for stopping by. See you next time!

Published on Web Code Geeks with permission by Jeremy Grifski, partner at our WCG program. See the original article here: How to Obfuscate Code in Python: A Thought Experiment

Opinions expressed by Web Code Geeks contributors are their own.

Jeremy Grifski

Jeremy is the founder of The Renegade Coder, a software curriculum website launched in 2017. In addition, he is a PhD student with an interest in education and data visualization.