Python NumPy Array Tutorial

Today we will look at NumPy, a library that is widely used in data science. Why do we need a library instead of plain Python? We can do everything in plain Python, but once you are familiar with NumPy you will see how much shorter and faster array code becomes.

1. What is NumPy?

NumPy is the core library for numerical computing in Python and a foundation for data science. Its central feature is a fast multidimensional array object. The library’s name is short for “Numerical Python”. We use it to solve mathematical models of problems in science and engineering on a computer. It is far more efficient to use this library than to write your own array code in Python, because it already has the features you need. So let’s get started, shall we?

1.1. Installing NumPy

You need to install NumPy before working with it, because it does not ship with Python. At this point you should already have Python installed on your computer. We will use the most common way of installing packages, which is pip.
If you are a Windows user, try this:

Windows installation

pip3 install numpy

Note that it may not work. There are 2 likely causes: either Python is not on the PATH environment variable, or pip is missing or outdated. To upgrade pip and check that it is available, try this:

Windows installation

python -m pip install --upgrade pip
python -m pip --version

If pip still cannot install NumPy, you can download a NumPy wheel file from the Internet and install it with pip. Then you will be ready to go. Now, if you are a Linux user, open a terminal with Ctrl+Alt+T or whatever shortcut your distribution uses. Then type:

Linux installation

sudo pip3 install numpy

That’s it! Now you are ready to code!
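
Before moving on, it can help to confirm that the installation worked. The snippet below is only a quick sanity check and not part of the installation steps above; the variable name installed_version is just for illustration, and the version you see depends on what pip installed.

```python
import numpy as np

# If this import succeeds, NumPy is installed for the interpreter you are running
installed_version = np.__version__
print(installed_version)
```

If the import fails with ModuleNotFoundError, pip most likely installed NumPy for a different Python interpreter than the one you are running.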

2. Working with arrays

2.1. Array manipulation

We already know what an array is. It’s a grid of values of the same type that represents regular data in a structured way. For example, you may have an array of strings, integers, booleans, etc. Let’s start with plain Python lists, which are the closest built-in equivalent:

Python Shell

>>> my_array = [1,2,3]
>>> print(my_array)
[1, 2, 3]
>>> my_2d_array = [4,[5,6],7]
>>> print(my_2d_array)
[4, [5, 6], 7]
>>> my_3d_array = [8,[9,[10,11],12],13]
>>> print(my_3d_array)
[8, [9, [10, 11], 12], 13]

As we can see, we have a couple of lists holding integers, some of them nested. However, if we dive into computer science, an array contains more than just elements. It also carries information about the raw data: where and how to locate elements in the array and how to interpret them. A NumPy array exposes 4 basic attributes for this:

  • data is a buffer that points to the memory address of the first byte of the array. It’s important to know about it before you manipulate the array
  • dtype describes the type of the elements stored in the array
  • shape is a tuple giving the size of the array along each dimension. Arrays can be multidimensional, so it’s wise to check the shape
  • strides is the trickiest one. It is a tuple giving the number of bytes to skip in memory in order to move to the next element along each dimension. For example, in a one-dimensional array of 8-byte integers you skip 8 bytes to reach the next element. It’s pointer logic, but it’s really important to understand it well
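
To make these attributes less abstract, here is a small sketch that prints them for a two-dimensional array. The array name grid is just for illustration, and the dtype is fixed to int64 on purpose so that the byte counts are the same on every platform.

```python
import numpy as np

# A 3x4 array of 8-byte integers (dtype set explicitly so the numbers below hold everywhere)
grid = np.arange(12, dtype=np.int64).reshape(3, 4)

print(grid.shape)     # (3, 4)
print(grid.dtype)     # int64
print(grid.itemsize)  # 8 bytes per element
print(grid.strides)   # (32, 8): 32 bytes to the next row, 8 bytes to the next column
```

A row holds 4 elements of 8 bytes each, which is where the 32 comes from.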

2.2. NumPy arrays

To create a NumPy array we use the np.array() function. It’s common practice to import numpy as np, and we will do the same in this article. Let’s create an array and inspect it:

array.py

import numpy as np

a = np.array([1,2,3,4,5])
print(type(a))
#outputs <class 'numpy.ndarray'>

print(a.shape)
#outputs (5,)

print(a.dtype)
#outputs int64 (the default integer type can be int32 on some platforms)

print(a.strides)
#outputs (8,) because each int64 element takes 8 bytes

print(a.data)
#check the output for yourself!

So it’s all pretty simple when it comes to creating an array from a list and inspecting it. But what if we want to create arrays that are already filled with particular values?

numpy_array.py

import numpy as np

#create an array of all zeros
a = np.zeros((3,3))
print(a)
'''
[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
'''

#create an array of all ones
b = np.ones((2,3))
print(b)
'''
[[ 1.  1.  1.]
 [ 1.  1.  1.]]
'''

#create a constant array with a custom shape and custom value
c = np.full((4,3),2)
print(c)
'''
[[2 2 2]
 [2 2 2]
 [2 2 2]
 [2 2 2]]
'''

#create an identity matrix with a custom shape
d = np.eye(4)
print(d)
'''
[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [ 0.  0.  0.  1.]]
'''

#create an array filled with random values between 0 and 1
e = np.random.random((3,3))
print(e)
#it might print something like this:
'''
[[ 0.59678947  0.89766843  0.04795142]
 [ 0.1575911   0.54953419  0.21916215]
 [ 0.69233153  0.99744842  0.89032515]]
'''

2.3. Indexing

We can manipulate arrays in many ways. One of them is called slicing. Let’s take a slice of an array and see what happens to the original when we modify only the slice.

indexing.py

import numpy as np

#create an array
a = np.array([[1,2,3,4,5], [6,7,8,9,10],[11,12,13,14,15]])
print(a)
'''
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]]
'''

#Now let's obtain sub-array consisting of 3x3 array in the middle of the given array
b = a[:3, 1:4]
print(b)
'''
[[ 2  3  4]
 [ 7  8  9]
 [12 13 14]]
'''

#as we modify b array, we actually modify the given a array. Let's make all elements there equal zeros
for i in range(3):
   for j in range(3):
      b[i,j] = 0
print(b)
'''
[[0 0 0]
 [0 0 0]
 [0 0 0]]
'''

#now let's see how our first array changed:
print(a)
'''
[[ 1  0  0  0  5]
 [ 6  0  0  0 10]
 [11  0  0  0 15]]
'''

Now you may be wondering why the original array changed. It’s a good question! A slice of an array is a view into the same data, so modifying it changes the original array. Makes sense, right?
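
If you want a slice that does not write through to the original, you can take a copy. The sketch below contrasts the two; the names view and copy are just for illustration, and np.shares_memory is used only to make the difference visible.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

view = a[:, 1:]          # basic slicing returns a view onto a's data
copy = a[:, 1:].copy()   # .copy() allocates independent data

view[0, 0] = 100         # writes through to a[0, 1]
copy[0, 1] = -1          # leaves a untouched

print(a)
# [[  1 100   3]
#  [  4   5   6]]
print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False
```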
There is another thing that can be quite confusing. You can extract the same data in two ways, and NumPy will give you arrays of different shapes. Let me show you the code first, and then I will explain what is happening there:

row_col.py

import numpy as np

#create a 3x3 array
a = np.array([[1,2,3], [4,5,6], [7,8,9]])
'''
[[1 2 3]
 [4 5 6]
 [7 8 9]]
'''

#example with rows
row_r1 = a[1, :]
row_r2 = a[1:2, :]

print(row_r1, row_r1.shape)
# [4 5 6] (3,)
print(row_r2, row_r2.shape)
# [[4 5 6]] (1, 3)

#example with columns
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]

print(col_r1, col_r1.shape)
#[2 5 8] (3,)
print(col_r2, col_r2.shape)
'''
[[2]
 [5]
 [8]] (3, 1)
'''

Alright, time for the explanation! Mixing integer indexing with slices yields an array of lower rank, while using only slices yields an array of the same rank as the original array. The same distinction applies when accessing columns.
Anyway, that’s quite a useful trick. We can also index with arrays of integers, which lets us pick and mutate arbitrary elements of an array.

arrange.py

import numpy as np

#let's create 4x4 array
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14,15,16]])
print(a)
'''
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]
'''

#let's create an array of indexes
b = np.array([0,1,2,3])
print(b)
#[0 1 2 3]

#for each row i, pick the element in column b[i]
print(a[np.arange(4), b])
#[ 1  6 11 16]

'''
For row 0 we take column b[0] = 0, which gives a[0][0] = 1. The same goes for the remaining rows: a[1][1] = 6, a[2][2] = 11, a[3][3] = 16
'''

#let's mutate those elements of the array:
a[np.arange(4), b] = 0
print(a)
'''
[[ 0  2  3  4]
 [ 5  0  7  8]
 [ 9 10  0 12]
 [13 14 15  0]]
'''

Another cool feature is boolean indexing. It lets us pick out arbitrary elements of an array, typically the ones that satisfy some condition.

bool.py

import numpy as np

a = np.array([[1,2,3], [4,5,6], [7,8,9]])

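#let's find all elements that are greater than 5
bool_idx = (a>5)

#the result is a boolean array of the same shape as a
print(bool_idx)
'''
[[False False False]
 [False False  True]
 [ True  True  True]]
'''

#what if we want the elements themselves? No problem!
print(a[bool_idx])
#[6 7 8 9]

#we can actually do it all in one step
print(a[a>5])
#[6 7 8 9]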
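
Note that indexing with a boolean mask gives back the matching values, not their positions. If you need the positions, one option is np.argwhere, and a mask can also be used to assign to the matching elements. This is a small sketch of both; the name positions is just for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# (row, column) position of every element greater than 5
positions = np.argwhere(a > 5)
print(positions)
# [[1 2]
#  [2 0]
#  [2 1]
#  [2 2]]

# a boolean mask also works on the left-hand side of an assignment
a[a > 5] = 0
print(a)
# [[1 2 3]
#  [4 5 0]
#  [0 0 0]]
```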

2.4. Datatypes

Do you know why NumPy is convenient? It tries to guess a datatype when you create an array, and most functions that construct arrays also take an optional argument to specify the datatype explicitly.

datatype.py

import numpy as np

a = np.array([1,2])
print(a.dtype)
#int64

a = np.array([1.1, 2.2])
print(a.dtype)
#float64

a = np.array([1,2], dtype = np.int64)
print(a.dtype)
#int64

So note that we can state explicitly what kind of data we want to store in an array. Here is a short list of the type codes you can use:

    • "?" is a boolean
    • "b" is a signed byte
    • "B" is an unsigned byte
    • "i" is a signed integer
    • "u" is an unsigned integer
    • "f" is a floating point
    • "c" is a complex floating point
    • "m" is a timedelta
    • "M" is a datetime
    • "O" is an object
    • "U" is a unicode string
    • "V" is a raw data known as void
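
As a rough illustration of how these codes are used, the sketch below passes them as strings, optionally followed by a size in bytes, to np.dtype and to the dtype argument of np.array. The particular codes and the name small were picked just for the example.

```python
import numpy as np

# a type code followed by a size in bytes selects a concrete dtype
print(np.dtype("i4"))   # int32
print(np.dtype("f8"))   # float64
print(np.dtype("c16"))  # complex128
print(np.dtype("?"))    # bool

# the same strings work wherever a dtype argument is accepted
small = np.array([1, 2, 3], dtype="u1")
print(small.dtype, small.itemsize)  # uint8 1
```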

2.5. NumPy Math

You can perform arithmetic and mathematical functions on arrays, and they are applied elementwise. Let’s see how they work!

math.py

import numpy as np

a = np.array([[1,2],[3,4]], dtype=np.float64)
b = np.array([[5,6],[7,8]], dtype=np.float64)

print(a+b)
'''
[[  6.   8.]
 [ 10.  12.]]
'''
print(np.add(a,b))
'''
[[  6.   8.]
 [ 10.  12.]]
'''

print(a-b)
'''
[[-4. -4.]
 [-4. -4.]]
'''
print(np.subtract(a,b))
'''
[[-4. -4.]
 [-4. -4.]]
'''

print(a*b)
'''
[[  5.  12.]
 [ 21.  32.]]
'''
print(np.multiply(a,b))
'''
[[  5.  12.]
 [ 21.  32.]]
'''

print(a/b)
''' 
[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]
'''
print(np.divide(a,b))
''' 
[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]
'''

print(a**b)
'''
[[  1.00000000e+00   6.40000000e+01]
 [  2.18700000e+03   6.55360000e+04]]
'''
print(np.sqrt(a) + np.sqrt(b))
'''
[[ 3.23606798  3.86370331]
 [ 4.37780212  4.82842712]]
'''

We can also work with vectors in NumPy. Note that * multiplies elementwise; it is not matrix multiplication. Instead, we use the dot function to compute inner products of vectors, to multiply a matrix by a vector and to multiply matrices. Let’s see how it works!

product.py

import numpy as np

#two matrices
x = np.array([[1,2], [3,4]])
y = np.array([[5,6], [7,8]])

#two vectors
v = np.array([9,10])
w = np.array([11, 12])

#Vector/Vector product
print(v.dot(w))
#219
print(np.dot(v,w))
#219

#Matrix/Vector product
print(x.dot(v))
#[29 67]
print(np.dot(x, v))
#[29 67]

#Matrix/Matrix product
print(x.dot(y))
'''
[[19 22]
 [43 50]]
'''
print(np.dot(x,y))
'''
[[19 22]
 [43 50]]
'''
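
For one- and two-dimensional arrays like these, Python’s @ operator gives the same results as dot, and many people find it easier to read. The sketch below is self-contained and assumes x and y are the two matrices and v and w are the two vectors from the example above.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

print(v @ w)   # 219
print(x @ v)   # [29 67]
print(x @ y)
# [[19 22]
#  [43 50]]
```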

Another commonly used thing is sum. Let’s see how it works!

sum.py

import numpy as np

a = np.array([[1,2,3,4,5], [6,7,8,9,10], [11,12,13,14,15], [16,17,18,19,20]])

print(np.sum(a))
#210

print(np.sum(a, axis = 0))
#[34 38 42 46 50]
#it's a sum of each column above!

print(np.sum(a, axis = 1))
#[15 40 65 90]
#it's a sum of each row above!

Apart from computing with matrices, we can also transpose them. We just need to use the T attribute. Let’s see how it works!

transpose.py

import numpy as np

a = np.array([[1,2,3,4,5], [6,7,8,9,10], [11,12,13,14,15], [16,17,18,19,20]])
print(a)
'''
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]]
'''
print(a.T)
'''
[[ 1  6 11 16]
 [ 2  7 12 17]
 [ 3  8 13 18]
 [ 4  9 14 19]
 [ 5 10 15 20]]
'''
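
It is worth knowing that transposing does not copy any data. As far as the layout goes, the transpose is a view of the same memory with the shape and strides swapped. Here is a small sketch of that, with the dtype fixed to int64 so the stride values are predictable; the name t is just for illustration.

```python
import numpy as np

a = np.arange(1, 21, dtype=np.int64).reshape(4, 5)
t = a.T

print(a.shape, a.strides)      # (4, 5) (40, 8)
print(t.shape, t.strides)      # (5, 4) (8, 40)
print(np.shares_memory(a, t))  # True

t[0, 1] = -6    # t[0, 1] is the same element as a[1, 0]
print(a[1, 0])  # -6
```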

3. Broadcasting

Broadcasting is a powerful mechanism that allows you to work with arrays of different shapes when performing arithmetic operations. Suppose we have a matrix and want to add a constant vector to each row of it. We can do something like this (even though there are many other ways to do it):

broadcasting.py

import numpy as np

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])

#create an empty matrix with the same size as x
y = np.empty_like(x) 

for i in range(4):
   y[i, :] = x[i, :] + v

print(y)
'''
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
'''

Adding v to each row like this is equivalent to stacking multiple copies of v vertically into a matrix and then performing elementwise summation with x. We can do exactly that like this:

stack.py

import numpy as np

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
vv = np.tile(v, (4, 1)) 

print(vv)
'''
[[1 0 1]
 [1 0 1]
 [1 0 1]
 [1 0 1]]
'''
y = x + vv
print(y)
'''
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
'''

So what does broadcasting have to do with all this? Well, it lets you perform the same computation without actually creating multiple copies of v. What about this code?

no_copies.py

import numpy as np

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])

y = x + v
'''
This line works even though x has shape (4, 3) and v has shape (3,) due to broadcasting
'''

print(y)
'''
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
'''

There are a few things to remember when it comes to broadcasting 2 arrays together:

  • If the arrays do not have the same number of dimensions, the shape of the lower-dimensional array is padded with 1s on the left until both shapes have the same length
  • Arrays are compatible in a dimension if (and only if) they have the same size in that dimension, or if one of the arrays has size 1 in that dimension
  • The arrays can be broadcast together if they are compatible in all dimensions
  • After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of the shapes of the two input arrays
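
To see these rules at work, here is a small sketch with one compatible and one incompatible pair of shapes. The names column, row, table and compatible are just for illustration.

```python
import numpy as np

column = np.array([[0], [10], [20], [30]])  # shape (4, 1)
row = np.array([1, 2, 3])                   # shape (3,), treated as (1, 3)

# sizes 4 and 1 are compatible, sizes 1 and 3 are compatible, so the result is (4, 3)
table = column + row
print(table.shape)  # (4, 3)
print(table)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]
#  [31 32 33]]

# shapes (4, 3) and (4,) are not compatible: the trailing sizes are 3 and 4
try:
    np.ones((4, 3)) + np.ones(4)
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```

If you do want to add a length-4 vector to each column of a (4, 3) matrix, reshaping it to (4, 1) first makes the shapes compatible.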

4. Summary

By now, we have a basic understanding of NumPy and how to work with NumPy arrays. Let’s sum up what we have learned, with pointers to the official documentation:

  • You can create arrays and manipulate them, mutate elements inside and do many cool things! More info can be found in the array creation and array manipulation sections of the official NumPy documentation
  • You can index objects in arrays in many different ways: basic slicing, field access, advanced indexing. More about this is in the indexing section of the official documentation
  • You can perform all sorts of mathematical miracles in NumPy. Actually, if you are using it and want to become a data scientist, maybe there are no miracles but calculations for you! Anyway, more info is in the mathematical functions section of the official documentation
  • You can do broadcasting, which is very powerful and makes your code run faster. More info is in the broadcasting section of the official documentation
  • For any general reference it’s always wise to go to the official NumPy docs to solve your problems

5. Homework

It’s always good to practice what you have just learned. So I encourage you to solve these exercises.

  1. How to randomly place p elements in a 2D array?
  2. Find the nearest value from a given value in an array
  3. Considering a four-dimensional array, how to get the sum over the last two axes at once?
  4. How to swap two rows of an array?
  5. Compute a matrix rank

6. Download the Source Code

You can find all materials needed in the file below.

Download
You can download the full source code of this example here: python-numpy-array.zip

Aleksandr Krasnov

Aleksandr is passionate about teaching programming. His main interests are Neural Networks, Python and Web development. Hobbies are game development and translating. For the past year, he has been involved in different international projects as SEO and IT architect.