NumPy is a fundamental package for data analysis in Python, as most other packages in the Python data ecosystem build on it. Consequently, it makes sense for us to understand what NumPy can help us with and its general principles.
In the following article, we'll take a look at NumPy arrays, which essentially take Python's list data type to a new level. We'll get powerful new methods, random number generation and a way of storing data in grid-like structures, not just the flat lists we have seen so far.
Let's get started by importing the NumPy library. You'll need to install it first if you haven't already!
import numpy as np
Creating a NumPy array
Firstly, we need to create our array. We have a number of different ways to do this.
One way is to convert an existing list into an array. Below, we do this to create a 1-d array (a single row) and a 2-d array (a grid, or matrix).
# Three lists, one for names, one for GK heights, one for GK weights
GKNames = ["Kaller","Fradeel","Hayward","Honeyman"]
GKHeights = [184,188,191,193]
GKWeights = [81,85,103,99]

# Create an array of names
print(np.array(GKNames))

# Create a matrix of all three lists, starting with a list of lists
GKMatrix = [GKNames,GKHeights,GKWeights]
print(np.array(GKMatrix))
['Kaller' 'Fradeel' 'Hayward' 'Honeyman']
[['Kaller' 'Fradeel' 'Hayward' 'Honeyman']
 ['184' '188' '191' '193']
 ['81' '85' '103' '99']]
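There we have two examples of creating arrays from a list. The second one is particularly useful: it is laid out just like a spreadsheet and will make our data much easier to deal with. Notice, though, that the heights and weights now appear in quotes. A NumPy array holds a single data type, so when we mix names and numbers, NumPy converts everything to strings.
Aside from creating arrays from lists we already have, NumPy can create them with its own functions: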
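To see the single-data-type rule in action, here is a short sketch that checks each array's `dtype`. It also shows one way to keep the numbers numeric: build the grid from the heights and weights only. The `stats` name is our own illustration, not part of the original example.

```python
import numpy as np

GKNames = ["Kaller", "Fradeel", "Hayward", "Honeyman"]
GKHeights = [184, 188, 191, 193]
GKWeights = [81, 85, 103, 99]

# Mixing strings and numbers: NumPy picks one string dtype for every element
mixed = np.array([GKNames, GKHeights, GKWeights])
print(mixed.dtype.kind)  # prints U, meaning Unicode string

# Keeping only the numeric lists together gives a numeric 2-d array instead
stats = np.array([GKHeights, GKWeights])
print(stats.shape)       # prints (2, 4): 2 rows (heights, weights), 4 players
print(stats.dtype.kind)  # prints i, meaning signed integer
```

Keeping names in their own array and numbers in another lets NumPy's numeric methods work on the numbers directly.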
# With 'arange', we can create arrays just like we created lists with 'range'
# This gives us an array from the first argument up to, but not including, the second
np.arange(0,12)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
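Like `range`, `arange` leaves out its stop value and accepts an optional step. Unlike `range`, it also accepts a float step. A quick sketch:

```python
import numpy as np

# The stop value is excluded, just like range(0, 12)
a = np.arange(0, 12)
print(a[-1])  # prints 11

# An optional third argument sets the step size
evens = np.arange(0, 12, 2)
print(evens)  # values 0, 2, 4, 6, 8, 10

# Unlike range, arange also accepts a float step
halves = np.arange(0, 2, 0.5)
print(halves)  # values 0.0, 0.5, 1.0, 1.5
```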
# Want a blank array? Create one full of zeros with 'zeros'
# The tuple we pass in sets the shape of the array, here 3 rows and 11 columns
np.zeros((3,11))
array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
# Hate zeros? Why not use 'ones'?!
np.ones((3,11))
array([[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
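The dots in those outputs show that `zeros` and `ones` produce floats by default. The sketch below checks the shape and data type, asks for integers with the `dtype` argument, and passes a three-element tuple to get a 3-d array:

```python
import numpy as np

# The tuple passed to zeros/ones is the shape: (rows, columns)
grid = np.zeros((3, 11))
print(grid.shape)  # prints (3, 11)
print(grid.dtype)  # prints float64, hence the "0." in the output

# Pass dtype to get whole numbers instead of floats
int_ones = np.ones((3, 11), dtype=int)
print(int_ones.sum())  # prints 33: one for each of the 3 * 11 cells

# A three-element tuple gives a 3-d array
cube = np.zeros((2, 3, 4))
print(cube.ndim)  # prints 3
```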
# Creating dummy data or need a random number?
# randn and randint are useful here

# randn draws from the standard normal distribution (mean 0, standard deviation 1)
# The arguments give us the array's shape
print(np.random.randn(3,3))

# randint draws whole numbers from the first number up to, but not including, the second
# The third argument gives us the shape of the array
print(np.random.randint(1,100,(3,3)))
[[ 1.1403024  -1.76082025 -0.71738168]
 [-0.44740344 -0.16392845  1.04022957]
 [ 1.97068835  0.50075891 -0.33750378]]
[[70 28 67]
 [19 54 11]
 [ 9 34 67]]
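Your numbers will differ from ours because they change on every run. If you need dummy data that stays the same each time, you can seed a random generator. This sketch swaps in NumPy's newer Generator interface (`np.random.default_rng`) instead of the `randn`/`randint` functions above. The seed value 42 is an arbitrary choice.

```python
import numpy as np

# A seeded Generator gives the same "random" numbers on every run
rng = np.random.default_rng(42)

# Standard normal draws (mean 0, standard deviation 1), like randn(3, 3)
normal = rng.standard_normal((3, 3))

# Whole numbers from 1 up to, but not including, 100, like randint(1, 100, (3, 3))
ints = rng.integers(1, 100, size=(3, 3))
print(ints)

# Re-seeding with the same value reproduces the same draws
rng_again = np.random.default_rng(42)
same_normal = rng_again.standard_normal((3, 3))
print(np.array_equal(normal, same_normal))  # prints True
```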
Looking for more ways to create arrays? Take a look in the documentation for ‘rand’, ‘linspace’, ‘eye’ and others!
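As a starting point, here is a quick sketch of the three functions named above:

```python
import numpy as np

# linspace: a set number of evenly spaced values, with the end point included
spaced = np.linspace(0, 1, 5)
print(spaced)  # values 0.0, 0.25, 0.5, 0.75, 1.0

# eye: an identity matrix, with ones on the diagonal and zeros elsewhere
identity = np.eye(3)
print(identity)

# rand: uniform random numbers in [0, 1), with the shape given as separate arguments
uniform = np.random.rand(2, 2)
print(uniform)
```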
Not only does NumPy give us a good way to store our data, it also gives us some great tools to simplify working with it.
Let’s find the tallest goalkeeper from our earlier examples with array methods.
# Three lists, one for names, one for GK heights, one for GK weights
# Convert each list into an array
GKNames = np.array(["Kaller","Fradeel","Hayward","Honeyman"])
GKHeights = np.array([184,188,191,193])
GKWeights = np.array([81,85,103,99])

# What is the largest height? Use .max()
GKHeights.max()  # 193
# Where is the max? .argmax() gives its position, counting from 0
GKHeights.argmax()  # 3
# Can I use this method to locate the player's name?
# Instead of a number in the square brackets, I can just put this method
GKNames[GKHeights.argmax()]  # 'Honeyman'
With only four players this is a bit long-winded, but you can see the benefit when we have a whole academy of players and need to find our tallest from hundreds. Swap max for min (and argmax for argmin) to find the smallest value in an array and its position.
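Here is a sketch of the min/argmin swap. It also uses the same position to read across to another array, and it calls `argmax(axis=1)` on a 2-d numeric grid to find the top position in each row. The `stats` grid of heights and weights is our own illustration.

```python
import numpy as np

GKNames = np.array(["Kaller", "Fradeel", "Hayward", "Honeyman"])
GKHeights = np.array([184, 188, 191, 193])
GKWeights = np.array([81, 85, 103, 99])

# Smallest value and its position: min() and argmin()
shortest = GKNames[GKHeights.argmin()]
print(GKHeights.min(), shortest)  # prints 184 Kaller

# The same position lets us read across to another array
heaviest = GKWeights.argmax()
print(GKNames[heaviest], GKHeights[heaviest])  # prints Hayward 191

# On a 2-d grid, axis=1 finds the position of the max in each row
stats = np.array([GKHeights, GKWeights])
best = stats.argmax(axis=1)
print(GKNames[best])  # tallest, then heaviest: ['Honeyman' 'Hayward']
```

Passing an array of positions such as `best` inside the square brackets returns one name per position.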
You are likely to use NumPy with all sorts of packages as you develop your Python skills. Having a healthy appreciation of how it works, especially with arrays, will save you lots of headaches down the line.
On this page, we saw how to create arrays from scratch or convert them from lists. We created flat, 1-d arrays and 2-d grids. We then applied methods to find the highest data points and even used their positions to look up values in another array. Great work! Take a look at our extension on NumPy arrays to learn more.