Indexing NumPy Arrays

In the Arrays intro, you probably noticed an example where we used square brackets after an array to select a specific part of the array. In this article, we will see how we can identify and select parts of our arrays, whether 1d or 2d.

Let’s get started by importing our NumPy module and setting up an array of World Cup years. We’ll do this by calling ‘np.arange’ to generate every 4th year from 1930, then using ‘np.delete’, a NumPy function that removes parts of an array, to remove 1942 and 1946 (these sit at index positions 3 and 4, counting from 0).

In [1]:
import numpy as np

#Every 4 years since 1930
WCYears = np.arange(1930,2018,4)

#No World Cup in 1942 or 1946
WCYears = np.delete(WCYears,(3,4))

WCYears
Out[1]:
array([1930, 1934, 1938, 1950, 1954, 1958, 1962, 1966, 1970, 1974, 1978,
       1982, 1986, 1990, 1994, 1998, 2002, 2006, 2010, 2014])

Bracket selection

Following an array with square brackets is the easiest way to select an individual value or range.

Two important things to remember that will be second nature to you soon:

1) Any range includes the first index, but not the final one, so [3:7] returns positions 3, 4, 5 and 6.
2) Indexes begin at 0 in Python, so the first value sits at position 0.

In [2]:
#What year was the third World Cup held?
WCYears[2]
Out[2]:
1938
In [3]:
#Show me the 4 World Cup years following WW2
WCYears[3:7]
Out[3]:
array([1950, 1954, 1958, 1962])
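
The two rules above also apply when part of the range is left out. As a short sketch (the variable name `years` is only for illustration and rebuilds the array from the start of the article), leaving out the start or end of a range runs it from the beginning or to the end of the array, and a third number sets the step:

```python
import numpy as np

# Rebuild the World Cup years array from the start of the article
years = np.delete(np.arange(1930, 2018, 4), (3, 4))

# No start given: begin at index 0, stop before index 3
first_three = years[:3]
print(first_three)   # [1930 1934 1938]

# Negative start, no end given: the last two values
last_two = years[-2:]
print(last_two)      # [2010 2014]

# A third number is the step: every 5th value
every_fifth = years[::5]
print(every_fifth)   # [1930 1958 1978 1998]
```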

Bracket selection also lets you assign new values to the elements you select, just like you would with a variable. Be careful, though: the assignment overwrites the array in place and cannot be undone, so you would have to rerun the earlier steps to get the original values back!
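
As a minimal sketch of what such a change looks like (the names `years` and `backup` are illustrative, not part of the article's own cells), assigning to a bracket selection changes the array itself. Taking a copy first with the array's `copy` method is one way to keep the original values around:

```python
import numpy as np

# Rebuild the World Cup years array from the start of the article
years = np.delete(np.arange(1930, 2018, 4), (3, 4))

# Keep an independent copy so the original values are not lost
backup = years.copy()

# Overwrite a single value, then a range of values
years[0] = 1900
years[1:3] = 0

print(years[:4])   # [1900    0    0 1950]
print(backup[:4])  # [1930 1934 1938 1950]
```

The copy is worth the extra line whenever the original values would be awkward to rebuild.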

Selections in a 2d array (grid)

Bracket selection is also used to make selections on a grid. We have two options to do so:

grid[row][column] OR grid[row,column]

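When selecting a single value, both return the same result: grid[row,column] does the lookup in one step, while grid[row][column] first selects the row and then picks a value from it. Use whatever works for you and be aware that you may see it written differently elsewhere!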
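
The two forms only agree when both positions are single numbers. The sketch below uses a small made-up `grid` (not one of the World Cup arrays) to show that they match for one value but give different answers once a slice is involved:

```python
import numpy as np

# A small illustrative grid: 3 rows, 4 columns
grid = np.array([[ 0,  1,  2,  3],
                 [10, 11, 12, 13],
                 [20, 21, 22, 23]])

# For a single value, both forms return the same element
single_a = grid[1][2]
single_b = grid[1, 2]
print(single_a, single_b)  # 12 12

# With a slice they differ: grid[:, 1] is column 1,
# while grid[:][1] takes every row first and then picks row 1
column = grid[:, 1]
row = grid[:][1]
print(column)  # [ 1 11 21]
print(row)     # [10 11 12 13]
```

For this reason the comma form is usually the safer habit when selecting whole rows or columns.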

In [4]:
#Create our 2d array
#Mixing numbers and text in one array makes NumPy store everything as text
WCYears = [2002,2006,2010,2014]
WCHosts = ["Japan/Korea","Germany","South Africa","Brazil"]
WCWinners = ["Brazil","Italy","Spain","Germany"]

WCArray = np.array((WCYears,WCHosts,WCWinners))
WCArray
Out[4]:
array([['2002', '2006', '2010', '2014'],
       ['Japan/Korea', 'Germany', 'South Africa', 'Brazil'],
       ['Brazil', 'Italy', 'Spain', 'Germany']],
      dtype='<U12')
In [5]:
#2010 is the third year, find the host in the second row
WCArray[1,2]
Out[5]:
'South Africa'
In [6]:
#Find the winner of the last World Cup
#Negative selection!

WCArray[2,-1]
Out[6]:
'Germany'

Selecting parts of an array with criteria

So far, we have only selected values when we know their location. Quite often, we won’t know where things are, or will want to find something completely new.

NumPy allows us to select based on criteria that we give it. We write a test, NumPy applies it to every value in the array and returns ‘True’ or ‘False’ for each one, and we can then ask for only the values that came back as ‘True’.

In [7]:
WCYears = np.array([1966,1970,1974,1978])
WCTopScorers = np.array(["Eusebio","Muller","Lato","Kempes"])
WCGoals = np.array([9,10,7,6])
In [8]:
#Where does the top scorer score more than 8 goals?
WCGoals > 8
Out[8]:
array([ True,  True, False, False], dtype=bool)
In [9]:
#Not particularly useful on its own, but we can then use bracket selection with this!

WCTopScorers[(WCGoals>8)]
Out[9]:
array(['Eusebio', 'Muller'],
      dtype='<U7')

As you can see, the first query (‘WCGoals>8’) returns an array of True or False values. We then pass this inside the square brackets of another array of the same length, which returns only the values at the locations that are True. This allows us to get the scorers’ names, not just a True or False. This is useful with small arrays, but will be a massive help when we deal with bigger datasets.
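
Building on the arrays from the cells above, here is a brief sketch of two natural next steps: reusing one True/False array on a different array, and combining two tests. The names `mask` and `seventies_low` are just illustrative. Tests are combined with `&` (and) or `|` (or), and each test needs its own parentheses:

```python
import numpy as np

WCYears = np.array([1966, 1970, 1974, 1978])
WCTopScorers = np.array(["Eusebio", "Muller", "Lato", "Kempes"])
WCGoals = np.array([9, 10, 7, 6])

# The same True/False array can filter any array of matching length
mask = WCGoals > 8
high_years = WCYears[mask]
print(high_years)  # [1966 1970]

# Combine two tests: tournaments from 1970 onwards where
# the top scorer had fewer than 8 goals
seventies_low = (WCYears >= 1970) & (WCGoals < 8)
low_scorers = WCTopScorers[seventies_low]
print(low_scorers)  # ['Lato' 'Kempes']
```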

Summary

Selecting values in either a 1d or 2d array is straightforward. If we know where we want to look, we follow the array with square brackets containing index numbers: one number for a 1d array, or two numbers (row, then column) for a grid.

However, we do not have to pass in an index. We can also pass in an array of True or False values, which lets us filter based on a criterion.