Tutorial 2: Numpy Arrays

Hello Archaeopy! We will be releasing geophysical challenge two next week! To get you ready for it, here is the companion tutorial covering numpy arrays. We have discussed numpy in previous posts, but since it is the fundamental package for handling our data, we thought it would be important to cover the numpy basics once more. Much of this tutorial is taken from the tutorial on the official numpy webpage. Check it out for further information.

Why numpy?

Numpy's main object is the multidimensional array. Consider geophysical data: we have positional coordinates (x and y) and at least one value (z). There are many ways we can handle this information (e.g. xyz file, grd file, as profiles). Numpy arrays are efficient for handling, storing, and manipulating large datasets.

Numpy array properties

Now, let's explore numpy array properties with a more hands-on approach. This is probably easiest to do in an interpreter, although you can also create a new file and run it. WordPress lets us format text as Python, which I will do for the sake of ease. It will look like I am writing in a file rather than an interpreter, but do whatever works for you. The standard convention is to import numpy as np, so we will begin by typing that in.

import numpy as np

We used the loadtxt function in geophysical challenge 1 to import data into numpy arrays, but this time we are just going to make a simple one ourselves. This tutorial is concerned less with the contents of the array (the geophysical challenges will cover that) than with array properties and array operations. So we can make up our own data, starting from a plain Python list of lists.

our_array = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

See what happens when you print our_array. It should look like what you just typed in. Now, let's turn it into a numpy array!

our_array = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
numpy_array = np.array(our_array)

Now print numpy_array. Notice how it differs from our_array? Let's play around with making and displaying other numpy arrays. What happens when we leave the 12 off or add in a letter?

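our_array2 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]
# The rows have unequal lengths; NumPy 1.24 and later raise a ValueError here unless dtype=object is given
numpy_array2 = np.array(our_array2, dtype=object)
print(numpy_array2)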

our_array3 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 'f']]  # the letter must be quoted, otherwise f is an undefined name
numpy_array3 = np.array(our_array3)
print(numpy_array3)

Neither one looks like our first numpy_array. Why? You can explore numpy array properties by typing in your numpy array name (e.g. numpy_array) followed by .shape (the array dimensions) or .dtype (the data type describing the array elements). What data types do the three arrays have? Now try changing the data type (the astype method does this) and the shape of the array (the reshape method) as well.
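As a rough sketch of the exploration suggested above, here is one way to inspect and change the shape and data type of our array. The names float_array, reshaped, and string_array are made up for this illustration, and the exact integer type printed depends on your platform.

```python
import numpy as np

numpy_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Array dimensions: 3 rows by 4 columns
print(numpy_array.shape)   # (3, 4)

# Data type of the elements: an integer type (int64 on most platforms)
print(numpy_array.dtype)

# astype returns a copy converted to the requested data type
float_array = numpy_array.astype(np.float64)
print(float_array.dtype)   # float64

# reshape rearranges the same 12 elements into 4 rows by 3 columns
reshaped = numpy_array.reshape(4, 3)
print(reshaped.shape)      # (4, 3)

# Mixing numbers and a letter turns every element into a string
string_array = np.array([[1, 2], [3, 'f']])
print(string_array.dtype)  # a string type rather than an integer type
```

Note that reshape only works when the new shape holds the same number of elements as the old one.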

Operations

We generally need to process our geophysical data. Numpy arrays are very useful for this because we can apply operations to a whole array at once, between arrays, and between arrays and scalars. Basic arithmetic operations are applied element by element:

3 * numpy_array
numpy_array + numpy_array

This is a very powerful tool because you can create and evaluate expressions using the arrays.
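To illustrate what evaluating an expression on an array can look like, here is a small sketch. The variable names (scaled, doubled, anomaly, repeated_list) are made up for this example, and removing the mean and dividing by the standard deviation is just one possible expression, not a step required by the challenge.

```python
import numpy as np

our_array = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
numpy_array = np.array(our_array)

# Arithmetic on a numpy array acts on every element
scaled = 3 * numpy_array              # each value multiplied by 3
doubled = numpy_array + numpy_array   # element-by-element sum

# A plain Python list behaves differently: multiplying repeats the list
repeated_list = 3 * our_array
print(len(repeated_list))             # 9 rows instead of 3

# A whole expression can be evaluated in one line:
# subtract the mean and divide by the standard deviation
anomaly = (numpy_array - numpy_array.mean()) / numpy_array.std()
print(anomaly)
```

The contrast with the plain list is the main point: only the numpy array treats the operation as arithmetic on the values.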

We can also compute data statistics, for example the minimum with np.min(), the maximum with np.max(), the mean with np.mean(), and the standard deviation with np.std(). Try these on your numpy array. (Statistical methods will come in handy in future challenges and extras involving data processing and filtering.) Numpy also has a histogram function, np.histogram(). Try computing and plotting a histogram of your array.
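Here is a minimal sketch of those statistics applied to our small array. The choice of four bins for the histogram is arbitrary and only for illustration.

```python
import numpy as np

numpy_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

minimum = np.min(numpy_array)    # smallest value in the whole array
maximum = np.max(numpy_array)    # largest value in the whole array
mean = np.mean(numpy_array)      # average of all twelve values
std = np.std(numpy_array)        # standard deviation of all twelve values
print(minimum, maximum, mean, std)

# np.histogram returns the counts per bin and the bin edges
counts, bin_edges = np.histogram(numpy_array, bins=4)
print(counts)      # [3 3 3 3]
print(bin_edges)   # [ 1.    3.75  6.5   9.25 12.  ]
```

np.histogram only computes the counts. To plot them you would pass counts and bin_edges to a plotting library such as matplotlib, which is not shown here.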

Indexing, Slicing, and Iterating

Array indexing is used to access certain elements or subsections of an array. For reference, here are the indexing elements for a two-dimensional numpy array. This figure was taken from Wes McKinney's book "Python for Data Analysis" (copyright 2013, O'Reilly Media Inc.: Sebastopol). Consider each block to contain one array element. This is how we can reference each array element.

Returning to our numpy array, if you enter numpy_array[1, :], it returns array([5, 6, 7, 8]), row one of our array (remember, array indexing begins at zero). If you enter numpy_array[1, 0:3], it returns array([5, 6, 7]): the elements in columns 0 through 2 of row one, because the stop index 3 is not included.
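The indexing described above can be tried out as follows. The variable names are made up for this sketch, and the last three lookups are extra examples that are not discussed in the text.

```python
import numpy as np

numpy_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Row one, all columns
row_one = numpy_array[1, :]
print(row_one)           # [5 6 7 8]

# Row one, columns 0 up to (but not including) 3
row_one_part = numpy_array[1, 0:3]
print(row_one_part)      # [5 6 7]

# A single element: row 0, column 0
first_element = numpy_array[0, 0]
print(first_element)     # 1

# Column one, all rows
column_one = numpy_array[:, 1]
print(column_one)        # [ 2  6 10]

# A sub-block: the first two rows and the last two columns
block = numpy_array[:2, 2:]
print(block)
```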

Besides slicing, we can also reference array elements through iteration, for example by looping over the rows of the array with a for loop. That is only the most basic approach. For a more detailed overview of iterating over arrays, check out the SciPy document on the topic, which uses np.nditer to provide a more flexible approach.
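As a sketch of both approaches, the loop below visits the array row by row, and np.nditer then visits every element in turn. The names row_sums and flat_values are made up for this illustration.

```python
import numpy as np

numpy_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# A plain for loop over a 2D array yields one row at a time
row_sums = []
for row in numpy_array:
    row_sums.append(int(row.sum()))
print(row_sums)      # [10, 26, 42]

# np.nditer visits every single element, whatever the array shape
flat_values = []
for value in np.nditer(numpy_array):
    flat_values.append(int(value))
print(flat_values)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

For simple arithmetic the whole-array operations shown earlier are usually shorter and faster than looping, so iteration is best kept for cases that cannot be written as an array expression.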

This ends the overview of numpy arrays, which should be enough to get you going for geophysical challenge number two. Send us a comment, tweet, or Facebook post if you have any more questions or comments. Also, please have a look through SciPy's numpy tutorial.