Monthly Archives: June 2013

Reading, Doing Something, Saving

Following on from my guide to Python for Archaeologists, this is an introduction to dealing with data in Python.

@GirlWithTrowel posted some documents about her Geophysics Data Processing steps here, including some programming steps that could make life easier.

Initially we'll do much of this through Command Line Programs rather than creating a pretty user interface.

Reading and Printing

Last time we printed 'Hello World'. This time we'll do something a little more useful.

The Code and data for this is available here

Program to Open a file and print the contents line by line
#Sets the path and Filename that you want to Open. 
filename = 'data/MultipleLineText.txt'

#Opens the file defined by filename, 'r' refers to the file as being opened to read
f = open(filename, 'r')
# *Loop* For each line in the file 'f', do whatever is inside the loop
for line in f:
    #The indentation means we're inside the loop
    #Prints the line
    print(line)
#Closes the file 'f'
f.close()

I'm hoping the comments are detailed enough to explain what this code does
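
A common variant of the program above uses a with block, which closes the file automatically even if an error occurs part-way through. The sketch below is one way to write it. The function name read_lines and the sample file it creates are assumptions for illustration, not part of the downloadable code.

```python
import os
import tempfile


def read_lines(filename):
    """Return the lines of a text file with trailing newlines removed."""
    lines = []
    # 'with' closes the file automatically when the block ends
    with open(filename, 'r') as f:
        for line in f:
            # rstrip removes the newline so print does not add a blank line
            lines.append(line.rstrip('\n'))
    return lines


# Create a small sample file (hypothetical contents) so the sketch runs on its own
sample_path = os.path.join(tempfile.mkdtemp(), 'MultipleLineText.txt')
with open(sample_path, 'w') as f:
    f.write('first line\nsecond line\nthird line\n')

for line in read_lines(sample_path):
    print(line)
```

Returning the lines as a list, rather than printing inside the function, means the same function can be reused later when we want to do something with the data.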

Reading and Doing Something

Suppose we have some X, Y, C1, C2 data in a comma delimited text file with a Header line and we want to calculate the mean, min, max and standard deviation.

This gives us the option to try out modules. The code and data is available here
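
If you do not have the sample data to hand, a file in the same layout can be generated for practice. This is a rough sketch: the header names follow the X, Y, C1, C2 description above, but the values are random numbers, and the function name write_sample_csv, the number of rows and the output location are all assumptions.

```python
import csv
import os
import random
import tempfile


def write_sample_csv(path, n_rows=5, seed=0):
    """Write a comma-delimited file with a header line and n_rows of made-up data."""
    rng = random.Random(seed)  # fixed seed so the file is repeatable
    # newline='' lets the csv module control the line endings itself
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow(['X', 'Y', 'C1', 'C2'])
        for i in range(n_rows):
            # X and Y are positions; C1 and C2 are invented readings
            c1 = round(rng.uniform(-5, 5), 2)
            c2 = round(rng.uniform(-5, 5), 2)
            writer.writerow([i, 0, c1, c2])


# Write the practice file into a temporary folder and show what it contains
sample_path = os.path.join(tempfile.mkdtemp(), 'randcsv.txt')
write_sample_csv(sample_path)

with open(sample_path, 'r') as f:
    print(f.read())
```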

Program to Open a CSV file, calculate some statistics, Deduct the mean and save the output
#Imports the NumPy library, giving access to lots of fast numerical routines
import numpy as np

#Defines a function that you can call from within your program
def stats(data):
    #Uses Numpy to calculate statistics of data
    mn = np.min(data)
    mx = np.max(data)
    mean = np.mean(data)
    sd = np.std(data)
    #Returns the results in the order min, max, mean, standard deviation
    return mn, mx, mean, sd

#Sets the path and Filename that you want to Open. 
filename = 'data/randcsv.txt'

#Determines the Header information by reading the first line of the file
with open(filename, 'r') as f:
  header = f.readline()

#Uses Numpy to load the entire file and split the data into an array
f = np.loadtxt(filename,skiprows=1,delimiter=',')

#Extracts the 3rd column from array f. If this is confusing, column 1 is referred to as 0 in Python
c1 = f[:,2]
#Extracts the 4th column from array f
c2 = f[:,3]

#Passes data to our stats function and receives min, max, mean and standard deviation
c1_stats = stats(c1)
c2_stats = stats(c2)

print('statistics for column 1 are', c1_stats)
print('statistics for column 2 are', c2_stats)

#Deducts the mean from the columns of data (the mean is item 2 of the statistics)
c1 = c1 - c1_stats[2]
c2 = c2 - c2_stats[2]

#Puts the zero-mean values back into the array
f[:,2] = c1
f[:,3] = c2

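#Uses Numpy to save the updated array as a comma-delimited .txt file with the same header
#strip() drops the newline read with the header; comments='' stops Numpy prefixing it with '# '
np.savetxt('data/randvscout.txt', f, delimiter=',', header=header.strip(), comments='', fmt='%.2f')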

As above, I'm hoping these examples are commented well enough to show what's going on. I encourage you to modify this code and make it do more exciting things more quickly.
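
NumPy can also work on every column at once through the axis argument, which saves calling a function once per column. The sketch below uses a small made-up array in place of the loaded file (the values are assumptions for illustration) and checks that the data columns really do have a mean of zero after the subtraction.

```python
import numpy as np

# Made-up X, Y, C1, C2 rows standing in for the array returned by np.loadtxt
data = np.array([
    [0.0, 0.0, 1.0, 10.0],
    [1.0, 0.0, 2.0, 20.0],
    [2.0, 0.0, 3.0, 30.0],
    [3.0, 0.0, 6.0, 40.0],
])

# Columns 2 and 3 hold the readings; axis=0 gives one result per column
readings = data[:, 2:4]
col_min = readings.min(axis=0)
col_max = readings.max(axis=0)
col_mean = readings.mean(axis=0)
col_sd = readings.std(axis=0)

print('min ', col_min)
print('max ', col_max)
print('mean', col_mean)
print('sd  ', col_sd)

# Subtract each column's mean from that column, working on a copy of the array
zero_mean = data.copy()
zero_mean[:, 2:4] = readings - col_mean

# The means of the adjusted columns should now be zero, to within rounding error
print('new means', zero_mean[:, 2:4].mean(axis=0))
```

Working on a copy keeps the original readings available, which is handy when you want to compare the data before and after processing.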

Now that I've gone through enough examples to introduce the basics, I'm going to start making some modules and code that people can use to properly process geophysical datasets.

Getting started with Python for Archaeology

The 10th International Conference on Archaeological Prospection in Vienna involved a small but significant discussion about open software. This has continued on Twitter and I've committed myself to helping people start writing open-source software in Python.

I've chosen the Python programming language, partly because it's one I have a working knowledge of but more importantly because it's inherently readable. We have to remember that primarily we're Scientists, Archaeologists & Geophysicists, not Programmers, and by using Python we can concentrate on what we want to happen to our data rather than on how to tell the computer to do it.

Installing Python

To get started with Python you'll need an implementation of Python on your computer. Python is cross-platform (it will work on Windows, Linux, Mac ...) but the method of installing it can be very different on each platform.

I've traditionally used PythonXY and Spyder on Windows and Mac systems. For the purposes of this introduction I'm going to recommend Anaconda because it's entirely cross-platform and therefore we should all encounter the same issues at the same time.

The installation method is slightly different between operating systems but the Anaconda support documents are detailed enough, I think.

If you don't want to use Anaconda, PythonXY or installing Spyder and Python from scratch are suitable alternatives.

It's only while writing this post that I've encountered Anaconda. It was very easy to install on my Mac and, unlike PythonXY, includes 64-bit support. I'm going to try using both Anaconda and PythonXY to see if they both work seamlessly.
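
Whichever distribution you pick, it is worth checking that Python can find the libraries you plan to use. The sketch below is one way to do that once you are able to run code. The helper name check_modules is an assumption, and numpy is listed only because it is the library used for the data examples on this blog.

```python
import importlib
import sys


def check_modules(names):
    """Return a dictionary mapping each module name to True if it can be imported."""
    found = {}
    for name in names:
        try:
            importlib.import_module(name)
            found[name] = True
        except ImportError:
            found[name] = False
    return found


# Report which version of Python is running
print('Python version:', sys.version.split()[0])

# numpy is used by the data examples; add other names as you need them
for name, ok in check_modules(['numpy']).items():
    print(name, 'is installed' if ok else 'is MISSING')
```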

Keeping Track of Code Changes

One of the hardest and most frustrating things I found when I started writing code to process my geophysical data was that I changed it so much from day to day that it was impossible to know how I'd created a particular dataset. I should have used a revision control system from the start but didn't, so I'm going to recommend that you do.

I've chosen to use EasyMercurial because it's the one favoured by Software Carpentry and is completely cross-platform. Some instructions are available here.

Getting started with Spyder

To start writing some code open up Spyder:


Opening Spyder on Mac OSX

Opening Spyder on Windows

I like to think Spyder is relatively simple, but to the non-programmer it probably makes about as much sense as digging does to me. Hence the image below...

Using Spyder

This looks the same on whatever operating system you're running, and you'll probably find I flick between Mac, Windows and IPython.

Writing Some Code

Well we've done the boring stuff and got everything we need installed. Now to write some code.

Create a New file in Spyder and you should see something similar to this:

# -*- coding: utf-8 -*-
"""
Created on Sun Jun  9 15:53:31 2013

@author: popefinn
"""

The """ and # denote that the text is a comment and should not be 'run' as code. On a new line underneath the metadata comment section type:

print ("Hello World")

Run the Code by pressing F5, save the file as something sensible within your EasyMercurial repository, and accept the default runtime options. You should see the following:



You've just written your first Python program. It's not particularly useful but it's a good step along the way.

I'm going to write another post soon with more useful Python programming (reading, doing something, saving data) and start to move my tools and libraries into ArchaeoPY. For now I'd recommend looking at the information and tutorials on Software Carpentry.