Following on from my guide to Python for Archaeologists, this is an introduction to working with data in Python.
@GirlWithTrowel posted some documents about her geophysics data processing steps here, including some possible programming solutions to make life easier.
Initially we'll do much of this through command-line programs rather than by creating a pretty user interface.
Reading and Printing
Last time we printed 'Hello World'. This time we'll do something a little more useful.
The code and data for this are available here.
```python
""" Program to open a file and print the contents line by line """

# Sets the path and filename that you want to open
filename = 'data/MultipleLineText.txt'

# Opens the file defined by filename; 'r' means the file is opened for reading
f = open(filename, 'r')

# *Loop* For each line in the file 'f', do whatever is inside the loop
for line in f:
    # The indentation means we're inside the loop
    # Prints the line; rstrip('\n') removes the line's own newline so print() doesn't add a blank line
    print(line.rstrip('\n'))

# Closes the file 'f'
f.close()
```
I'm hoping the comments are detailed enough to explain what this code does.
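A slightly more idiomatic way to write the same loop, assuming Python 3, is to open the file in a `with` block, which closes it automatically even if an error occurs part way through. This is a sketch of my own rather than the code linked above. The function name `print_lines` is a hypothetical choice, and the demo writes a small temporary file so it runs without the data folder.

```python
import os
import tempfile


def print_lines(filename):
    """Print each line of a text file and return how many lines were read."""
    count = 0
    # 'with' closes the file automatically when the block ends
    with open(filename, 'r') as f:
        for line in f:
            # rstrip('\n') stops print() adding a second newline
            print(line.rstrip('\n'))
            count += 1
    return count


if __name__ == '__main__':
    # Write a small throwaway file so the example runs on its own
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write('first line\nsecond line\nthird line\n')
        path = tmp.name
    try:
        n = print_lines(path)
        print(n, 'lines read')
    finally:
        os.remove(path)
```

Returning the line count is only there to make the function easy to check. Drop it if all you want is the printed output.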
Reading and Doing Something
Suppose we have some X, Y, C1, C2 data in a comma-delimited text file with a header line, and we want to calculate the minimum, maximum, mean and standard deviation of the C1 and C2 columns.
This gives us the chance to try out modules, in this case NumPy, and to write a function of our own. The code and data are available here.
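For reference, these are the four statistics as NumPy's `np.min`, `np.max`, `np.mean` and `np.std` compute them by default for a column of values $x_1, \dots, x_N$. By default `np.std` divides by $N$, which gives the population standard deviation. Passing `ddof=1` divides by $N-1$ instead, giving the sample standard deviation, which is what a spreadsheet's STDEV function reports. This is worth knowing if your numbers don't quite match.

```latex
\min(x) = \min_{i} x_i, \qquad \max(x) = \max_{i} x_i
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2} \quad \text{(np.std, default ddof=0)}
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2} \quad \text{(np.std with ddof=1)}
```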
```python
""" Program to open a CSV file, calculate some statistics, deduct the mean and save the output """

# Imports the NumPy library, giving access to lots of fast numerical functions
import numpy as np

# Defines a function that you can call from within your program
def stats(data):
    # Uses NumPy to calculate statistics of data
    mn = np.min(data)
    mx = np.max(data)
    mean = np.mean(data)
    sd = np.std(data)
    return (mn, mx, mean, sd)

# Sets the path and filename that you want to open
filename = 'data/randcsv.txt'

# Determines the header information by reading the first line of the file
# (strip() removes the trailing newline so it isn't written back out twice)
with open(filename, 'r') as f:
    header = f.readline().strip()

# Uses NumPy to load the entire file (skipping the header) and split the data into an array
f = np.loadtxt(filename, skiprows=1, delimiter=',')

# Extracts the 3rd column from array f; if this is confusing, Python counts from 0, so column 1 is index 0
c1 = f[:, 2]
# Extracts the 4th column from array f
c2 = f[:, 3]

# Passes data to our stats function and receives min, max, mean and standard deviation
c1_stats = stats(c1)
c2_stats = stats(c2)
print('Statistics for C1 (min, max, mean, sd) are', c1_stats)
print('Statistics for C2 (min, max, mean, sd) are', c2_stats)

# Deducts the mean (item 2 of the returned tuple) from the columns of data
c1 = c1 - c1_stats[2]
c2 = c2 - c2_stats[2]

# Returns the zero-mean values into the array
f[:, 2] = c1
f[:, 3] = c2

# Uses NumPy to save the updated array; comments='' stops savetxt prefixing the header with '# ',
# so the output keeps the same header line as the input CSV
np.savetxt('data/randvscout.txt', f, delimiter=',', header=header, comments='', fmt='%.2f')
```
As above, I'm hoping these examples are commented well enough for you to follow what's going on. I encourage you to modify this code and make it do more exciting things.
Now that I've gone through enough examples to introduce the basics, I'm going to start writing some modules and code that people can use to properly process geophysical datasets.
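If you have several data columns, NumPy can do the same job without a separate variable for each one. Passing `axis=0` computes each statistic down every column at once, and broadcasting subtracts each column's own mean. The sketch below is my own variation rather than the code linked above. It assumes, as in the example, that the data columns are the 3rd and 4th (indices 2 and 3), and it reads a small made-up CSV from a string via `io.StringIO` so it runs without the data folder. The function name `demean_columns` is hypothetical.

```python
import io

import numpy as np


def demean_columns(csv_text, cols=(2, 3)):
    """Return (header, per-column stats, array with the chosen columns made zero-mean)."""
    header = csv_text.splitlines()[0].strip()
    # skiprows=1 skips the header; ndmin=2 keeps a 2D array even for a single data row
    data = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1, ndmin=2)

    sub = data[:, list(cols)]  # copy of just the chosen columns
    # axis=0 computes each statistic down the columns, one value per column
    stats = {
        'min': sub.min(axis=0),
        'max': sub.max(axis=0),
        'mean': sub.mean(axis=0),
        'sd': sub.std(axis=0),  # population sd (ddof=0), same as np.std above
    }
    # Broadcasting subtracts each column's own mean from every row of that column
    data[:, list(cols)] = sub - stats['mean']
    return header, stats, data


if __name__ == '__main__':
    # Small made-up example file: X, Y, C1, C2 with a header line
    text = 'X,Y,C1,C2\n0,0,1.0,10.0\n1,0,2.0,20.0\n2,0,3.0,30.0\n'
    header, stats, out = demean_columns(text)
    print(header)
    print(stats)
    buf = io.StringIO()
    # comments='' writes the header without the default '# ' prefix
    np.savetxt(buf, out, delimiter=',', header=header, comments='', fmt='%.2f')
    print(buf.getvalue())
```

Choosing the columns with the `cols` parameter means the same function works if your file has more data channels than C1 and C2.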