I'm writing a scientific application in Python with a very processor-intensive loop at its core. I would like to optimise this as far as possible, with minimal inconvenience to end users, who will probably use it as an uncompiled collection of Python scripts, and will be running Windows, Mac, and (mainly Ubuntu) Linux.
It is currently written in Python with a dash of NumPy, and I've included the code below.
Is there a solution that would be reasonably fast and would not require compilation? That seems the easiest way to maintain platform independence.
If I use something like Pyrex, which does require compilation, is there an easy way to bundle multiple compiled modules and have Python choose between them depending on the detected OS and Python version (something along the lines of the sketch below)? Is there an easy way to build the whole collection of modules without needing access to every system with every version of Python?
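For instance, I'm imagining a runtime selection along these lines, where the module names are just placeholders:

# dipole_fast would be a compiled extension built for this platform and Python version,
# dipole_pure the plain Python/NumPy fallback (both module names are hypothetical)
try:
    from dipole_fast import calculate_dipole
except ImportError:
    from dipole_pure import calculate_dipole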
Does one approach lend itself particularly well to multi-processor optimisation?
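To illustrate the sort of thing I'm picturing, here is an untested sketch using the standard-library multiprocessing module: it splits the ion arrays into chunks, evaluates each chunk in a separate process, and adds up the partial fields (calculate_dipole is the function given at the end of this post):

import numpy as np
from multiprocessing import Pool

def _field_chunk(args):
    # dipole sum over one slice of the ion arrays
    mu, r_chunk, mom_chunk = args
    return calculate_dipole(mu, r_chunk, mom_chunk)

def calculate_dipole_parallel(mu, r_i, mom_i, processes=4):
    # split the ions into one chunk per process and sum the partial fields
    r_chunks = np.array_split(r_i, processes)
    mom_chunks = np.array_split(mom_i, processes)
    with Pool(processes) as pool:
        partials = pool.map(_field_chunk,
                            [(mu, rc, mc) for rc, mc in zip(r_chunks, mom_chunks)])
    return np.sum(partials, axis=0)

(On Windows this would have to be called from inside an if __name__ == '__main__': guard, since multiprocessing starts fresh interpreter processes there.)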
(If you're interested, the loop calculates the magnetic field at a given point inside a crystal by adding together the contributions of a large number of nearby magnetic ions, treated as tiny bar magnets; basically, a massive sum of dipole-field terms.)
# calculate_dipole
# -------------------------
# calculate_dipole works out the dipole field at a given point within the crystal unit cell
# ---
# INPUT
# mu = position at which to calculate the dipole field
# r_i = array of atomic positions
# mom_i = corresponding array of magnetic moments
# ---
# OUTPUT
# B = the B-field at this point
from numpy import zeros, dot, sqrt

def calculate_dipole(mu, r_i, mom_i):
    # displacement vectors from each ion to the point of interest
    relative = mu - r_i
    # unit_vectors (defined elsewhere in my code) normalises each row of its argument
    r_unit = unit_vectors(relative)
    # mu0 / 4pi (the prefactor of the dipole equation)
    A = 1e-7
    # initialise the dipole field
    B = zeros(3, float)
    for i in range(len(relative)):
        # work out this ion's dipole field and add it to the running total
        B += A * (3 * dot(mom_i[i], r_unit[i]) * r_unit[i] - mom_i[i]) / sqrt(dot(relative[i], relative[i]))**3
    return B
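For comparison, here is my rough attempt at a fully vectorised NumPy version of the same sum, which pushes the per-ion loop into NumPy's compiled internals; I haven't benchmarked or carefully tested it, so the broadcasting details may need checking:

import numpy as np

def calculate_dipole_vectorised(mu, r_i, mom_i):
    # displacement from every ion to the field point, shape (N, 3)
    relative = mu - r_i
    # distances |r| (shape (N,)) and unit vectors r_hat (shape (N, 3))
    r = np.sqrt(np.sum(relative**2, axis=1))
    r_unit = relative / r[:, np.newaxis]
    # mu0 / 4pi
    A = 1e-7
    # dot(mom_i[i], r_unit[i]) for every ion at once, shape (N,)
    m_dot_r = np.sum(mom_i * r_unit, axis=1)
    # per-ion dipole terms, summed over the ion axis to give the total field
    B = A * np.sum((3 * m_dot_r[:, np.newaxis] * r_unit - mom_i) / r[:, np.newaxis]**3, axis=0)
    return B

Is something like this likely to be fast enough on its own, or is a compiled approach unavoidable?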