Getting Started
Using Python and other open-source tools is great only if one is sufficiently acquainted with the ecosystem. Otherwise, an online search frenzy followed by frustration is usually around the corner.
Hence, before we move on to the harold documentation, some general information about the related concepts might soften the fall.
Python and its scientific stack
For the newcomer, a slightly distorted story of the tools required to work with harold might save some headaches. To emphasize, the story is roughly correct but precisely informative!
Since the late 70s, an enormous effort has gone into numerical computing. For reasons that are really not relevant now, Fortran (yes, that old weirdo language) is still considered to be one of the fastest, if not the fastest, platforms in terms of runtime performance. On top of this performance, people with great numerical algebra expertise have been optimizing these implementations ever since.
The result is now known as BLAS, LAPACK and other key libraries, optimized to the bone by almost literally manipulating the memory addresses manually. Many commercial software suites find a way to utilize this performance even though they are not coded in Fortran directly. You have to appreciate how generous the authors were, which is why I’m humbly replicating their style with the MIT license.
Hence, without knowing it, you have mostly been using compiled Fortran code through various front ends, including matlab and other software. However, this high performance doesn’t come for free. The price to pay is that you have to be extremely well versed in low-level operations to benefit from these tools. Hence, every language/software somehow designs a front end that communicates with the user, understands the context (say, a matrix turns out to be triangular), then prepares the data in a very strict format and sends it to these libraries. In turn, it picks up the result, shuffles the output format and converts it to something that the user can utilize further.
In Python, this front end is called the scientific stack, that is NumPy and SciPy. These involve C and Fortran bindings to low-level tools and wrap them with the typical easy-to-use Python syntax.
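To make the front end idea a bit more tangible, here is a minimal sketch that solves the same small linear system twice: once through the friendly SciPy function and once by calling a LAPACK driver through SciPy's low-level wrappers. The matrix and the right-hand side are made up for illustration, and the sketch does not claim that the high-level function uses this particular driver internally.

```python
import numpy as np
from scipy.linalg import solve, lapack

# A small, well-conditioned system A x = b (values chosen for illustration).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([[1.0],
              [2.0]])

# High-level front end: validates the input, calls a compiled routine
# and hands back a plain array.
x_front = solve(A, b)

# Low-level route: call the LAPACK driver dgesv via SciPy's wrapper.
# It returns the LU factors, the pivot indices, the solution and a
# status flag (info == 0 means success).
lu, piv, x_lapack, info = lapack.dgesv(A, b)

print(x_front.ravel())
print(x_lapack.ravel(), info)
```

The second call is closer to what the compiled library expects. The first is what you would normally type.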
That’s why, if you are not the faint-hearted type and install everything by yourself, you have to provide a compiler that is capable of producing the compiled versions of these libraries. You might have noticed that some sources ship precompiled flavors of NumPy and SciPy to ease the pain of building the libraries from scratch. While this is very useful, if the compiler does not like your particular system, the result is often a glorious crash. Hence, it is more of an art than a matter of following a manual.
Note
The compiler problem is almost universally a Windows problem, since Windows doesn’t come with a proper compiler, because Microsoft being Microsoft. The unfortunate users (including me) who have no idea what a compiler even is have to find one that crashes depending on the weather conditions of the installation date.
For this reason, some groups have come up with precompiled and matched package bundles, such as Anaconda, Python(x,y) etc., that are ready to be installed without bothering the users. See the installation page of the scientific stack and cross your fingers.
Note
Python itself is also mostly written in C, so the story is more involved, but for math-oriented users this is not relevant at this point.
Python and its strange syntax
Python did not start its life as a programming language with a mathematics
module. It actually took quite some time to reach the current situation, in
which it arguably dominates the data science and machine learning fields.
Consequently, most of the common English keyboard characters were already
reserved for other language constructs. Even worse, most of the brackets and
delimiters are used for internal structures. For example, the square brackets
[...] native to matlab are used to create objects that go by the name lists.
Hence, the scientific stack creators had to come up with a more verbose and
kind of annoying syntax compared to the native syntax of the well-known
scientific computing software suites. As an example, the moment you open up a
Python console, you need to import the NumPy module via import numpy before
you get array manipulations and math functions:
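import numpy
a = numpy.array([1,2,3,4])  # one-dimensional array, shape (4,)
b = numpy.atleast_2d(a)  # row vector, shape (1, 4)
c = a.reshape(4,1)  # tall vector, shape (4, 1)
d = numpy.array([[1],[2],[3],[4]]) # creates a tall vector directly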
Note
The situation is slightly more complicated than this, because what the first command above creates is a one-dimensional array. That is to say, matlab users will be baffled when they try to transpose it, as it will spit out exactly the same array and not the tall vector. The reason is that the array has only one dimension, hence the transpose is not well-defined: there is no second dimension to swap the elements with. There are of course ways to handle this but none is fun.
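A short sketch of this behaviour, together with two of the common workarounds (adding an axis with numpy.newaxis or reshaping), might look as follows. The variable names are only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# A one-dimensional array has a single axis, so transposing changes nothing.
print(a.shape)    # (4,)
print(a.T.shape)  # (4,)

# Workaround 1: add an explicit second axis.
row = a[np.newaxis, :]   # shape (1, 4)
col = a[:, np.newaxis]   # shape (4, 1)

# Workaround 2: reshape; -1 lets NumPy infer the length.
col_again = a.reshape(-1, 1)

# Once there are two axes, the transpose behaves as expected.
print(row.T.shape)  # (4, 1)
```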
Note
There is also a limited math module in the Python standard library; it is not to be confused with the scientific stack of Python.
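The difference shows up as soon as more than one number is involved. In the sketch below, the standard library math module handles a scalar, while the NumPy function of the same name works element by element on an array. The sample values are arbitrary.

```python
import math
import numpy as np

# The standard library module works on single numbers.
print(math.sqrt(16.0))  # 4.0

# The NumPy counterpart works element-wise on whole arrays.
values = np.array([1.0, 4.0, 9.0])
roots = np.sqrt(values)
print(roots)

# Feeding a list to math.sqrt raises a TypeError instead.
try:
    math.sqrt([1.0, 4.0, 9.0])
    math_accepts_lists = True
except TypeError:
    math_accepts_lists = False
print(math_accepts_lists)
```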
It was actually in 2014 that the Python developers finally(!!) decided to
reserve a symbol for matrix multiplication, namely the @ character, available
from Python 3.5 onwards (again, almost everything else is reserved, core
Python developers don’t care that much about math stuff, and the regular * is
meant for element-wise multiplication). The traditional matrix multiplication
is a method with a surprisingly accurate name: a.dot(b). However, this becomes
über-tedious if a chain of matrices is multiplied, especially since we don’t
have the amazing (I mean it) backslash operator of matlab, as in
A\B*(C+D*E)*F, e.g.,
x = numpy.linalg.solve(A,B).dot(C+D.dot(E)).dot(F)
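For comparison, here is a minimal sketch of the same expression written once with chained dot calls and once with the @ operator. The matrices are made-up 4x4 examples, with A constructed to be safely invertible.

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)

# A is symmetric positive definite by construction, hence invertible.
A = n * np.eye(n) + np.ones((n, n))
B, C, D, E, F = (rng.standard_normal((n, n)) for _ in range(5))

# The chained-method version from the text.
x_dot = np.linalg.solve(A, B).dot(C + D.dot(E)).dot(F)

# The same computation with the @ operator, evaluated left to right.
x_at = np.linalg.solve(A, B) @ (C + D @ E) @ F

print(np.allclose(x_dot, x_at))
```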
Warning
Never, ever, never use inv() in your computations on any software (unless you want to see what the inverse is explicitly). You have been warned with a colored box.
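As a minimal sketch of the alternative: numpy.linalg.solve works from a factorization of the matrix and never forms the inverse explicitly, which is typically cheaper and usually at least as accurate. The system below is a made-up, well-behaved example; the inv() line is there only for comparison.

```python
import numpy as np

n = 5
A = n * np.eye(n) + np.ones((n, n))   # invertible by construction
x_true = np.arange(1.0, n + 1)        # the solution we expect to recover
b = A @ x_true

# Preferred: solve the linear system directly.
x_solve = np.linalg.solve(A, b)

# Discouraged: form the inverse, then multiply (comparison only).
x_inv = np.linalg.inv(A) @ b

# Compare the residuals of both approaches.
print(np.linalg.norm(A @ x_solve - b))
print(np.linalg.norm(A @ x_inv - b))
```

On a small, well-conditioned matrix like this one both residuals will be tiny. The difference tends to matter for large or ill-conditioned problems.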
But in turn you get what is probably the most sought-after operation: continuous slicing and functional chains (I don’t know what nerds call this). What I mean is: imagine you want to slice a matrix, then do something with a slice of another matrix, then index the result ... and so on.
m = A[:4,:3].dot(B[2:5,:])[3,2].dot(C)
This example can be made arbitrarily long and complicated, so don’t get fooled by counterarguments: there is no substitute for this, and once you get the hang of it, it pays off. For some reason, NumPy/SciPy/IPython prefer the Mathematica syntax for linear algebra objects and I have no idea why. Arguably, Mathematica is the least matrix algebra friendly software suite, but anyway. There is no decision to be made here unless we recode the array parsing parts from scratch.
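To see what such a chain does, it can help to unroll it once. The sketch below uses a slight variant of the one-liner (it picks a whole row with [3, :] instead of a single element) and made-up matrices whose shapes are chosen so that every step is compatible.

```python
import numpy as np

# Made-up matrices with compatible shapes.
A = np.arange(30.0).reshape(5, 6)
B = np.arange(20.0).reshape(5, 4)
C = np.arange(8.0).reshape(4, 2)

# One chained expression: slice, multiply, index, multiply again.
m = A[:4, :3].dot(B[2:5, :])[3, :].dot(C)

# The same computation unrolled step by step.
left = A[:4, :3]        # shape (4, 3)
right = B[2:5, :]       # shape (3, 4)
product = left @ right  # shape (4, 4)
last_row = product[3, :]  # shape (4,)
m_steps = last_row @ C    # shape (2,)

print(m.shape, np.allclose(m, m_steps))
```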
The indexing of arrays starts from zero and not one!! This debate is stupid and I don’t care. I recommend you do the same. Computer scientists are not the best people to design front ends. You wouldn’t ask the owner of Facebook to meet your best friends now, would you? It would be a guaranteed disaster.
Moreover, array indexing is semi-exclusive: in math notation, whatever range is given is used as the half-open interval [a,b[. In English, the slice [1:5] is meant to be read as from one up to five, excluding five. Sometimes this makes things extremely natural and convenient; sometimes you might consider watching paint dry instead of debugging mismatched array sizes.
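Both conventions fit in a few lines. The sketch below uses an arbitrary array of the integers 0 to 9 and also shows one of the cases where the half-open rule is convenient: splitting an array at an index without overlap or gap.

```python
import numpy as np

x = np.arange(10)   # the integers 0, 1, ..., 9

# Indexing starts at zero.
first = x[0]        # 0
last = x[-1]        # 9, negative indices count from the end

# Slices are half-open: the start is included, the stop is excluded.
part = x[1:5]       # array([1, 2, 3, 4])

# The convenient side: splitting at index k needs no +1 or -1 juggling.
k = 3
head, tail = x[:k], x[k:]
print(len(head), len(tail))  # 3 7
```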
Another point that might annoy the users is the complex number syntax. In Python, you are obliged to use the letter j for the imaginary unit; i is not accepted.
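A few lines illustrate the complex number syntax. Note that the j has to follow a number directly (1j, 4j), since a bare j would be read as a variable name. The values are arbitrary.

```python
import numpy as np

# Complex literals: a number immediately followed by j.
z = 3 + 4j
print(z.real, z.imag)   # 3.0 4.0
print(abs(z))           # 5.0

# The imaginary unit squared is -1.
unit_squared = 1j * 1j

# The same number without the literal syntax.
z_again = complex(3, 4)

# NumPy arrays accept the same literals.
w = np.array([1 + 2j, 3 - 1j])
w_conj = w.conj()
```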
Hopefully, this convinces you that the Python framework is not yet another matlab clone. I cannot do justice to all the nuances, since not only do I lack the resources but some of the stuff still doesn’t make too much sense to me. But in defense of Python, most of the stuff in matlab never made sense to me either. So it is a major upgrade.
An almost exhaustive cheat sheet for recovering matlab users
The following link is actually one of the first hits on any search engine, but here it is for completeness. Please spare some time to check out the differences between NumPy and matlab syntax. It might even teach you some matlab too.
Click here for “Numpy for matlab users”
Now, assuming that you have mastered the art of finding your way through a gazillion blogs, filtering StackOverflow nerd anger, and decrypting the NumPy documentation, let’s start doing some familiar things in harold.