Surviving MATLAB and R as a Hardcore Programmer
- by dsimcha
I love programming in languages that seem geared towards hardcore programmers. (My favorites are Python and D.) MATLAB is geared towards engineers and R is geared towards statisticians, and it seems like these languages were designed by people who aren't hardcore programmers and don't think like hardcore programmers. I always find them somewhat awkward to use, and to some extent I can't put my finger on why. Here are some issues I have managed to identify:
(Both): The extreme emphasis on vectors and matrices to the extent that there are no true primitives.
(Both): The difficulty of basic string manipulation.
(Both): Lack of or awkwardness in support for basic data structures like hash tables and "real", i.e. type-parametric and nestable, arrays.
(Both): They're really, really slow even by interpreted language standards, unless you bend over backwards to vectorize your code.
(Both): They seem to not be designed to interact with the outside world. For example, both are fairly bulky programs that take a while to launch and seem to not be designed to make simple text filter programs easy to write. Furthermore, the lack of good string processing makes file I/O in anything but very standard forms near impossible.
(Both): Object orientation seems to have a very bolted-on feel. Yes, you can do it, but it doesn't feel much more idiomatic than OO in C.
(Both): No obvious, simple way to get a reference type. No pointers or class references. For example, I have no idea how you roll your own linked list in either of these languages.
(MATLAB): You can't put multiple top level functions in a single file, encouraging very long functions and cut-and-paste coding.
(MATLAB): Integers apparently don't exist as a first class type.
(R): The basic builtin data structures seem way too high level and poorly documented, and never seem to do quite what I expect given my experience with similar but lower level data structures.
(R): The documentation is spread all over the place and virtually impossible to browse or search. Even D, which is often knocked for bad documentation and is still fairly alpha-ish, is substantially better as far as I can tell.
(R): At least as far as I'm aware, there's no good IDE for it. Again, even D, a fairly alpha-ish language with a small community, does better.
In general, I also feel like MATLAB and R could be easily replaced by plain old libraries in more general-purpose langauges, if sufficiently comprehensive libraries existed. This is especially true in newer general purpose languages that include lots of features for library writers.
Why do R and MATLAB seem so weird to me? Are there any other major issues that you've noticed that may make these languages come off as strange to hardcore programmers? When their use is necessary, what are some good survival tips?
Edit: I'm seeing one issue from some of the answers I've gotten. I have a strong personal preference, when I analyze data, to have one script that incorporates the whole pipeline. This implies that a general purpose language needs to be used. I hate having to write a script to "clean up" the data and spit it out, then another to read it back in a completely different environment, etc. I find the friction of using MATLAB/R for some of my work and a completely different language with a completely different address space and way of thinking for the rest to be a huge source of friction. Furthermore, I know there are glue layers that exist, but they always seem to be horribly complicated and a source of friction.