Occasionally, about 1% of a codebase is computationally intensive enough to need the heaviest kind of low-level optimization. Examples include video processing, image processing, and signal processing in general.
The goals are to document and teach the optimization techniques, so that the code does not become unmaintainable and prone to removal by newer developers. (*)
(*) Notwithstanding the possibility that a particular optimization becomes completely useless on some unforeseeable future CPU, in which case the code will be deleted anyway.
Software offerings (commercial or open-source) often retain their competitive advantage by having the fastest code and by making use of the newest CPU architectures. Software writers therefore often need to tweak their code to make it run faster while producing the same output for a given task, whilst tolerating a small amount of rounding error.
Typically, a software writer can keep many versions of a function as documentation of each optimization or algorithm rewrite that takes place. How does one make these versions available for others to study the optimization techniques?