2006-06-01
Abstract
In April 2006 a virus appeared for a new virusable platform - the general-purpose, mathematics-oriented MatLab. Vesselin Bontchev provides us with the full details of the unremarkable and slightly buggy proof-of-concept virus MLab/Balogy.A.
Copyright © 2006 Virus Bulletin
On 22 April 2006, Finnish anti-virus researcher Mikko Hyppönen reported that F-Secure had received the first virus for a new virusable platform (see http://www.f-secure.com/weblog/#00000859). The platform for which the virus is written is MatLab, made by MathWorks, Inc. (see http://www.mathworks.com/products/matlab/).
MatLab is a general-purpose, mathematics-oriented platform that can be used for various computations. Since mathematics is used pretty much everywhere, the applications of the product are numerous.
I happened to have easy access to the program because my mother – a professor at the Institute of Mechanics and Biomechanics in the Bulgarian Academy of Sciences – uses it for automated preprocessing of the output of her favourite CAD/CAE product. However, MatLab can be used for just about anything that involves computation: education, mathematics research, physics, statistics and even stock portfolio management.
The product is programmed in a proprietary language, which is vaguely C- or Pascal-like. I couldn't find an official name for the language in the product's documentation, but its users often refer to it as 'MatScript'. The programs written in this language are stored in files with the extension .M – MatLab calls them 'M-files'. The files are ASCII text files and can be opened with Notepad (although MatLab has a built-in editor/debugger for them).
The language is universal and powerful – not only does it have computationally oriented functions, but also a full set of file and string manipulation functions. Powerful enough to write a virus in it, that is. Which is precisely what has been done.
As a result, the members of CARO had to come up with a name for the platform the virus infects. After some discussions (during which, regrettably, the proposal for using 'MS', as in 'MatScript', was rejected due to its similarities with the abbreviation of Microsoft), we eventually decided to use, respectively, 'MatLabScript' and 'MLab' for the long and the short forms of the platform name. The document describing the CARO naming scheme has been updated accordingly (see http://www.people.frisk-software.com/~bontchev/papers/naming.html).
Next, we needed a family name for the virus. Apparently its author wanted it to be named 'MatLab.Bagoly.a', as is evident from the comment lines at the beginning of the virus body:
%--------------------------- % MatLab .m file infector by Positron (MatLab.Bagoly.a) %---------------------------
However, the members of CARO are not in the business of gratifying the egos of virus writers, so we decided to use the slightly distorted name 'Balogy' instead. (As former Virus Bulletin editor Nick FitzGerald pointed out, this sounds a bit like 'baloney', which pretty much reflects our opinion on the appearance of viruses for yet another virusable platform.)
So, the full CARO name of the virus is:
virus://MatLabScript/Balogy.A
Like most proof-of-concept viruses, this virus is remarkably mediocre, full of stupidities, and has virtually no chance of spreading in the wild.
The virus can be classified as a parasitic prepender. That is, it infects other M-files by inserting its own text at the very beginning of their contents. Files that contain the string '__EndSignature__' on any line are considered already infected and are left alone. The virus has the following as its last line for self-recognition purposes:
e__ = ‘__EndSignature__’;
The virus works using a very simple and straightforward algorithm. It starts by opening the file from which it has been executed (MatScript has a built-in variable that returns the name of the currently running file) and by reading its content, line by line, until the 'end signature' is found. Each line is stored in an element of a string array.
Then the virus inspects all *.M files in the current directory. The content of each file found is read (once again on a line-by-line basis) into another string array. If, during this reading, the 'end signature' is found anywhere on a line, the file is considered already infected and will not be touched any further.
Otherwise, the file is opened for writing and the virus writes into it the virus body (stored in the first string array) and then the original content of the file (stored in the second string array), after which the file is closed.
That's it – the virus only replicates, and only in the current directory. It has no payload whatsoever. Yet, despite the simplicity of the algorithm, the virus author has managed to make several logical and strategic mistakes.
First, the virus contains two instances of the following line:
if tline__ == -1, break, end
The purpose of this line is to determine that the end of the file has been reached (first when reading the virus body and again when reading the original content of the file that is going to be infected). However, at least under MatLab version 6.1.0.450 (R12.1), this line generates the following warning once per file:
Warning: Future versions will return empty for empty == scalar comparisons.
This means that each time the virus runs, the user will be 'rewarded' with 2(N+1) such warnings, where N is the number of M-files in the current directory. (N files, plus one file from which the virus is running, and two warnings per file because there are two instances of the line that causes the warning.) That's hardly unnoticeable.
Second, when determining whether a file is already infected, the virus continues to read it line by line until the very end – even after it has determined that the file is already infected and will have to be ignored. That's hardly wise.
Third, it is obvious that the first time the virus is run, it will infect all the files in the current directory. Why, then, try to do it again the next time and every time it is run? Hoping that somebody has added a new file meanwhile? That's hardly intelligent.
Fourth, the virus does not attempt to spread outside the current directory – i.e. to other directories and/or machines – despite the fact that MatScript does have the means to achieve such goals (see the next section). So, the only way in which another user can become infected with the virus is if the current directory is a shared one (e.g. on a network server), or if somebody passes them an infected M-file. That's hardly efficient.
Finally, when the virus writes to the target file, it uses '\n' as a line separator. In MatScript, this results in the lines being separated only with an LF (line feed) character. MatScript (and its built-in editor) can handle both lines that are separated only with LF characters and lines that are separated with CR/LF sequences (i.e. both Unix-style and DOS-style line endings). However, Notepad messes up when trying to display text files whose lines are separated only with LF characters. So, although the infected files will work in a sense, they will look messed up to the user who tries to edit them with Notepad or a similar unintelligent editor.
While MLab/Balogy.A is a very simplistic (and buggy) virus with virtually no chance of spreading in the real world, MatLab has sufficiently powerful capabilities to at least create significant annoyances for users and anti-virus researchers alike.
Indeed, MatLab doesn't seem to have the concept of 'autostart' script – so, a virus written in MatScript cannot hope to receive control automatically each time the product is executed. However, MatLab does have the concept of a 'search PATH', which means that various kinds of companion viruses are possible.
The simplest of all kinds of companion viruses are the renaming companions. A virus could store its body in a file with a name already present in the system, while renaming the original file to something else and executing it directly after the virus has finished running. Even the names of the internal MatLab commands can be 'overloaded' with M-files, which is almost as good as having an ‘autostart’ capability (e.g. by overloading the name of some often-used command like 'help' or 'edit').
Next, we can have PATH companions. By default, when the name of an M-file is typed at the command line, MatLab looks for a file with such a name in each of the directories on the subtree where the product is installed. However, MatScript has commands that allow the user to manipulate the search path by adding or removing arbitrary directories to/from it. Clearly, this can be exploited by a malware author.
Finally, a special kind of companionship is possible. If the files C:\Foo\bar.m and C:\Foo\private\bar.m both exist and the command 'bar' is typed from MatLab's command line, MatLab will try to execute the second one, without the directory 'C:\Foo\Private' having to be present explicitly in the search path.
Of course, in addition to companion and parasitic viruses (both prepending and appending, although, due to some conventions about what the M-files are supposed to contain, the prependers are 'easier'), the language also allows overwriting viruses to be written – although these are extremely noticeable (because the infected files stop working), and not very interesting.
However, it is also perfectly possible to write a LoveLetter-style mass-mailer in MatScript. MatLab associates the *.M extension with itself, so if the user receives a file with such an extension as an email attachment and double-clicks on it, MatLab will be launched and will try to execute the content of the double-clicked M-file.
When an M-file is first executed by MatLab, a pre-parsed form of it is kept in memory until the end of the MatLab session (or until purged explicitly from there with the proper command). This is done for speed reasons – later invocations of that file will result in MatLab running the pre-parsed memory image instead of trying to read and parse the original file again. This alone has some interesting implications in respect of self-modifying malware written in MatScript. However, it gets worse.
MatLab can save these pre-parsed memory images in files with the extension .P – and can execute them just like the M-files. In addition, if the files Foo.m and Foo.p both exist, and the command 'foo' is entered from MatLab's command line, it will be the second file that will be executed; not the first one – which allows for yet another kind of companionship infection.
Even worse, while the M-files are ASCII text, the 'P-files' are binary files with – you've guessed it – completely undocumented format. At present we don't even know whether their content is constant or whether they contain areas with variable content (e.g. like VBA).
MLab/Balogy.A is a relatively unremarkable and slightly buggy proof-of-concept virus for a new virusable platform. It poses no threat by itself, since it has virtually no chance of spreading in the real world. However, the capabilities of the platform are powerful enough and have the potential to cause some annoyance both to users and to anti-virus researchers.
Virus name. virus://MatLabScript/Balogy.A
Aliases. MLS/Lagob, Mlab.Lagob.
Type. Parasitic prepender.
Infects. MatLab 'M-files'.
Payload. None.