Masaki Suenaga, Symantec
Most viruses 'speak' English, and English-'speaking' mass-mailing worms tend to spread worldwide. Virus analysts generally understand English as well. As a result, we can tell our customers what kind of message a threat sends and what it targets.
Even if a virus speaks French or Portuguese, we might still be able to extract the correct text from it, although there is quite a bit of room for error when extracting Portuguese words.
If the text is not written in the Western European code page, however, we have to guess which code page was used. If we guess wrong, we get nothing, and therefore cannot give customers the same level of precise information as we could for English text.
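As a rough illustration of that guessing step, the following Python sketch decodes the same raw bytes under several candidate code pages and keeps every result that decodes without error. The candidate list and the function name `try_code_pages` are assumptions made for this example, not a method taken from the paper. Note that a decode which succeeds is not necessarily the right one: Western European and Cyrillic code pages accept almost any byte sequence and simply yield garbage, so an analyst (or a further heuristic) still has to judge which output reads as real text.

```python
# Illustrative list of code pages to try; an assumption, not an exhaustive set.
CANDIDATES = ["cp1252", "shift_jis", "gbk", "big5", "euc_kr", "cp1251"]


def try_code_pages(raw, candidates=CANDIDATES):
    """Return {code_page: decoded_text} for each candidate that decodes raw cleanly."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this code page; skip it.
            continue
    return results


if __name__ == "__main__":
    # Sample bytes: a Japanese greeting encoded in Shift_JIS.
    sample = "\u3053\u3093\u306b\u3061\u306f".encode("shift_jis")
    for code_page, text in try_code_pages(sample).items():
        print(code_page, repr(text))
```

Running the sketch on a string extracted from a sample prints one line per surviving candidate, which makes it easy to compare the readings side by side.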
Encrypted English strings can, technically, be decrypted. Natural languages, however, can look like hieroglyphics to those unfamiliar with them. Machine translation is widely used nowadays and can be very useful once we know the correct strings and the language they are written in. The question is how to determine these. This paper provides some tips.