Evil MSI Background BASE64 Statistical Analysis

Posted Jun 15, 2026

By ictsec.pl

2 min read

Evil MSI Background: BASE64 Statistical Analysis

A recent analysis focused on a suspicious JPEG file, following up on “The Evil MSI Background is Back!”. The bytes present in this suspicious JPEG file were examined using the byte-stats.py tool. The results showed that almost half of the content (45.65%) is BASE64 characters, and the longest BASE64 string is 1000 characters. Additionally, the longest string is almost 1 million characters long. However, using base64dump.py, the longest BASE64 string found was indeed 1000 characters long but didn’t seem to decode to something recognizable. This indicated that a special encoding must have been used. Further attempts using base64dump.py revealed long BASE85 encoded strings, but still no string close to 1 million characters, leading to the conclusion that this must be a custom encoding.

To try to figure out what custom encoding is used, a --stats option was added to base64dump.py. This new feature provided character statistics, showing that all BASE64 characters appear in the detected BASE64 strings, but that the letter A appears significantly less than other letters. If a minimum length for the detected BASE64 strings was used, the letter A was even missing. Notice that the = character is also missing, but the = character is a padding character in BASE64, not a normal character: it can only appear once or twice at the end of a BASE64 string. So this statistics feature of base64dump.py helps us to detect that we might be dealing with a custom encoding, based on BASE64, where the letter A has been replaced with another character. Character # was identified as the most frequent, leading to the hypothesis that A had been replaced with #, but this attempt was unsuccessful.

The discrepancy where a very long BASE64 string, almost 1 million characters long, was identified by byte-stats.py but not base64dump.py was due to byte-stats.py looking for longest strings of consecutive BASE64 characters without checking if that string length is a multiple of 4 (a requirement for BASE64), while base64dump.py does check this. Upon closer inspection of the long string, it was noticed that the string had been reversed: == appears at the beginning, and not at the end. And the end is …qVT, which is TVq reversed, and that’s a marker for MZ, e.g., a Windows executable. Reversing the encoded payload confirmed that it was indeed a PE file, and it had the same hash as the file Xavier extracted. This new feature of base64dump.py, --stats, can help with the reversing of custom encodings by providing statistics of the encoding characters.

Read full article

CYBERSECURITY

This post is licensed under CC BY 4.0 by the author.

Evil MSI Background: BASE64 Statistical Analysis

Trending Tags