Saturday, October 18, 2014

befoRe anna paVlOvNa And the otHers HaD timE tO smile their appreCiAtion of tHe ViCoMte's EpigRaM, Pierre aGaIn BrokE iNtO the CoNvErsation, AnD thoUgh Anna Pavlovna fElT sUre he wouLd Say sOmEthiNg InapPropRiate, she Was uNable tO sTop hIm. "ThE execution of thE DUc d'ENgHien," deClarEd monsIeur PierRe, "WaS a PoliTiCal nEcessity, And iT sEems to me thAt napoleon ShOwEd GreaTnEss of sOul by not FeArInG to take oN himSeLf thE wHoLe respOnsibiLity Of thAt deed." "DiEu! mon DIeu!" mUtTeRed ANnA PavLoVna in a teRrified whiSpEr. "What, MoNsieUr piErRe... Do you consider that assassination shows greatness of soul?

Can you guess what this text represents ? :-) It's extract from the book "War and Peace" but not only that. This peace of text holds secret message also. When this text is decoded it gives such secret message:

the quick brown fox jumps over the lazy dog.

So this time i would like to add new meaning to phrase "reading between the lines" in computing world. Actually it's better to say "reading between the chars" in this case :-) Talk will be about text steganography - how to hide some secret message, aka. payload in original text fragment. Idea is to modulate letter casing so that casing of letters itself in original text would carry additional information which can be used for example to encode some secret message in text. So at first you need to choose some prefixed alphabet of characters to be used in secret message. It's better to have small alphabet, because bigger set will result in a need for a longer original text so that it could contain all secret message. I've used these chars for secret message - 'qwertyuiopasdfghjklzxcvbnm. |'. It's 30 characters. So you need at least 30 different secret states to represent these characters in text. I've chosen letter case to represent this secret state. Upper case of letter means bit 1, lower case - 0. So by using such scheme you can encode 1 bit into letter casing of 1 character (or 2 total states - uppercase,lowercase). Next step is to calculate how much letters of original text you need to encode 1 secret letter by using such scheme. We need to multiply number of states of each letter, so if one letter holds 2 states, 2 letters holds 2*2=4 states and so on. So 5 letters holds 2*2*2*2*2 = 32 states or 5-bit integer of additional information which is enough for our 30 letters alphabet. This integer will represent our secret letter index in our alphabet. So for 1 secret char you need 5 original text chars. General encoding algorithm idea is this (decoding procedure is very similar just some steps are in reverse order):
1. Get next secret char from the secret message
2. Get secret char index in your alphabet
3. Decompose this index into 5-bit binary pattern
4. Encode this 5-bit pattern into 5 letters from original text using uppercase mode as bit 1, lowercase mode - as bit 0
5. Repeat everything from point 1 until all chars of secret message are encoded
I've prepared for this algorithm Java program, you can download it and use as you wish,- as always no restrictions applied - use at your own risk :-) . There are just two classes - Test.java is main program and TextSteganography.java is class which encapsulates encode/decode methods for secret messages. Class TextSteganography.java holds two most important methods - "modulateText" and "extractSecretMessage" which code is this:
    public String modulateText(String text, String hiddenText) {
hiddenText += '|';
char[] inp = text.toCharArray();
char[] hidden = hiddenText.toCharArray();
int letterIx = -1;

CheckCondition(containsOnlyValidChars(hiddenText), String.format("Hidden text can only contain chars from the set '%s'",
validChards.substring(0, validChards.length()-1)));
int letterCount = countLetters(text);
CheckCondition(letterCount >= 10 * hiddenText.length(),
String.format("Must be at least %d letters in text, but given %d for current payload message", 10 * hiddenText.length(), letterCount));

for (int i = 0; i < hidden.length; i++) {
int ix = validChards.indexOf(hidden[i]);
String patternString = "0000" + Integer.toBinaryString(ix);
char[] pattern = patternString.substring(patternString.length() - 5).toCharArray();
boolean[] convertToUppercase = upperCaseIsNeeded(pattern);

for (int j = 0; j < convertToUppercase.length; j++) {
letterIx = getNextLetterIndex(inp, letterIx);
inp[letterIx] = (convertToUppercase[j])? changeLetterCase(inp[letterIx], LetterCase.UPPER_CASE) :
changeLetterCase(inp[letterIx], LetterCase.LOWER_CASE);
letterIx = getNextLetterIndex(inp, letterIx);
}
}

return String.valueOf(inp);
}

public String extractSecretMessage(String text) {
char[] inp = text.toCharArray();
int letterIx = -1;
int letterCount = 0;
int binaryPattern = 0;
int newBit;

while (true) {
letterIx = getNextLetterIndex(inp, letterIx);
if (letterIx != -1) {
letterCount++;
newBit = (getLetterCase(inp[letterIx]) == LetterCase.UPPER_CASE)? 1 : 0;
binaryPattern = (binaryPattern << 1) + newBit;

if (letterCount == 5) {
char c = validChards.charAt(binaryPattern);
if (c != '|')
letterCount = 0;
binaryPattern = 0;
if (c == '|')
break;
}
letterIx = getNextLetterIndex(inp, letterIx);
}
CheckCondition(letterIx != -1, "Text was not encoded with this steganography encoder !");
}

return charListToString(charList);
}

Some additional interesting comments. At first I wanted to use assertions in Java. But I've found information that assertions are not recommended in general, at least in production code. Besides they are turned-off by default in Java virtual machine. You need to turn them on by -ea switch. But I like assertions very much, because they gives you an opportunity to check some conditions in fast way without writing 'if(s)'. So I wrote function "CheckCondition" which is assert analog but acts in recommended way - throws an exception if condition is not met. Next interesting note - when I profiled an application I found that secret message extraction is relatively slow because each decoded character was appended to output string by StringBuilder. And that's natural, because as I've understood String is just a wrapper around char array. And as we know appending new array element is slow procedure, because you need to re-size array, copy elements to new array and etc, and this takes time. So instead I append new decoded characters into LinkedList of chars and later I converted this list into usual String. This approach works faster when you need to build new String incrementally. That's all.
Have fun in using text steganography !