Simon Nickerson: Coding Blog

Software development and mathematics

Books clear out – please claim!

Books

Click on picture to get a close-up.

Java puzzle 2: Extra long Unicode literals

Another simple Java puzzle. Does the following code compile? If so, what does it do when you run it?


public class Unicode {
	public static void main(String[] args) {
		System.out.println("\uuuuuuuuuuuuuuuu0055");
	}
}

Answer:
Show ▼

Java puzzle 1: fun with Loggers

I’ve been a fan of Java Puzzlers by Joshua Bloch and Neal Gafter since a colleague lent me a copy about a year ago. Every now and again, I come across weird things in Java that I think would make a good puzzle.

In the spirit of Java Puzzlers, consider the following piece of code. It takes a single command line argument, which is an integer. What is the expected output for inputs of 100000 and 1000000 (i.e. 10^5 and 10^6)?


import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class Test {
	public static void main(String[] args) {
		addHandler();
		int numObjects = Integer.parseInt(args[0]);
		for (int i = 0; i < numObjects; i++) {
			new Object();
		}
		Logger logger = Logger.getLogger("test");
		logger.log(Level.INFO, "Hello world");
	}

	private static void addHandler() {

		Logger logger = Logger.getLogger("test");
		logger.addHandler(new Handler() {
			@Override
			public void publish(LogRecord record) {
				throw new RuntimeException("Goodbye cruel world");
			}
			@Override
			public void flush() {
			}
			@Override
			public void close() throws SecurityException {
			}
		});
	}
}

Answer:
Show ▼

Why Windows 7 wasn’t accepting any passwords on our new laptop

We bought a new Samsung NP-X120 laptop the other day, which I’ve christened Sirius (previous laptops were called Hagrid and Lupin).

Samsung NP-X120 laptop

Sirius

I spent a couple of hours on Easter Monday installing software (Eclipse, emacs, iTunes, the usual stuff) and running Windows Easy Transfer to copy stuff from XP to Windows 7.

“Hi, could you tell me what you set my password to on the new laptop”.

“I set it to ‘monkey’”. (Note that I didn’t actually set it to monkey. I’m not going to tell you her password. Sheesh!)

“It doesn’t work. It’s not letting me in.”

“Did you type it all in lower case?”

“Yes”

“Is caps lock on?”

“No. Can you tell me your password?”

I’ve been conditioned not to tell people my password, but this is my wife after all. Even so, I felt a bit funny telling someone a password over the phone.

“Well, I’d rather not, I’m in a train full of people. I’ll text it to you.”

I hung up, sent her my password (something memorable like ‘kH6mnP98s1&*’) as a text message. This was a 15 minute exercise because texting anything other than complete sentences with real English words is a bit of a headache on my phone.

A bit later, I got another phone call.

“It’s still not working.”

Hmmm… curious. When I got home, sure enough, neither account’s password was being accepted by Windows. While I could believe that I would be unfortunate enough to mis-type one password, mistyping two is looked like carelessness. Also, how was I going to log in to reset the password? The administrator password didn’t work either.

To cut a long, frustrating story short, after running a recovery (and losing most of my installations from Monday) it turns out that the Num Lock key was turned on. There appears to be no visual indication of this in Windows (as there is for example with the Caps Lock key) and there’s no LED for it on the keyboard. So a password of “monkey” was being entered as “06n2ey”, but because the text is hidden behind asterisks, I couldn’t see this. Because this was Windows 7 Home Premium, you get given a list of user names, rather than being given the opportunity to type the user’s name in (which also might have alerted me). One clue that I should have picked up on earlier was that the shift key appeared not to work; but since this was a new laptop, I thought it might just be a faulty keyboard. Ah well…

According to some forum posts, Windows 7 remembers the state of the Num Lock key from the last time you ran. So presumably I had turned it on by accident, and I’ll know to check for this next time.

“Operation currently prohibited by disk” considered harmful

Our DVD player will frequently show a little banner on the TV screen when you press a button. The banner reads:

Operation currently prohibited by disk

Prohibited. By a disk.

What business is it for a DVD to say that I can’t fast forward through their idiotic and patronising piracy warnings and disclaimers?

The worst offender I’ve seen for a while is Disney (with the DVD for the otherwise excellent film Up) who are very keen for me to watch trailers for about a dozen other films before they are prepared to show me the main menu.

This is a classic example of Lawrence Lessig’s idea of code as law, but it seems to me that preventing navigation just because you want to preach about piracy is treating your paying customers with total contempt. Just stop it please.

Podcast: “Shot of JAQ”

In a prior life on Usenet, I knew a chap called Stuart Langridge, who went by the nick “Aquarius”. Aq had interesting and forthright opinions on pretty much any subject you cared to mention, and could put them forward eloquently and persuasively.

To my delight, I’ve discovered that Aq and a compadre (Jono Bacon) now do a regular podcast called Shot of JAQ. The idea is that the two of them have a conversation about issues about software and the open source community, the idea being that the conversation continues with the community on their website. Recent podcasts have been on Google’s decision to move out of China, e-book readers, alternatives to Skype, and Simon Cowell and the UK Christmas Number One.

Each podcast is about 10 minutes long, so is perfect for dipping into. Highly recommended.

The Internet is for cat pictures

Because the Internet is primarily for distributing pictures of cats (lol- or otherwise), here are Cleo (who sadly died last year) and Henry.

Is a random set of bytes a UTF-8 string?

UTF-8 is an encoding for Unicode text with several nice properties:

  • Any Unicode string can be encoded
  • For pure ASCII text (characters under 128), the UTF-8 encoded form is the same as the ASCII form
  • It’s pretty well universally recognised as a text-interchange format on the Internet
  • Random byte streams are unlikely to be misinterpreted as UTF-8

It’s this last property I want to talk about here.

Let p(n) be the probability that a given byte stream can be validly interpreted as UTF-8. For instance, p(1) = 1/2, because the only valid UTF-8 byte streams with only one byte are the ones with the most significant bit not set. I want to compute p(16), the probability that a random 16 byte stream can be interpreted as UTF-8.

Method 1: Monte-Carlo

Java 5 introduces new classes (Charset and CharsetDecoder) that allow us to see whether a given byte stream is valid for a given character set. The technique I’m using is:

  1. Compute a number of byte streams at random
  2. Use CharsetDecoder to see whether they validate
  3. Estimate the probability by dividing the number of valid strings by the size of the sample

When I ran below, I got an estimate of 1.025e-04 for p(16). The code is here:


import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.util.Random;

public class UTF8MonteCarlo
{
 // Size of sample
 private static final int MAX = 10000000;

 public static void main(String[] args)
 {
  double probEstimate = monteCarloCharSetProbability(16, MAX, "UTF-8");
  System.out.printf("%e%n", probEstimate);
 }

 private static double monteCarloCharSetProbability(int byteStreamSize, int sampleSize, String charSetName)
 {
  int validStrings = 0;
  Charset charset = Charset.forName(charSetName);
  CharsetDecoder decoder = charset.newDecoder();
  decoder.onMalformedInput(CodingErrorAction.REPORT);
  byte[] b = new byte[byteStreamSize];
  CharBuffer charBuffer = CharBuffer.allocate(byteStreamSize);
  Random random = new Random();
  for (int k = 0; k < MAX; k++) {
   random.nextBytes(b);
   CoderResult result = decoder.decode(ByteBuffer.wrap(b), charBuffer, true);
   charBuffer.clear();
   if (!result.isMalformed() && !result.isUnmappable()) {
    validStrings++;
   }
  }
  return (double)validStrings / sampleSize;

 }
}

Method 2: Recurrence relation

Using Wikipedia’s description of the encoding, we can derive a recurrence relation for p(n):

p(n) = p(n-1) / 2 + p(n-2) / 32 + p(n-3)/256 + p(n-4) / 2048

where we can define p(0) = 1, p(-1) = 0, p(-2) = 0, p(-3) = 0.

Using this formula, we can compute a value of 1.063853e-04 for p(16). The code is here:


public class UTF8Recurrence
{
 private final static int MAX_SIZE = 16;
 public static void main(String[] args)
 {
  double p[] = new double[MAX_SIZE + 4];
  p[0] = 0d;
  p[1] = 0d;
  p[2] = 0d;
  p[3] = 1d;
  for (int i = 1; i <= MAX_SIZE; i++) {
   p[i + 3] = p[i + 2] / 2 + p[i + 1] / 32 + p[i] / 256 + p[i - 1] / 2048;
   System.out.printf("%e%n", p[i + 3]);
  }
 }
}

The probabilities for larger values are shown here:

n p
16 1.063853e-04
32 1.304427e-08
64 1.961083e-16
128 4.432493e-32