codingBat separateThousands using regex (and unit testing how-to)

Posted by polygenelubricants on Stack Overflow
Published on 2010-04-24T08:04:18Z

This question is a combination of regex practice and unit testing practice.

Regex part

I wrote this problem, separateThousands, for personal practice:

Given a number as a string, introduce commas to separate thousands. The number may contain an optional minus sign, and an optional decimal part. There will not be any superfluous leading zeroes.

Here's my solution:

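String separateThousands(String s) {
  return s.replaceAll(
      String.format("(?:%s)|(?:%s)",
        // rest: three digits after the previous match, with another digit ahead
        "(?<=\\G\\d{3})(?=\\d)",
        // first: 1 to 3 leading digits, then whole triplets up to a non-digit
        "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"
      ),
      ","
  );
}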

It works by distinguishing two kinds of comma: the first one, and the rest. In the regex above, the rest subpattern actually appears before the first subpattern. Every match is zero-length, and replaceAll inserts "," at each match position.

The rest subpattern looks behind for the end of the previous match (\G) followed by exactly 3 digits, and looks ahead for another digit. It is a chain reaction: each match is triggered by the one before it.

The first subpattern looks behind for the ^ anchor, followed by an optional minus sign and between 1 and 3 digits. From that point, the string must continue with one or more triplets of digits that are not followed by another digit (so what comes next is either the end of the string or a decimal point).
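The chain reaction can be observed by listing the positions where the pattern matches. The following is only an illustrative sketch: the class CommaPositions and the method commaPositions are hypothetical names introduced for this example, while the pattern itself is the one from the solution above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original solution.
class CommaPositions {
    // Same pattern as in separateThousands: "rest" alternative, then "first".
    static final Pattern COMMA = Pattern.compile(
        String.format("(?:%s)|(?:%s)",
            "(?<=\\G\\d{3})(?=\\d)",
            "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"));

    // Returns the index of every zero-length match,
    // i.e. every position where replaceAll would insert a comma.
    static List<Integer> commaPositions(String s) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = COMMA.matcher(s);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1234567.89", "-12345", "123.456"}) {
            System.out.println(s + " -> " + commaPositions(s));
        }
    }
}
```

For "1234567.89" the matches are at indices 1 and 4: the first subpattern matches at index 1, and the rest subpattern then matches exactly three digits later. Nothing matches at index 7, because the lookahead sees the decimal point instead of a digit, and nothing matches after that because \G stays at index 4.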

My questions for this part are:

  • Can this regex be simplified?
  • Can it be optimized further? I have already done the following:
    • The rest alternative is deliberately placed before first, since first can match at most once
    • No capturing groups are used

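As a sketch of one possible simplification (an assumption offered for discussion, not an established answer): the two alternatives might be folded into a single lookbehind by letting \G be followed by an optional minus sign and 1 to 3 digits. This relies on \G matching at the start of the input when there has been no previous match, which is how java.util.regex behaves. The class name below is hypothetical.

```java
// Hypothetical class name; a sketch of one possible simplification.
class SeparateThousandsSimplified {
    static String separateThousands(String s) {
        // Lookbehind: the end of the previous match (or the start of input),
        // followed by an optional minus sign and 1 to 3 digits.
        // Lookahead: one or more digit triplets not followed by another digit.
        return s.replaceAll("(?<=\\G-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))", ",");
    }

    public static void main(String[] args) {
        System.out.println(separateThousands("-1234567890.1234567890"));
        System.out.println(separateThousands("0.123456789"));
    }
}
```

The lookahead is the same as in the first subpattern, so a comma can only land where the remaining integer digits form whole triplets. The \G chain keeps matches out of the fractional part, because any position after the decimal point is more than "optional minus plus three digits" away from the last match. Whether this is faster than the two-alternative version would need to be measured.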
Unit testing part

As I've mentioned, I'm the author of this problem, so I'm also the one responsible for coming up with test cases for it. Here they are:

INPUT, OUTPUT
"1000", "1,000"
"-12345", "-12,345"
"-1234567890.1234567890", "-1,234,567,890.1234567890"
"123.456", "123.456"
".666666", ".666666"
"0", "0"
"123456789", "123,456,789"
"1234.5678", "1,234.5678"
"-55555.55555", "-55,555.55555"
"0.123456789", "0.123456789"
"123456.789", "123,456.789"

I haven't had much experience with industrial-strength unit testing, so I'm wondering whether others can comment on whether this is good coverage and whether I've missed anything important. (I can always add more tests if there's a scenario I've missed.)
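One way to run the table above without assuming any test framework is a small table-driven check in plain Java. The sketch below is only an illustration: the class SeparateThousandsCheck and the method failures are hypothetical names. The second group of cases is a suggested addition rather than part of the original set: it covers the boundary between 999 and 1000 with and without a sign, and a long fraction after a short integer part.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical table-driven check in plain Java (no test framework assumed).
class SeparateThousandsCheck {
    // The solution under test, copied from the regex part.
    static String separateThousands(String s) {
        return s.replaceAll(
            String.format("(?:%s)|(?:%s)",
                "(?<=\\G\\d{3})(?=\\d)",
                "(?<=^-?\\d{1,3})(?=(?:\\d{3})+(?!\\d))"),
            ",");
    }

    // {input, expected output} pairs.
    static final String[][] CASES = {
        // Original cases from the table.
        {"1000", "1,000"},
        {"-12345", "-12,345"},
        {"-1234567890.1234567890", "-1,234,567,890.1234567890"},
        {"123.456", "123.456"},
        {".666666", ".666666"},
        {"0", "0"},
        {"123456789", "123,456,789"},
        {"1234.5678", "1,234.5678"},
        {"-55555.55555", "-55,555.55555"},
        {"0.123456789", "0.123456789"},
        {"123456.789", "123,456.789"},
        // Suggested additions (assumptions, not in the original set).
        {"999", "999"},
        {"-999", "-999"},
        {"-1000", "-1,000"},
        {"1000000", "1,000,000"},
        {"100.1234567", "100.1234567"},
        {"-0.5", "-0.5"},
    };

    // Returns one message per failing case; an empty list means all passed.
    static List<String> failures() {
        List<String> failed = new ArrayList<>();
        for (String[] c : CASES) {
            String actual = separateThousands(c[0]);
            if (!actual.equals(c[1])) {
                failed.add(c[0] + ": expected " + c[1] + " but got " + actual);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> failed = failures();
        System.out.println(failed.isEmpty() ? "all cases pass" : String.join("\n", failed));
    }
}
```

Keeping the cases in a single table makes it cheap to add a scenario later, and the same pairs could be moved into a parameterized test if a framework is adopted.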

© Stack Overflow or respective owner
