Let me introduce one of our many little-known open source Java libraries on GitHub: molindo-mysql-collations-lib. It really is small, with a single public Java class: a Comparator for Strings that behaves like (most) MySQL collations.
The JDK already provides a decent collation library with java.text.Collator, and an even better one is available with icu4j, so why bother? Well, sometimes you simply want your application and your database (a MySQL database, that is) to agree on sort order and equality. If you write heavily database-centric applications like our very own molindo-dbcopy, it’s mandatory.
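For comparison, here is a minimal sketch of the JDK-only approach. The class and method names are made up for this post; only java.text.Collator itself is real. It sorts and compares just fine, but nothing guarantees that its results match what a given MySQL collation would do with the same strings.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration, not part of molindo-mysql-collations-lib.
final class JdkCollationSketch {

    // Builds a JDK collator that ignores case and accent differences.
    static Collator primaryCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Returns a sorted copy of the input, ordered by the JDK collator.
    static List<String> sortWithCollator(List<String> input, Locale locale) {
        List<String> sorted = new ArrayList<>(input);
        sorted.sort(primaryCollator(locale));
        return sorted;
    }

    // Equality as the JDK collator sees it, which may differ from MySQL's view.
    static boolean collationEquals(String a, String b, Locale locale) {
        return primaryCollator(locale).compare(a, b) == 0;
    }
}
```

If the database sorts with one set of rules and the application with another, any code that merges or diffs pre-sorted result sets can silently go wrong, which is the problem the library is meant to avoid.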
As there is no Java collation that is 100% consistent with MySQL, we’ve decided to go for JNI. No, really. My C programming is pretty rusty, but somehow I got it done. We use libmysqlclient.so and the function strnncollsp(..) declared in m_ctype.h (documentation can be found in string/CHARSET_INFO.txt of MySQL’s source distribution). It’s the trailing-space-ignoring equivalent of strnncoll(..), which “compares two strings according to the given collation”. As simple as that. Why ignore trailing spaces? Well, “all MySQL collations are of type PADSPACE. This means that all CHAR, VARCHAR, and TEXT values in MySQL are compared without regard to any trailing spaces” (see docs).
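To make the trailing-space behavior concrete, here is a rough pure-Java sketch. It is not how the library works (that goes through JNI), and PadSpaceComparator is a hypothetical name invented for this example. It strips trailing spaces before handing the strings to another Comparator. For equality that matches the quoted rule; for ordering it is only an approximation, because real PADSPACE semantics pad the shorter string with spaces, which gives a different result when the longer string continues with a character that sorts below the space.

```java
import java.util.Comparator;

// Hypothetical illustration of "compared without regard to any trailing spaces".
// Not the library's implementation and not a full PADSPACE emulation.
final class PadSpaceComparator implements Comparator<String> {

    private final Comparator<String> delegate;

    PadSpaceComparator(Comparator<String> delegate) {
        this.delegate = delegate;
    }

    // Removes trailing U+0020 characters only; leading spaces and tabs are kept.
    static String stripTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    @Override
    public int compare(String a, String b) {
        return delegate.compare(stripTrailingSpaces(a), stripTrailingSpaces(b));
    }
}
```

With this wrapper, new PadSpaceComparator(Comparator.naturalOrder()) treats "abc" and "abc   " as equal, while " abc" still differs from "abc" because only trailing spaces are ignored.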
The library performs fairly well but still needs more testing. It’s available from Maven Central and comes with a pre-compiled native library for Ubuntu amd64, which in turn requires libmysqlclient. If you are on another operating system, you can tweak build.sh to your needs.