Using MySQL Collations in Java

October 7th, 2009  |  Published in Hibernate, Java  |  2 Comments

I’ve recently discovered Stack Overflow as a nice pastime on the one hand and as a valuable source of answers on the other. Normally it takes only a few minutes to get answers to most questions. However, I managed to ask a question that nobody has been able to answer yet. The question was about collations. Since I suspect that collations are a Java feature that is hardly ever used, I kept working on the problem myself rather than just waiting for an answer on Stack Overflow.

I’ve now managed to get something working. It’s not completely tested, but it should work quite well. Here is what I’m doing: I parse MySQL’s charset files (on an Ubuntu system, you can find them in /usr/share/mysql/charsets/) and do the collation myself based on those files rather than using Java’s built-in collations.
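To make the idea more concrete, here is a minimal sketch of the approach. It is not the code from the project. It assumes a single-byte charset file (such as latin1.xml) in which each <collation name="..."> element contains a <map> of 256 hexadecimal sort weights, one per byte value. The class and method names (MysqlCollationSketch, loadWeights, comparator) are made up for illustration, and refinements such as trailing-space handling are left out.

```java
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Comparator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of MySQL or of the project mentioned in this post.
public class MysqlCollationSketch {

    /** Reads the 256 sort weights of the named collation from a charset XML stream. */
    static int[] loadWeights(InputStream xml, String collationName) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(xml).getDocumentElement();
        NodeList collations = root.getElementsByTagName("collation");
        for (int i = 0; i < collations.getLength(); i++) {
            Element collation = (Element) collations.item(i);
            if (!collationName.equals(collation.getAttribute("name"))) {
                continue;
            }
            NodeList maps = collation.getElementsByTagName("map");
            if (maps.getLength() == 0) {
                break; // e.g. a binary collation that carries no weight table
            }
            // Assumed layout: whitespace-separated hex weights, one per byte value.
            String[] tokens = maps.item(0).getTextContent().trim().split("\\s+");
            if (tokens.length != 256) {
                throw new IllegalArgumentException("expected 256 weights, got " + tokens.length);
            }
            int[] weights = new int[256];
            for (int b = 0; b < 256; b++) {
                weights[b] = Integer.parseInt(tokens[b], 16);
            }
            return weights;
        }
        throw new IllegalArgumentException("no weight map for collation " + collationName);
    }

    /** Compares strings byte by byte using the weights instead of the raw byte values. */
    static Comparator<String> comparator(int[] weights, Charset charset) {
        return (a, b) -> {
            byte[] x = a.getBytes(charset);
            byte[] y = b.getBytes(charset);
            int n = Math.min(x.length, y.length);
            for (int i = 0; i < n; i++) {
                int diff = weights[x[i] & 0xFF] - weights[y[i] & 0xFF];
                if (diff != 0) {
                    return diff;
                }
            }
            // All shared positions carry equal weight: the shorter string sorts first.
            return x.length - y.length;
        };
    }
}
```

With a case-insensitive weight table, the resulting comparator treats "abc" and "ABC" as equal and sorts "apple" before "Banana", which a plain String.compareTo would not do. It can be passed to Collections.sort or used as the comparator of a TreeMap.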

EDIT: I’ve just created a GitHub project, which is also available as a Maven artifact from Sonatype’s OSS repository.

Responses

  1. Shlomo says:

    November 29th, 2010 at 1:33 pm

    Hi,

    First of all, you’re the only one on the internet who has tried to solve this problem :)

    I’ve got a few comments:
    1. The link to the code doesn’t work :(
    2. Is this the way you eventually implemented the solution, as a custom-made Collator? Or did you perhaps find a better way to do it?
    3. Would you need to parse the MySQL charset files if you’re dealing only with utf8_unicode_ci? For that one, btw, there is no charset file, and I’m pretty sure it’s compiled into the server.

  2. Stefan Fußenegger says:

    November 29th, 2010 at 4:15 pm

    hi shlomo,

    1. I’ve just created a fresh github project containing the sources. simply see the links at the very end of the article.

    2. the version on GitHub is the actual code we’ve been using in production ever since

    3. this implementation requires XML files. compiled charsets are a completely different story.

    cheers, stefan
