Introducing: molindo-mysql-collations-lib

Let me introduce one of our numerous unknown Open Source Java libraries on GitHub: molindo-mysql-collations-lib. It really is small, with a single public Java class: a Comparator for Strings with the behavior of (most) MySQL collations.

The JDK already provides a decent collation library with java.text.Collator, and an even better one is available with icu4j, so why bother? Well, sometimes you simply want your application and your database (a MySQL database, that is) to have consistent sort order and equality. If you write heavily database-centric applications like our very own molindo-dbcopy, it’s mandatory.

As there is no Java collation 100% consistent with MySQL, we’ve decided to go for JNI. No, really. My C programming is pretty rusty, but somehow I got it done. We use libmysqlclient.so and the function strnncollsp(..) defined in m_ctype.h (documentation can be found in string/CHARSET_INFO.txt of MySQL’s source distribution). Basically, it’s the trailing-space-ignoring equivalent of strnncoll(..), which “compares two strings according to the given collation”. As simple as that. Why ignore trailing spaces? Well, “all MySQL collations are of type PADSPACE. This means that all CHAR, VARCHAR, and TEXT values in MySQL are compared without regard to any trailing spaces” (see docs).
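To make the trailing-space rule a bit more tangible, here is a rough pure-Java sketch. It is not the library’s code or API: the class name PadSpaceComparator is made up, and it merely strips trailing spaces before delegating to java.text.Collator. That only approximates PADSPACE, since strings ending in characters that sort below the space can still come out differently than in MySQL.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

/** Illustration only (hypothetical class): approximates PADSPACE on top of java.text.Collator. */
class PadSpaceComparator implements Comparator<String> {

    private final Collator collator;

    PadSpaceComparator(Collator collator) {
        this.collator = collator;
    }

    /** Removes trailing spaces (U+0020 only), since the comparison should ignore them. */
    static String stripTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    @Override
    public int compare(String a, String b) {
        return collator.compare(stripTrailingSpaces(a), stripTrailingSpaces(b));
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        // PRIMARY strength ignores case and accents, roughly in the spirit of a *_ci collation
        collator.setStrength(Collator.PRIMARY);
        PadSpaceComparator cmp = new PadSpaceComparator(collator);
        System.out.println(cmp.compare("abc", "abc   "));
        System.out.println(cmp.compare("abc", "ABC "));
    }
}
```

This still relies on Java’s own collation rules, which is exactly why it cannot be 100% consistent with MySQL.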

The library performs fairly well but requires some more testing. It’s available from Maven Central and comes with a pre-compiled library for Ubuntu amd64 that requires libmysqlclient. You can however tweak build.sh to your needs if you are on another operating system.

<dependency>
<groupId>at.molindo</groupId>
<artifactId>molindo-mysql-collations-lib</artifactId>
<version>0.1.0</version>
</dependency>


Upgrade from Maven 2 to Maven 3 and property substitutions

I just ran across a problem after upgrading from Maven 2 to Maven 3. It seemed as if one of our properties, which we defined like this in the pom.xml of the parent project

<foobar.prop>barfoo.value</foobar.prop>

wouldn’t be replaced in the child pom. Our directory layout was:

Screen Shot 2013-07-16 at 11.44.01

The resulting error message was:

'dependencyManagement.dependencies.dependency.groupId' for ${foobar.prop}:foobar:jar with value '${foobar.prop}' does not match a valid id pattern.

The problem lies in the way Maven 3 handles the path to the parent pom. The file child-of-child/pom.xml referenced its parent like this:

<parent>
	<groupId>at.molindo.pom</groupId>
	<artifactId>child-of-parent</artifactId>
	<version>1.0</version>	
</parent>

This is where Maven 3 behaves differently. Maven 2 first looked for the pom.xml of child-of-parent in the reactor of the current build, whereas Maven 3 doesn’t look there at all: it relies on the relative path first (../pom.xml unless stated otherwise) before looking into the local repository.

Therefore the solution was to add the path:

<parent>
	<groupId>at.molindo.pom</groupId>
	<artifactId>child-of-parent</artifactId>
	<version>1.0</version>
	<relativePath>../child-of-parent/pom.xml</relativePath>
</parent>

There you go.


Google Webmaster Tools report a significant increase of 404 errors

This is the first post after a while, and we’re trying to have more than one post every two years. But let’s be honest, you’re not interested in this yadda yadda anyway and just came here because Google Webmaster Tools sent you a kind message like

Increase in not found errors

Then you went to GWT and saw an unexpected spike in 404s (or 410s, as Google doesn’t distinguish between the two) like this one:

Screen Shot 2013-07-04 at 11.23.59

The total number of course varies from case to case, as Google sends the warning whenever an unusual increase happens. However, the numbers can be as high as hundreds of thousands of not-found pages.

And then you immediately thought “What’s wrong? Will this harm my rankings?”

You surely stumbled across the official post that confirmed that it won’t harm your rankings in most cases.

So you’re kind of safe now, but you want to know what caused the spike and how to get rid of the errors? Well I can’t tell you what your problem was, but I’ll show you mine and maybe, just maybe, it’s the same ;-)

The reason for the spike was that Google crawled our JavaScript and discovered something that looked like a URL but was in fact a cometd channel id. This can of course also happen with links generated in JavaScript from snippets (or the infamous /a generated by jQuery). Most of the time you don’t want Google to crawl those links – but how can you keep Google from crawling JavaScript?

To put it simply: you shouldn’t.

Here’s a short explanation why: returning a 404 in this case is by all means the right thing to do, but a rising number of 404s in your webmaster tools console and a weekly warning message just don’t feel right. The immediate thought usually is to block the URLs via robots.txt. But that’s the wrong thing to do for two reasons:

  1. You tell Google that a page exists (which it doesn’t) and that Google just isn’t allowed to crawl it
  2. You move the 404 errors to the “blocked URLs” section of your webmaster tools

So if you want to get rid of the 404 errors in Google Webmaster Tools that were caused by Google crawling your JavaScript, perform a 301 redirect to the best matching page or, if there isn’t one, to the homepage. This way you’ll get the errors out of GWT without moving them to the blocked section.
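How you pick the “best matching page” depends on your site. As a rough sketch (not our production code), one option is to walk up the bogus path until you hit a page that really exists and fall back to the homepage otherwise. The class RedirectResolver, its set of known pages and the example paths are all made up for illustration:

```java
import java.net.HttpURLConnection;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: picks a 301 target for a path that would otherwise be a 404. */
class RedirectResolver {

    /** HTTP status to send together with the Location header (301). */
    static final int STATUS = HttpURLConnection.HTTP_MOVED_PERM;

    private final Set<String> knownPages;

    RedirectResolver(Set<String> knownPages) {
        this.knownPages = knownPages;
    }

    /**
     * Walks up the path segment by segment and returns the closest existing
     * parent page; falls back to the homepage ("/") if there is none.
     */
    String resolve(String path) {
        String candidate = path;
        int slash;
        while ((slash = candidate.lastIndexOf('/')) > 0) {
            candidate = candidate.substring(0, slash);
            if (knownPages.contains(candidate)) {
                return candidate;
            }
        }
        return "/";
    }

    public static void main(String[] args) {
        Set<String> pages = new HashSet<>(Arrays.asList("/forum", "/forum/java"));
        RedirectResolver resolver = new RedirectResolver(pages);
        System.out.println(STATUS + " -> " + resolver.resolve("/forum/java/cometd/channel42"));
        System.out.println(STATUS + " -> " + resolver.resolve("/service/meta/connect"));
    }
}
```

In a real webapp you would call something like this from wherever you currently produce the 404 and send the result as the Location header of the 301 response.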

But why believe me? I could just be fooling around with you, couldn’t I? Well no, I spent some time researching this topic and came across an official answer by a Google employee: see the initial thread on Google Groups and the follow-up question on Stack Overflow.

Now have fun redirecting!

tl;dr Google reports 404 errors in GWT due to crawling your Javascript? 301 the URLs to the next category page (or homepage)

 


Serving Wicket Resources from CDN

This post is short but still just amazing. Now it’s possible to serve Wicket resources from CDNs supporting custom origins within just a few minutes (at least if you are already using wicketstuff-merged-resources). First of all, you’ll need the latest version (3.1-alpha-1) of wicketstuff-merged-resources (which is now available from Maven Central!).


wicketstuff-merged-resources: 2.1 and 3.0 released!

I’m happy to announce the releases of wicketstuff-merged-resources 2.1 and 3.0.



Compass: Role based searching using CompassQueryFilter

While implementing a forum with Wicket, Spring, Hibernate and Compass for search, I recently ran into a problem: there are topics and posts that should only be visible to some users. Say there’s a moderator forum where all content should only be visible to … well, moderators :-).


The Final Take On Java System Properties

In this post, I’m looking for the active collaboration of my readers (as I really hope that I have some). I’ve thought about a simpler way to handle Java system properties, as I tend to forget them all the time. Additionally, I don’t like to see them as string constants – neither within the code nor somewhere else. I’ve come up with a single enum class that aims to simplify handling of system properties. Actually, you won’t ever have to think about possible best practices again – hence “The Final Take on Java System Properties” :)
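To give a first impression of the idea, here is a minimal sketch of what such an enum could look like. It is not the actual class from the full post: the name SystemProperty and the selection of constants are assumptions for illustration. The property keys themselves are standard JDK ones.

```java
/** Hypothetical sketch: typed access to a few standard Java system properties. */
enum SystemProperty {
    JAVA_VERSION("java.version"),
    JAVA_IO_TMPDIR("java.io.tmpdir"),
    USER_HOME("user.home"),
    FILE_ENCODING("file.encoding"),
    LINE_SEPARATOR("line.separator");

    private final String key;

    SystemProperty(String key) {
        this.key = key;
    }

    /** The raw property key, e.g. "user.home". */
    String getKey() {
        return key;
    }

    /** The current value, or null if the property is not set. */
    String get() {
        return System.getProperty(key);
    }

    /** The current value, or the given default if the property is not set. */
    String get(String defaultValue) {
        return System.getProperty(key, defaultValue);
    }

    boolean isSet() {
        return get() != null;
    }
}
```

With that, SystemProperty.USER_HOME.get() replaces System.getProperty("user.home"), and your IDE’s auto-completion does the remembering for you.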


Efficiently Tracking Response Time Percentiles

As we’ve recently started feeling that response times of one of our webapps got worse, we decided to spend some time tweaking the app’s performance. As a first step, we wanted to get a thorough understanding of current response times. For performance evaluations, using minimum, maximum or average response times is a bad idea: “The ‘average’ is the evil of performance optimization and often as helpful as ‘average patient temperature in the hospital’” (MySQL Performance Blog). Instead, performance tuners should be looking at the percentile: “A percentile is the value of a variable below which a certain percent of observations fall” (Wikipedia). In other words: the 95th percentile is the time in which 95% of requests finished. Therefore, a performance goal related to the percentile could be similar to “The 95th percentile should be lower than 800 ms”. Setting such performance goals is one thing, but efficiently tracking them for a live system is another.
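For reference, this is what the naive computation looks like: a small sketch using the nearest-rank method on a complete sample. The class name and the sample values are made up, and keeping and sorting every single response time like this is precisely what is not efficient on a live system.

```java
import java.util.Arrays;

/** Minimal sketch: nearest-rank percentile over a complete sample of response times. */
class Percentiles {

    /**
     * Returns the smallest sample value such that at least the given percentage
     * of all samples is less than or equal to it (nearest-rank method).
     */
    static long percentile(long[] samplesMillis, double percentile) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percentile <= 0 || percentile > 100) {
            throw new IllegalArgumentException("percentile must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // made-up response times in milliseconds, including one slow outlier
        long[] responseTimes = {120, 95, 310, 150, 2400, 180, 130, 101, 99, 640};
        System.out.println("average: " + Arrays.stream(responseTimes).average().getAsDouble() + " ms");
        System.out.println("median: " + percentile(responseTimes, 50) + " ms");
        System.out.println("95th percentile: " + percentile(responseTimes, 95) + " ms");
    }
}
```

Note how the single outlier drags the average far above the median, which is the kind of distortion the quote above warns about.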


Wicket: Annotation-based Mounting of Resources

Today, I’m happy to announce the availability of annotation-based mounting and merging of resources in wicketstuff-merged-resources (version 3.0-SNAPSHOT for Wicket 1.4, version 2.1-SNAPSHOT for Wicket 1.3). In order to mount resources, all that’s needed is adding annotations to component classes:

@JsContribution
@CssContribution(media = "print")
@ResourceContribution(value = "accept.png", path = "/img/accept.png")
public class PanelOne extends Panel {
 
    public PanelOne(String id) {
        super(id);
        // ...
    }
}



Using MySQL Collations in Java

I’ve recently discovered Stack Overflow as a nice pastime on the one hand and as a valuable source of answers on the other. Normally it takes only a few minutes to get answers to most questions. However, I managed to ask a question that nobody has been able to answer yet. The question was about collations. As I suspect that collations are a hardly used Java feature, I kept working on the problem myself rather than just waiting for an answer on Stack Overflow.

I’ve managed to get something working right now. It’s not completely tested but it should work quite well. What I’m doing is the following: I parse the charset files of MySQL (on an Ubuntu system, you can find them in /usr/share/mysql/charsets/) and do the collation based on those files myself rather than using Java’s built-in collations.
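The comparison itself can be sketched roughly like this. It is a simplified illustration and not the code of the project: it assumes a simple 8-bit charset whose collation boils down to a table of 256 sort weights, leaves out the parsing of the charset files entirely, and uses a made-up toy table. The class name WeightTableComparator is hypothetical, too.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

/** Hypothetical sketch: compares strings byte by byte through a table of sort weights. */
class WeightTableComparator implements Comparator<String> {

    /** weights[b] is the sort weight of byte value b (0..255). */
    private final int[] weights;

    WeightTableComparator(int[] weights) {
        if (weights.length != 256) {
            throw new IllegalArgumentException("expected 256 weights");
        }
        this.weights = weights.clone();
    }

    @Override
    public int compare(String a, String b) {
        // assumption: latin1-like charset; characters outside of it are replaced by '?'
        byte[] x = a.getBytes(StandardCharsets.ISO_8859_1);
        byte[] y = b.getBytes(StandardCharsets.ISO_8859_1);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = weights[x[i] & 0xFF] - weights[y[i] & 0xFF];
            if (diff != 0) {
                return diff;
            }
        }
        // equal common prefix: the shorter string sorts first (no trailing-space handling here)
        return x.length - y.length;
    }

    /** Toy table for the demo: identity weights, except that a-z get the weights of A-Z. */
    static int[] asciiCaseInsensitiveWeights() {
        int[] w = new int[256];
        for (int i = 0; i < 256; i++) {
            w[i] = i;
        }
        for (int c = 'a'; c <= 'z'; c++) {
            w[c] = c - 'a' + 'A';
        }
        return w;
    }

    public static void main(String[] args) {
        WeightTableComparator cmp = new WeightTableComparator(asciiCaseInsensitiveWeights());
        System.out.println(cmp.compare("Hello", "hELLO"));
        System.out.println(cmp.compare("abc", "abd"));
    }
}
```

A real table would come from the charset files instead of the toy method, and multi-byte charsets need more than a flat weight table.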

EDIT: I’ve just created a GitHub project that’s available as a Maven artifact from Sonatype’s OSS repository.