Spotting duplicate classes in Jar files

October 20th, 2008 | Published in Maven, Shell | 6 Comments

After more than a month it’s time for another post. Sorry to all of you for keeping you waiting … well, honestly I don’t think anybody even noticed 🙂

Today, I stumbled upon a classpath-related problem – once again. As I doubt that I am the only one to ever face this problem, I want to share a short shell script that came to the rescue.

But first, what was the problem? After adding some additional dependencies to our POM – quite carelessly, I have to admit – our application started sending mails without a subject or sender address and with messed-up special characters. Interestingly enough, I didn’t touch the mail part at all. I added Apache CXF dependencies, i.e. web service stuff. So what was wrong?

CXF comes with quite a lot of transitive dependencies (that should be declared optional, I’d say), including some Geronimo Spec jars. One of them was org.apache.geronimo.specs:geronimo-javamail_1.4_spec, whose classes took precedence over the ones from javax.mail:mail. A simple exclusion fixed the problem.
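For illustration, such an exclusion might look roughly like the fragment below. The post does not say which CXF artifact pulled in the Geronimo jar, so `cxf-rt-frontend-jaxws` and the `${cxf.version}` property are assumptions; only the excluded coordinates come from the text above.

```xml
<dependency>
  <!-- Assumed CXF artifact and version property, for illustration only -->
  <groupId>org.apache.cxf</groupId>
  <artifactId>cxf-rt-frontend-jaxws</artifactId>
  <version>${cxf.version}</version>
  <exclusions>
    <!-- Keep the Geronimo JavaMail spec jar off the classpath -->
    <exclusion>
      <groupId>org.apache.geronimo.specs</groupId>
      <artifactId>geronimo-javamail_1.4_spec</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```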

However, I wanted to be prepared for the next time, as this tends to happen quite regularly with a growing number of transitive dependencies. There are packages with the same content but a different id (e.g. org.mortbay.jetty:servlet-api and javax.servlet:servlet-api), artifacts that contain subsets of other artifacts (e.g. org.springframework:spring contains everything from org.springframework:spring-web), and artifacts that should be replaced in favor of others (e.g. slf4j instead of commons-logging).

That’s why I wrote this little bash script:

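#!/bin/bash
# For every class, remember the jars containing it in a variable named
# CLS_<mangled class name>; print the jars whenever a class shows up again.
for lib in $(find . -name '*.jar'); do
  for class in $(unzip -l "$lib" | grep -Eo '[^ ]*\.class$'); do
    # Turn the entry into a valid variable name, e.g. a/b/C$D.class -> a_b_C_D
    class=$(echo "$class" | sed -e 's/\.class$//' -e 's|[-./$]|_|g')
    # The escaped \$ defers expansion to eval, which then reads CLS_<class>
    existing=$(eval "echo \$CLS_${class}")
    if [ -n "$existing" ]; then echo "$lib $existing"; fi
    eval "CLS_${class}=\"${lib} ${existing}\""
  done
done | sort | uniq -c | sort -nr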

These few lines of code print groups of jar files, each preceded by the number of classes the jars have in common. Sample output is:

stf@crabman:/path/to/my/webapp$ duplicates.sh
127 ./WEB-INF/lib/aspectjweaver-1.5.3.jar ./WEB-INF/lib/aspectjrt-1.5.3.jar
117 ./WEB-INF/lib/spring-2.0.8.jar ./WEB-INF/lib/spring-web-2.0.8.jar
83 ./WEB-INF/lib/commons-beanutils-1.7.0.jar ./WEB-INF/lib/commons-beanutils-core-1.7.0.jar
38 ./WEB-INF/lib/servlet-api-2.3.jar ./WEB-INF/lib/servlet-api-2.5-6.1.11.jar
10 ./WEB-INF/lib/commons-beanutils-core-1.7.0.jar ./WEB-INF/lib/commons-collections-3.2.jar
10 ./WEB-INF/lib/commons-beanutils-1.7.0.jar ./WEB-INF/lib/commons-beanutils-core-1.7.0.jar ./WEB-INF/lib/commons-collections-3.2.jar
9 ./WEB-INF/lib/spring-2.0.8.jar ./WEB-INF/lib/aopalliance-1.0.jar
6 ./WEB-INF/lib/commons-logging-1.1.jar ./WEB-INF/lib/jcl104-over-slf4j-1.5.0.jar

The code isn’t very fast, but it makes solving a common classpath problem a lot less painful. (Note that I am using plain unzip instead of jar as I didn’t have a JDK installed on this server.)
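If speed becomes an issue, one option is to avoid the per-class `eval` and instead sort one long list of class/jar pairs. The following is only a sketch under that assumption, not the script from this post: the function names `list_classes` and `count_shared` are made up for illustration, and unlike the original it prints each class's complete group of jars once instead of printing every intermediate pair.

```shell
#!/bin/bash

# Print "<class entry><TAB><jar>" for every .class file in every jar below $1.
list_classes() {
  find "${1:-.}" -name '*.jar' | while IFS= read -r jar; do
    unzip -l "$jar" | grep -Eo '[^ ]*\.class$' |
      awk -v jar="$jar" '{ print $0 "\t" jar }'
  done
}

# Read "<class><TAB><jar>" pairs on stdin. For every class that occurs in more
# than one jar, emit the group of jars; then count how often each group occurs.
count_shared() {
  # Byte-wise sorting keeps all lines of one class adjacent.
  LC_ALL=C sort -u | awk -F'\t' '
    $1 == prev { jars = jars " " $2; dup = 1; next }
    { if (dup) print jars; prev = $1; jars = $2; dup = 0 }
    END { if (dup) print jars }
  ' | sort | uniq -c | sort -nr
}
```

It would be used as `list_classes /path/to/my/webapp | count_shared`. Splitting the listing from the counting also makes it possible to feed in pairs from another source, such as `jar tf`.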

Responses

  1. Ashish Paliwal says:

    October 21st, 2008 at 4:27 am (#)

    Great work 🙂 Duplicate classes is indeed a big problem when using multiple open source projects and I have gone nuts multiple times, coz of these issues. You should have written this post earlier 🙂

  2. Roman says:

    November 25th, 2008 at 12:22 pm (#)

    Hi, taking a quick look at your script I see you unzip the jars. You can also check the contents of a jar file with `jar tf somelib.jar`

  3. Stefan Fußenegger says:

    December 11th, 2008 at 2:38 pm (#)

    Hi Roman, thanks for your reply. You should have had a closer look though 😉 The -l flag causes unzip to list the contained files instead of unzipping them. I also explained why I didn’t use the jar utility in the very last sentence: “Note that I am using plain unzip instead of jar as didn’t have a JDK installed on this server.” Cheers, Stefan

  4. Roman says:

    January 27th, 2009 at 2:17 pm (#)

    D’oh, was too quick, sorry! 🙂

  5. Marcin says:

    April 30th, 2009 at 9:04 am (#)

    Hi Stefan,

    How easy or feasible would it be to list those duplicates in the form of a report? Any tips or code snippets on this?

    Cheers
    Marcin

  6. Stefan Fußenegger says:

    April 30th, 2009 at 3:08 pm (#)

    That snippet is all I have right now. However, by changing a single line, you can output not only the jars but also the class file:

    if [ -n "$existing" ]; then echo "$lib $existing $class"; fi

    and remove the last ‘uniq -c | sort -nr’ as it will most likely only output 1 now. However, it should finally be quite easy to parse the output and create a report containing the jar files and the classes they have in common – would be nice to get a copy 😉
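A report along those lines could be sketched as follows. This is an assumption about what such a report might look like, not code from the original thread: the hypothetical `duplicate_report` function expects the modified script's output on stdin, i.e. lines consisting of two or more jar paths followed by the mangled class name, and groups the classes by the jars that share them.

```shell
#!/bin/bash

# Read lines of the form "<jar> <jar> [<jar> ...] <class>" on stdin and print
# each group of jars once, followed by the classes those jars have in common.
duplicate_report() {
  awk '
    {
      class = $NF                          # last field is the class name
      jars = $1                            # all other fields are jar paths
      for (i = 2; i < NF; i++) jars = jars " " $i
      if (!(jars in classes)) order[++n] = jars   # remember first-seen order
      classes[jars] = classes[jars] "  " class "\n"
      count[jars]++
    }
    END {
      for (k = 1; k <= n; k++)
        printf "%s (shared classes: %d)\n%s", order[k], count[order[k]], classes[order[k]]
    }'
}
```

With the modified script saved as, say, `duplicates-verbose.sh` (a made-up name), the report would be produced by `./duplicates-verbose.sh | duplicate_report`.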
