Case for CJAN - Comprehensive Java Archive Network
Created: 2005-Oct-23
Last update: 2006-Apr-21 - Well, I wrote CJAR and my company SourceLabs made it public at http://area51.sourcelabs.com/cjar. Please see the vision for the service here.
OK, here is the thing I advocate and cry about constantly (as Konstant-in I am supposed to do things constant-ly, right? :) )
Why the hell does Java not have a repository like CPAN and the tools to work with it?
I think that the lack of an unambiguous place to get Java libraries enormously hinders the Java community and the Java industry.
I see enormous benefits for the Java community and the Java industry at large in having such an unambiguous repository, accompanied by a little bit of infrastructure. It is too sad that Sun does not do that, and I think it should become a grass-roots effort.
Here I will try to outline my idea of a Java repository and the ways it could simplify life for Java developers.
Repository itself:
Technology-wise I think it is all available right now: it is the Maven(1) repository structure (make no mistake, I am not advocating Maven by any means). I think it is good enough to be a robust foundation for the repository.
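To make that structure concrete, here is a minimal Java sketch of how a library maps to a URL in a Maven 1 style repository, where jars live under <group>/jars/<artifact>-<version>.jar. The class name, the method names and the version number 1.7.0 are hypothetical and only there for illustration; the repository URLs are just sample values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: maps a library to its location in a Maven 1 style repository.
class Maven1Layout {

    // Maven 1 layout: <repo>/<group>/jars/<artifact>-<version>.jar
    static String jarUrl(String repo, String group, String artifact, String version) {
        String base = repo.endsWith("/") ? repo : repo + "/";
        return base + group + "/jars/" + artifact + "-" + version + ".jar";
    }

    // One candidate URL per repository, in the order the repositories were given.
    static List<String> candidates(List<String> repos, String group, String artifact, String version) {
        List<String> urls = new ArrayList<>();
        for (String repo : repos) {
            urls.add(jarUrl(repo, group, artifact, version));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample repository list: an internal server first, then a public mirror.
        List<String> repos = Arrays.asList(
                "http://internal.mycompany.com/maven/",
                "http://www.ibiblio.org/maven/");
        // The version number is an example value, not a recommendation.
        for (String url : candidates(repos, "commons-beanutils", "commons-beanutils", "1.7.0")) {
            System.out.println(url);
        }
    }
}
```

A client would try the candidate URLs one by one and take the first that exists, which is why the order of the repositories matters.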
Tools:
There are tools available right now: Maven itself, quite obviously.
Then there is the Ant 'dependencies' task, which can use Maven repositories today in Ant-based builds.
Missing part: the notion of a 'profile', by which I mean a definition that lists mutually compatible versions of Java libraries. Of course there might be an infinite number of combinations given the number of available libraries, but I think the repository should provide a few basic profiles:
- "conservative" - lists libraries for Java 1.2;
- "contemporary" - lists libraries for Java 5;
- "bleeding_edge" - "latest and greatest" versions;
The community will provide feedback, and companies like SourceLabs could submit certified 'profiles'.
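One simple way to picture a profile is a plain Java properties file with one entry per library, mapping the library name to the version that the profile blesses. The sketch below assumes exactly that; the class name, the profile contents and the version numbers are made up for illustration and do not describe any published profile.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative sketch only: a profile as a properties file of library -> version.
class ProfileSketch {

    // Hypothetical contents of a "conservative" profile; versions are invented.
    static final String CONSERVATIVE =
            "commons-beanutils=1.6\n" +
            "commons-logging=1.0.4\n";

    // Parses profile text in standard .properties syntax.
    static Properties parse(String text) {
        Properties profile = new Properties();
        try {
            profile.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return profile;
    }

    // Looks up the version a profile assigns to a library; fails loudly if it is not listed.
    static String versionOf(Properties profile, String library) {
        String version = profile.getProperty(library);
        if (version == null) {
            throw new IllegalArgumentException("Library not listed in profile: " + library);
        }
        return version;
    }

    public static void main(String[] args) {
        Properties profile = parse(CONSERVATIVE);
        System.out.println("commons-beanutils -> " + versionOf(profile, "commons-beanutils"));
    }
}
```

Failing on an unlisted library is a deliberate choice in this sketch: a profile is supposed to be a complete, mutually compatible set, so silently guessing a version would defeat the purpose.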
Benefits at DEVELOPMENT time:
Using a 'profile' removes the question of where to get libraries and which versions to use: in our build script we use the profile-provided versions like this:
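<!-- Default profile; can be overridden on the command line with -Dprofile=... -->
<property name="profile" value="conservative" />
<!-- Load the library versions defined by the chosen profile -->
<property url="http://www.javarepo.org/profiles/${profile}.properties"/>
<!-- Internal repository first, then the public ones -->
<property name="repositoryList" value="http://internal.mycompany.com/maven/,http://www.ibiblio.org/maven/,http://repository.codehaus.org/"/>
<taskdef classname="org.apache.tools.ant.taskdefs.optional.dependencies.Dependencies" name="dependencies"/>
<dependencies pathId="commons.classpath" repositoryList="${repositoryList}">
  <!-- The version comes from the profile's properties file -->
  <dependency group="commons-beanutils" version="${commons-beanutils}"/>
</dependencies>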
Despite its simplicity, it is a very convenient and straightforward way of getting what we need for a project. At build time we can easily change profiles by supplying, for example, -Dprofile=bleeding_edge to the build process.
Is it a perfect approach? Of course not: Maven and Ivy, for example, try to provide more, such as version conflict resolution and transitive dependencies, at the expense of simplicity and transparency.
One thing about transitive dependencies: they look cool in theory but are hard to get right in practice.
Let's consider Spring's dependencies, for example: it kind of depends on everything, but for a particular project we do not want to bring everything into our WEB-INF/lib, so in the end the developer has to specify exactly what the project needs.
Benefits at RUNTIME
There is a wonderful piece of technology that comes with Java: Java Web Start (JWS). For those unfamiliar with this marvel: using Java Web Start technology, standalone Java software applications can be deployed with a single click over the network. Java Web Start ensures the most current version of the application will be deployed, as well as the correct version of the Java Runtime Environment (JRE).
The lack of a ubiquitous repository for JWS to use creates the following problems:
#1 increased startup time the first time a user tries a JWS application;
#2 unnecessary duplication of files on the hard drive;
Well, I am not particularly worried about #2, but #1 is a very serious issue IMO.
If we look at a typical, not too big application, we can see that the application's own code is a small portion of all the classes needed to run it.
Let's consider a very small chat application that wants to be skinnable via SwiXML and wants to use the Hessian protocol to communicate with the server:
I would suspect that the code of such an application will be approximately 50K.
Then it needs:
- SwiXML library - 50K;
- JDOM (needed by SwiXML) - 150K;
- j2h.jar - 338K;
- ui.jar - 392K;
- hessian.jar - 144K;
Hmm: custom code 50K, libraries 1074K, which is roughly a 1/21 ratio; the custom code is only about 4.4% of the 1124K total download.
For a modem user it means the first startup takes more than 20 times longer than downloading the application code alone.
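For anyone who wants to redo the arithmetic with their own jars, here is a small Java sketch that adds up the library sizes from the list above and works out the share of custom code in the total download. The class and method names are hypothetical; the sizes are the ones quoted in this post.

```java
import java.util.Locale;

// Illustrative sketch only: how much of a first-time download is custom code.
class DownloadShare {

    // Adds up jar sizes given in kilobytes.
    static int sumKb(int... sizesKb) {
        int total = 0;
        for (int size : sizesKb) {
            total += size;
        }
        return total;
    }

    // Percentage that partKb represents of wholeKb.
    static double percent(int partKb, int wholeKb) {
        return 100.0 * partKb / wholeKb;
    }

    public static void main(String[] args) {
        int customKb = 50;
        // SwiXML, JDOM, j2h.jar, ui.jar, hessian.jar
        int librariesKb = sumKb(50, 150, 338, 392, 144);
        int totalKb = customKb + librariesKb;

        System.out.println("libraries: " + librariesKb + "K, total: " + totalKb + "K");
        System.out.println(String.format(Locale.ROOT,
                "custom code share of the download: %.2f%%", percent(customKb, totalKb)));
        System.out.println(String.format(Locale.ROOT,
                "first download is %.1f times the size of the custom code alone",
                (double) totalKb / customKb));
    }
}
```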
And if we add some of jakarta-commons to the mix, the time wasted is just outrageous.
Of course this does not matter much for a regular user of a JWS application, because all those libraries are downloaded to the user's computer once and subsequent restarts will be instant. But when the user decides to try another chat application, he needs to download all the libraries all over again. Why does it need to be this way? I do not know of many reasons for that.
But I am sure that having a repository in place will:
- greatly enhance user experience;
- improve security (how am I supposed to know that the log4j I download from a site is a true copy of the original log4j? CJAN might provide some assurance of the authenticity of libraries in the repository via a strictly controlled submission process);
- and as a result improve Java penetration on the desktop;
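On the security point, one possible building block is a digest check: if the repository published a digest for every jar, a client could verify a download before using it. The Java sketch below assumes that; the class and method names are hypothetical, and SHA-256 is simply my choice of algorithm here, not something CJAN defines.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch only: compare a downloaded jar against a published digest.
class JarDigestCheck {

    // Returns the SHA-256 digest of the data as lowercase hex.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // True if the downloaded bytes hash to the digest the repository published.
    static boolean matches(byte[] downloaded, String publishedHex) {
        return sha256Hex(downloaded).equalsIgnoreCase(publishedHex.trim());
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real client would read the downloaded jar file instead.
        byte[] jarBytes = "pretend this is log4j.jar".getBytes(StandardCharsets.UTF_8);
        String published = sha256Hex(jarBytes); // pretend this came from the repository
        System.out.println("matches: " + matches(jarBytes, published));
    }
}
```

A digest only proves that the file matches what the repository lists; the trust itself still has to come from the controlled submission process.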
Benefits for the Java community and industry at large
At this time Java is experiencing a kind of crisis because it has become big and overwhelming for many developers, especially for developers coming from centralized command environments or for the new generation of developers. Growing frustration creates an opportunity for smaller and more agile environments: Ruby, Python, Perl and the Parrot VM are examples of technologies which promise ease of development. Maybe I am just an 'old grumpy man' whining, but I do not see much value in revolutions and believe more in evolution. I think it is silly to abandon all the tools and libraries which were developed during all those years and jump on a new technology.
IMO Java is guilty of negligence too. I say that Java should make it easier to cooperate with native solutions: it is kind of sad to see people reinvent database and mail servers in Java instead of using and enhancing existing solutions. It is a bit utopian, but I would like problems to be solved once or twice, and then we should move on to solving other, more important things.
TODO
Actually we can start using a CJAN repository inside our companies right now (I have actually been doing just that for a couple of years already). All it takes:
- Have a web server that serves files from a Maven repository structure;
- Put someone in charge of the content: as new versions of the libraries the company uses become available, that person names them appropriately and copies them to the repository. For example, many open source projects (unfortunately) do not put the version in the jar name, which makes it hard to track down version mismatch problems or subtle behavioral differences between production and staging systems;
- Place profile-name.properties files on the same server and update them when necessary;
- Have Ant builds use the dependencies task;
- Put the local Maven repository first in the list of repositories for Maven and Ant;
And of course let's lobby Sun to implement CJAN!!!!!
In the meantime I wrote CJAR and my company SourceLabs made it public at http://area51.sourcelabs.com/cjar; you are welcome to use it.
I should check if I can submit a JSR as a JCP member and become head of the committee :)