I often find myself annoyed by something while coding on work projects, and then quickly venting my grievance on the company-internal Skype chat. No reason the whole world should not be able to participate. This will be about Java, and in Japanese.
Fri, 04 Dec 2009
The thousand and one things I want to see fixed in Java: [2] Collection and other literals
Tying in with the second of my almost thousand and one reasons to love Perl, the Java language really needs a few more straightforward ways to create objects from values given in program source code.
No problem with (single-line) strings and arrays:
String x = "abc";
int[] a = new int[] { 1, 2, 3, 4, 5 };
Lists can be created using array literals (or varargs):
List<String> ls = Arrays.asList("a", "b", "c");
No such luck with Maps. Some people suggest using an instance initializer block, but this actually creates an anonymous inner class, which seems costly and can have unexpected side effects.
new HashMap<String, String>() {{
    put("a", "first");
    put("b", "second");
    put("c", "third");
}};
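To make the anonymous-class point concrete, here is a minimal sketch that compares the double-brace idiom with a plain varargs helper. The class name `MapLiterals` and the helper `mapOf` are made up for this illustration; they are not part of the JDK.

```java
import java.util.HashMap;
import java.util.Map;

class MapLiterals {
    // Double-brace initialization: the outer braces declare an anonymous
    // subclass of HashMap, the inner braces are its instance initializer.
    static Map<String, String> doubleBrace() {
        return new HashMap<String, String>() {{
            put("a", "first");
            put("b", "second");
            put("c", "third");
        }};
    }

    // Hypothetical helper: takes alternating keys and values and
    // returns an ordinary HashMap, with no extra class involved.
    static Map<String, String> mapOf(String... keysAndValues) {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("need an even number of arguments");
        }
        Map<String, String> map = new HashMap<String, String>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            map.put(keysAndValues[i], keysAndValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> a = doubleBrace();
        Map<String, String> b = mapOf("a", "first", "b", "second", "c", "third");
        System.out.println(a.equals(b));                   // true: same entries
        System.out.println(a.getClass() == HashMap.class); // false: anonymous subclass
        System.out.println(b.getClass() == HashMap.class); // true
    }
}
```

The helper gives up compile-time checking of the key/value pairing, which is exactly the kind of trade-off a real literal syntax would avoid.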
But worst of all are regular expressions, because they need to be constructed from Strings, and they frequently use backslashes, which unfortunately act as character escapes in String literals so that you have to double all of them. If you thought regular expressions were hard to read in the first place, wait until you see them in Java:
Pattern emailRegEx =
Pattern.compile("(\\w+)@(\\w+\\.)(\\w+)(\\.\\w+)*");
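As a small sketch of what the doubling means in practice, the following program compiles the same pattern and prints both the regular expression the engine actually sees and the captured groups. The class name `EmailRegexDemo`, the helper `parts`, and the sample address are assumptions for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailRegexDemo {
    // Every backslash of the regular expression is written twice in the source.
    static final Pattern EMAIL =
        Pattern.compile("(\\w+)@(\\w+\\.)(\\w+)(\\.\\w+)*");

    // Returns the four capture groups, or null if the whole input does not match.
    static String[] parts(String address) {
        Matcher m = EMAIL.matcher(address);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // The compiled pattern contains single backslashes only.
        System.out.println(EMAIL.pattern()); // (\w+)@(\w+\.)(\w+)(\.\w+)*
        String[] p = parts("user@mail.example.org");
        System.out.println(p[0]); // user
        System.out.println(p[1]); // mail.
        System.out.println(p[2]); // example
        System.out.println(p[3]); // .org
    }
}
```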
Late insight into a small language detail: final fields
You can declare Java fields as final without immediately giving them a value, if you make sure that a value eventually gets assigned in all of the class's constructors.
class Test {
    private final String name;

    Test() {
        name = "default";
    }

    Test(String name) {
        this.name = name;
    }
}
This prevents two types of coding errors:
- You cannot accidentally re-assign the value of name,
- and the compiler also makes sure that you actually do assign a value for all code-paths through all constructors.
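The "all code-paths" part is easiest to see with a branch inside a constructor. The following is a minimal sketch; the class `Account` and its fields are hypothetical and only serve to show the rule.

```java
class Account {
    private final String owner;
    private final int limit;

    Account(String owner, boolean premium) {
        this.owner = owner;
        // Both branches assign the final field exactly once. Deleting the
        // else branch makes the compiler reject this constructor, because
        // limit might then be left unassigned.
        if (premium) {
            limit = 10000;
        } else {
            limit = 500;
        }
        // limit = 0; // would not compile: limit might already have been assigned
    }

    String owner() {
        return owner;
    }

    int limit() {
        return limit;
    }

    public static void main(String[] args) {
        System.out.println(new Account("alice", true).limit());  // 10000
        System.out.println(new Account("bob", false).limit());   // 500
    }
}
```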
Java4K is a web site that showcases compact arcade games, all of them written in Java with a maximum (jar file) size of 4KB. My current favorite is jm00 - a boomshine clone.
I have been known to put forth a thousand and one reasons to love Perl, and an equal number of things that I want to see fixed in Java. Well, for the last three years I have professionally been programming in Java exclusively, and even more telling, I write Java code in my spare time, too. So I suppose it is time to come clean and update the official party line.
- I still maintain that a team of skilled developers can be more productive using dynamic languages such as Perl (or Python, or Ruby), and that they will enjoy it more, too.
- I also concede that for a development environment that consists of bigger teams with significant member turnover and programmers of varying skill levels, the restrictions on style and the relative verbosity that Java enforces can be very helpful to ensure a minimum code quality.
- While Java development requires a lot of tooling that a dynamic programmer could do without, once you get used to the tools, which are now very mature and numerous, they can do amazing things for you, most of which depend on deriving useful information from the Java code, something that is very hard to do in a dynamic language.
- Much of the boilerplate traditionally necessary in Java has been eliminated with the release of Java 5 and EJB 3. Most of what is left can be automatically created by tools.
- Regardless of the Java language, the Java Virtual Machine, especially Sun's Hotspot VM, is a remarkable piece of technology (I hear that Microsoft's Common Language Runtime deserves an equal amount of respect). It is not to Perl's advantage that both Python and Ruby, as well as a couple of hot new languages like Scala, can run on top of the JVM, whereas Perl does not.
- I like Sun Microsystems themselves, especially since they have started embracing the open source model in earnest.
- I still don't like PHP.
A Programming Note concerning Thilo's Tech Radio: To celebrate the occasion let us all listen to an excellent episode of the Java Posse. The recording quality is a bit poor, but the content is excellent.
In order to include programmes from outside the IT Conversations Network, I had to move the RSS file to my own server, so please re-subscribe to the new feed (put together using the very promising Spokenword site, currently in public alpha).
Now that I managed to log in to my OTN account, here are the results
of Saturday's test suite for Oracle XE on Windows XP:
- With Oracle, using prepared statements makes a lot of difference: going from interpolated variables to bind variables more than doubles the throughput, and reusing the same statement adds another ten percent. This is good news in more than one way, because that first part (low-hanging fruit for a programmer) brings such a big gain that you can argue against the need for the extra few percent from the second improvement, which is trickier to implement in a general fashion (although you could turn on statement caching in the driver; I need to measure that some time).
- Using the batch-update interface when applicable gives a spectacular boost; in this case it is about 15 times faster. Further testing is needed to see how this plays out with different batch sizes, specifically whether there are upper and lower limits for when it makes sense to use the feature.
- As for how much time it takes to get a connection from the pool, that depends on whether you turn on the validation feature of the pool, which checks if the connection is still alive before giving it out. With validation turned off, there is basically no overhead; with validation, it adds a few milliseconds every time you get a connection, in my case one to two ms (I only tested this with Oracle, and the times are not included in the charts).
After these measurements for a thousand updates, I also took timings for a different scenario:
- SELECT non-existing row
- INSERT the row
- SELECT again
- UPDATE the row
- SELECT again
- DELETE the row
- SELECT the now missing row again
This pattern was run in two variations (as shown above and without the selects) in two different implementations (using bind variables or not using them). Each of these four routines was run interleaved (ABCDABCD...) for a total of 101 times, with the results of the first iteration discarded, and the time taken by each iteration is the benchmark result. The connection was in auto-commit mode the whole time.
Again, we see prepared statements making a big difference on Oracle, not so much (even a slight slow-down?) on the open source databases, and that MySQL suffers because of the slow commits (of course, it should still be fast enough, that part is unlikely to become the bottleneck).
Potential follow-ups to this would be to properly profile the connection pool's validation feature, to include Hibernate in the mix and measure its overhead, to record the strain on the server, and to use multiple threads to see how bind variables affect scalability. But I promise that if I do that, I will not bore you with the results here on my blog (one thing that I do want to put here, though, is the results of running these two benchmarks on the same machines in Perl instead of Java).
What is the overhead of getting a fresh connection from the connection pool
instead of passing the connection around?
How much faster are repeated SQL statements when using a fixed
query string with bind variables as opposed to directly interpolating
the data into the query string? How much faster when re-using
the same prepared statement? How much faster when using a batched update?
I ran a little benchmark.
- A) 1000x [getConnection createStatement executeUpdate commit]
- B) 1000x [getConnection prepareStatement executeUpdate commit]
- C) getConnection prepareStatement 1000x [executeUpdate commit]
- D) getConnection prepareStatement 1000x [executeUpdate] commit
- E) getConnection prepareStatement 1000x [addBatch] executeBatch commit
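In JDBC terms, the five variants might look roughly like the following sketch. The table `bench`, its columns, and all class and method names are assumptions for illustration; the actual benchmark code is not shown in this post. The sketch also assumes that the pool is a `javax.sql.DataSource` and that auto-commit is switched off so that the explicit commits are meaningful.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

class UpdateVariants {
    // Hypothetical table and columns, chosen only for this sketch.
    static final String BIND_SQL = "UPDATE bench SET val = ? WHERE id = ?";

    // Variant A pastes the values into the SQL text, so every statement differs.
    static String interpolatedSql(int id, int val) {
        return "UPDATE bench SET val = " + val + " WHERE id = " + id;
    }

    // A) n x [getConnection createStatement executeUpdate commit]
    static void variantA(DataSource pool, int n) throws SQLException {
        for (int i = 0; i < n; i++) {
            try (Connection con = pool.getConnection()) {
                con.setAutoCommit(false);
                try (Statement st = con.createStatement()) {
                    st.executeUpdate(interpolatedSql(i, i));
                }
                con.commit();
            }
        }
    }

    // B) n x [getConnection prepareStatement executeUpdate commit]
    static void variantB(DataSource pool, int n) throws SQLException {
        for (int i = 0; i < n; i++) {
            try (Connection con = pool.getConnection()) {
                con.setAutoCommit(false);
                try (PreparedStatement ps = con.prepareStatement(BIND_SQL)) {
                    bind(ps, i);
                    ps.executeUpdate();
                }
                con.commit();
            }
        }
    }

    // C) commitEach = true:  one statement, commit after every update
    // D) commitEach = false: one statement, a single commit at the end
    static void variantCorD(DataSource pool, int n, boolean commitEach) throws SQLException {
        try (Connection con = pool.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(BIND_SQL)) {
                for (int i = 0; i < n; i++) {
                    bind(ps, i);
                    ps.executeUpdate();
                    if (commitEach) {
                        con.commit();
                    }
                }
            }
            if (!commitEach) {
                con.commit();
            }
        }
    }

    // E) getConnection prepareStatement n x [addBatch] executeBatch commit
    static void variantE(DataSource pool, int n) throws SQLException {
        try (Connection con = pool.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(BIND_SQL)) {
                for (int i = 0; i < n; i++) {
                    bind(ps, i);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            con.commit();
        }
    }

    // Sets both bind variables of BIND_SQL to the loop counter.
    private static void bind(PreparedStatement ps, int i) throws SQLException {
        ps.setInt(1, i);
        ps.setInt(2, i);
    }

    public static void main(String[] args) {
        // Without a database at hand, just show the two kinds of statement text.
        System.out.println(interpolatedSql(7, 42));
        System.out.println(BIND_SQL);
    }
}
```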
I wanted to test Oracle XE on Ubuntu, but did not get either installed
(the eMachine did not like the Ubuntu CD, and Oracle's web-site was unresponsive),
so I went with Postgresql 8.3 and MySQL5(InnoDB) instead. The databases were
running on Windows XP, both fresh installs using the default settings,
accessed from the Java test program on a Mac mini via local ethernet network.
- Commits against MySQL are amazingly slow. I assume that this is a problem with my setup, or with Windows. This also probably only affects the transactional InnoDB backend.
- With MySQL, there is no speed difference between methods A, B, and C, and hence no visible performance advantage to prepared statements. Maybe the JDBC driver does not implement the feature. With Postgresql it seems to improve throughput, but not by much. The Oracle figures should be interesting here.
- Committing only once instead of separately after every update makes a big difference, especially with MySQL (see above). Of course, performance considerations should not be a factor in deciding what a transaction is.
- Bulk updates give another big boost to Postgresql, not so much to MySQL.
Probably the most acclaimed technology to come out of Sun Microsystems recently is
DTrace, the dynamic tracing framework for Solaris (which Apple also ported to and included in Leopard).
DTrace lets you inject so-called probes into the running operating system and applications
in order to gather data to tune or troubleshoot them. Probes are written in a simple programming language
called D, creating probes requires no changes to the code being probed, and
DTrace support has no performance impact unless the probes are actually enabled.
I really want to have something similar for Java, in order to troubleshoot tricky support issues at customer sites. At least a simple version (with only JVM-level and no application-specific instrumentation points) should be possible using the existing JVM tracing and debugging interfaces. I picture something like inserting a little script that fires when a specific method is called with a specific set of arguments and then logs a message or starts a timer or increases a counter. Half a day of searching did unfortunately not turn up an easy-to-use and painless-to-deploy tool, although I am sure there must be something.
As it is, we have to rely on logging, the catch being, of course, that appropriate trace messages need to be in the code in the first place, which more often than not they are not, requiring a patch release with all the costs and delays associated with it. And even if the messages are there, they of course only appear when trace logging is enabled, and even when only enabling tracing for the bare minimum of log categories required, there will usually be a lot of output unrelated to the problem at hand that needs to be manually filtered.
Update: I am aware that DTrace is available on the latest JVM for Solaris and Mac OS X, but that will not help me with Java5 on RedHat, which is what we deploy on. I was looking for a Javaland solution that works with older/other JVM/OS.
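To illustrate the kind of probe I have in mind (fire when a specific method is called with a specific set of arguments, then increase a counter), here is a rough sketch using `java.lang.reflect.Proxy`. It is not the tool I was looking for: it only works for objects accessed through an interface, and the wrapper has to be put in place by the application itself. The interface `OrderService` and all other names are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class ProbeSketch {
    // Hypothetical application interface, used only for this illustration.
    interface OrderService {
        String lookup(String customer, int orderId);
    }

    // Wraps target in a proxy that counts calls to methodName whose
    // arguments equal watchedArgs, and forwards every call unchanged.
    static <T> T probe(final T target, Class<T> iface, final String methodName,
                       final Object[] watchedArgs, final AtomicInteger hits) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (method.getName().equals(methodName) && Arrays.equals(args, watchedArgs)) {
                    hits.incrementAndGet();
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the real method threw
                }
            }
        };
        Object proxy = Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(proxy);
    }

    // Runs three calls through the probe and returns how often it fired.
    static int demo() {
        OrderService real = new OrderService() {
            public String lookup(String customer, int orderId) {
                return customer + "#" + orderId;
            }
        };
        AtomicInteger hits = new AtomicInteger();
        OrderService probed = probe(real, OrderService.class, "lookup",
                                    new Object[] { "acme", 42 }, hits);
        probed.lookup("acme", 42);
        probed.lookup("acme", 7);
        probed.lookup("acme", 42);
        return hits.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2
    }
}
```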
I started using Apache Maven and am really impressed by it. At first I was suspicious as to why we need another build system when we already have Apache Ant, but as it turns out Maven is doing much, much more than Ant, and is also easier to use. No wonder that basically all popular Java (it does seem to be limited to the Java world) projects are using it now.
First of all, it has a starter command that generates the
project definition XML files, which define what kind of project
you have and how it has to be built. With Ant, you have to
manually create a build.xml, which requires you
to get down and study a little first. Plus, an initial build.xml
probably does not do very much, whereas the initial pom.xml
can even create a nice looking web presence.
Secondly, Maven seems to believe in Convention over Configuration.
There are sensible defaults for everything, so if you adhere to
established directory layouts and such, you can get results without
any fiddling with parameters. Good examples are the various reporting
plugins that create JavaDoc, coverage reports, or dependency listings.
Just drop in the plugin and without the need to specify source or
output directories, or even to activate the new build step, you get
the extra side-menu items in your report page.
Thirdly, and probably most importantly, Maven takes care of dependencies
for both your project and itself. It automatically locates, downloads and
installs all those extra JAR files, and makes sure the versions match up.
If it is not already bundled with your OS (comes with Leopard for example),
then installing Maven could be the last Java-related download you need to do yourself.
When I decided I wanted Cobertura coverage reports yesterday, it was just
a matter of adding two lines of code to pom.xml. Maven immediately
downloaded the Cobertura plugin for itself, plus all required libraries.
I remember it being much more work when I integrated Cobertura in our Ant
build chain. The same goes for the dependencies of the project that is being built.
Need Hsqldb for some junit testing? Just add it as a Maven dependency (using a convenient
package name search dialog in Eclipse), and that's it. No need anymore to either put big third-party
JAR files in your own project's version control system or have all developers find
and install them manually.
Sun Certified Programmer for Java 5
class Bird {
    { System.out.print("b1 "); }
    public Bird() { System.out.print("b2 "); }
}
March is the end of the fiscal year and thus time for annual performance reviews. To beef up my sheet I took the SCJP exam today. Now this is an exam you want to prepare for, even if (or maybe especially if) you use Java on a daily basis. The test is a computerised multiple-choice test, delivered at a Prometric Testing Center, same as the Oracle exams. Prometric seems to be doing really well: they have hundreds of exams for dozens of subjects and lots of offices all over the place, the exams are not cheap, and there are always plenty of other people when I go. Because the test is automated (you get your score immediately afterwards), it needs clear-cut right and wrong answers. But the wrong answers cannot be too obviously incorrect, otherwise the exam would be too easy. As a result, things are quite tricky: you really have to pay very close attention to details (often terribly contrived and not at all real-life), and the exercise somewhat resembles a puzzle or quiz. It is kind of fun, in the same way Sudoku is fun. Just do not expect to survive on your school math skills alone.
class Raptor extends Bird {
    static { System.out.print("r1 "); }
    public Raptor() { System.out.print("r2 "); }
    { System.out.print("r3 "); }
    static { System.out.print("r4 "); }
}
In particular, you need to be prepared to watch out for things like
- incorrect indentation to confuse you (which is normally taken care of by the IDE's code formatter),
- funky punctuation that causes errors (which the compiler would catch for you),
- calls to core API methods that do not exist or with the wrong parameters (again: compiler),
- defining different variables in different scopes using the same name,
- calling massively overloaded methods where it is not at all obvious how the combination of widening, auto-boxing, and varargs would affect the dispatch.
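As an illustration of that last point (separate from the Bird/Raptor/Hawk question), here is a small sketch of how the compiler picks among overloads: it first tries widening only, then allows boxing, and considers varargs last. The class and method names are made up for this example.

```java
class OverloadDispatch {
    static String f(long x)    { return "long"; }
    static String f(Integer x) { return "Integer"; }
    static String f(int... x)  { return "int..."; }

    static String g(Integer x) { return "Integer"; }
    static String g(int... x)  { return "int..."; }

    static String h(Long x)    { return "Long"; }
    static String h(long... x) { return "long..."; }

    public static void main(String[] args) {
        // Widening int to long wins over boxing to Integer.
        System.out.println(f(5)); // long
        // No widening candidate, so boxing wins over varargs.
        System.out.println(g(5)); // Integer
        // An int cannot be boxed to Long, so only the varargs method is left.
        System.out.println(h(5)); // long...
    }
}
```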
class Hawk extends Raptor {
    public static void main(String[] args) {
        System.out.print("pre ");
        new Hawk();
        System.out.println("hawk ");
    }
}
You get 72 questions and three hours (which is plenty, I finished in about half that).
What is the result?
A) pre b1 b2 r3 r2 hawk
B) pre b2 b1 r2 r3 hawk
C) pre b2 b1 r2 r3 hawk r1 r4
D) r1 r4 pre b1 b2 r3 r2 hawk
E) r1 r4 pre b2 b1 r2 r3 hawk
F) pre r1 r4 b1 b2 r3 r2 hawk
G) pre r1 r4 b2 b1 r2 r3 hawk
H) The order of the output cannot be predicted.
I) Compilation fails.
The thousand and one things I want to see fixed in Java: [1] Method meta data without reflection
The main merit of Java being a compiled and statically typed language is that a lot of useful information is available to the compiler. In Eclipse, for example, you can click on any method to get a call hierarchy that lists all calls to that method within your workspace. This is extremely useful (almost essential actually, given the great complexity that Java projects always seem to reach very quickly) to assess the impact of bugs, code changes or refactoring. Unfortunately, the call hierarchy does not include calls using the reflection API. In most cases (such as access from Javascript or Velocity templates), that is just something we have to live with (and resort to full-text search to try to find the callers), but I would like to have better support here when using reflection from inside of Java.
Consider how you access class meta data: If you do not know the class name at compile-time, you have to do
Class c = Class.forName(unknownClassName);
and that obviously leaves compiler and IDE in the dark. But if you do know the name, you can write
Class c = my.package.MyClass.class;
which can easily be tracked down later.
For methods, however, you only have the first option:
Method m = clazz.getMethod(methodName, parameterTypes);
There is just no syntax to do the Java equivalent of
my $m = \&Clazz::methodName;
as a result of which I find myself writing silly wrappers.
/*
   to get the Java version of
   my @names = map { $_->name() } @objects;
*/
// instead of (works, but has poor IDE support because of reflection)
FilteringIterator<SomeClass, String> it =
    new ReflectionFilteringIterator<SomeClass, String>("getName");
// make a wrapper that does not use reflection
FilteringIterator<SomeClass, String> it =
    new FilteringIterator<SomeClass, String>() {
        public String filter(SomeClass in) {
            return in.getName();
        }
    };
// so that I can write
List<String> names = it.map((List<SomeClass>) objects);
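Since `FilteringIterator` is not shown here, the following self-contained sketch plays the same idea through with stand-ins: `Person` takes the place of `SomeClass`, and the `Mapper` interface and `map` helper are hypothetical. It contrasts the reflective lookup by method name with the wrapper that calls the getter directly.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapperSketch {
    // Hypothetical stand-in for SomeClass.
    static class Person {
        private final String name;
        Person(String name) { this.name = name; }
        public String getName() { return name; }
    }

    // Hypothetical stand-in for the FilteringIterator idea.
    interface Mapper<I, O> {
        O apply(I in);
    }

    static <I, O> List<O> map(List<I> in, Mapper<I, O> f) {
        List<O> out = new ArrayList<O>();
        for (I item : in) {
            out.add(f.apply(item));
        }
        return out;
    }

    // Reflective version: the method is named by a string, so neither the
    // compiler nor the IDE's call hierarchy knows that getName() is used here.
    static List<String> namesByReflection(List<Person> people) {
        final Method getter;
        try {
            getter = Person.class.getMethod("getName");
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
        return map(people, new Mapper<Person, String>() {
            public String apply(Person in) {
                try {
                    return (String) getter.invoke(in);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        });
    }

    // Wrapper version: more typing, but getName() is an ordinary call
    // that tools can find.
    static List<String> namesByWrapper(List<Person> people) {
        return map(people, new Mapper<Person, String>() {
            public String apply(Person in) {
                return in.getName();
            }
        });
    }

    static List<Person> sample() {
        return Arrays.asList(new Person("Ann"), new Person("Bob"));
    }

    public static void main(String[] args) {
        System.out.println(namesByReflection(sample())); // [Ann, Bob]
        System.out.println(namesByWrapper(sample()));    // [Ann, Bob]
    }
}
```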





