The T-Files


Wed, 12 Mar 2008

More JDBC Microbenchmarks

Now that I managed to log in to my OTN account, here are the results of Saturday's test suite for Oracle XE on Windows XP:

run0[ms]run1[ms]run2[ms]run3[ms]updates/sec

Oracle

A3490338233393402296

B1578147814441452686

C1272125312411474756

D6606606576591518

E4745434123256

  • With Oracle, using prepared statements makes a lot of difference, going from interpolated variables to bind variables more than doubles the throughput, and reusing the same statement adds another ten percent. This is good news in more than one way, because that first part (low-hanging fruits for a programmer) brings such a big gain that you can argue against the need for the extra few percent fro the second improvement, which is trickier to implement in a general fashion (although you could turn on statement-caching in the driver, I need to try to measure that some time).
  • Using the batch-update interface when applicable gives a spectacular boost, in this case it is about 15 times faster. Further testing is needed to how this plays out with different batch sizes, specifically if there are upper and lower limits for when it makes sense to use the feature.
  • As for how much time it takes for getting a connection from the pool, it depends if you turn on the validation feature of the pool, which checks if the connection is still alive before giving it out. With validation turned off, there is basically no overhead, with validation it adds a few milliseconds every time you get a connection, in my case (I only tested this with Oracle, the times are not included in the charts) one to two ms.

After these measurements for a thousand updates, I also took timings for a different scenario:

  1. SELECT non-existing row
  2. INSERT the row
  3. SELECT again
  4. UPDATE the row
  5. SELECT again
  6. DELETE the row
  7. SELECT the now missing row again

This pattern was run in two variations (as shown above and without the selects) in two different implementations (using bind variables or not using them). Each of these four routines was run interleaved (ABCDABCD...) for a total of 101 times, with the first iteration results discarded, and the times it took for each iteration becomes the benchmark result. The connection was in auto-commit mode the whole time.

[ms/run]PostgresqlMySQLOracle

Insert, update, delete (no binds)4.8328.8

Insert, update, delete (binds)5.3325.3

Plus selects (no binds)9.93714.6

Plus selects (binds)10.3369

Again, we see prepared statements making a big difference on Oracle, not so much (even a slight slow-down?) on the open source databases, and that MySQL suffers because of the slow commits (of course, it should still be fast enough, that part is unlikely to become the bottleneck).

Potential follow-ups to this would be to properly profile the connection pool's validation feature, to include Hibernate into the mix and measure its overhead, to record the strain on the server, and to use multiple threads to see how bind variables affect scalability. But I promise that if I do that, I will not bore you with the results here on my blog (one thing that I do want to put here, though, are the results of running these two benchmarks on the same machines in Perl instead of Java).