Benchmarking batch JDBC queries

What happened?

Initially we used queries like this (all examples are for scalikejdbc but it doesn’t really matter):

case class Entity(id: String, val1: String, val2: Boolean, val3: Array[Byte]) def add(entity: Entry): Unit = autoCommit { implicit session =>
sql"""
INSERT INTO tbl (id, col1, col2, col3)
VALUES (${entity.id}, ${entity.val1}, ${entity.val2}, ${entity.val3})
"""
.update()
.apply()
}
def remove(id: String): Unit = autoCommit { implicit session =>
sql"""DELETE FROM tbl WHERE id = $id"""
.update()
.apply()
}

Batching

Batching is an attempt to reduce number of round-trips from application to a database. So, what driver does is — instead of sending a bunch of separate queries, it sends it this way:

INSERT INTO tbl (...) VALUES(...);
INSERT INTO tbl (...) VALUES(...);
def add(entities: Seq[Entity]): Unit = localTx { implicit session =>
val params = entities.map { entity =>
Seq(
"id" -> entity.id,
"val1" -> entity.val1,
"val2" -> entity.val2,
"val3" -> entity.val3,
)
}
sql"""
INSERT INTO tbl (id, col1, col2, col3)
VALUES ({id}, {val1}, {val2}, {val3})
"""
.batchByName(params: _*)
.apply()
}
def remove(ids: Seq[String]): Unit = localTx { implicit session =>
val params = ids.map(v => Seq(v))
sql"""DELETE FROM tbl WHERE id = ?"""
.batch(params: _*)
.apply()
}

Better DELETE (IN clause)

Actually, for delete we can do much more robust thing:

def remove(ids: Seq[String]): Unit = autoCommit { implicit session =>
val inClause = SQLSyntax.csv(ids.map(id => sqls"""$id"""): _*)
sql"""DELETE FROM tbl WHERE id IN ($inClause)"""
.update()
.apply()
}

rewriteBatchedStatements

I stumbled upon this parameter on StackOverflow and instantly decided to check how it works. Basically, it rewrites INSERT queries in multi-value queries:

INSERT INTO tbl (...)
VALUES (...), (...), (...)

Benchmarks

First, I verified that rewriteBatchedStatements actually works by enabling profile logs in driver, it showed that final queries. Second, I built a very simple benchmark that tests different flavors of batching for INSERT and for DELETE with different number of entries. Each entry in my test is about few hundreds of bytes, multiple columns, nothing fancy.

INSERT

As you may see in benchmark results below, multi-value INSERT is really the fastest.

Benchmark    (numberOfEntities)  Mode  Cnt     Score   Error  Unitsbatch                         1  avgt    2    12.797          ms/op
batchRewriting 1 avgt 2 11.769 ms/op
oneByOneInTransaction 1 avgt 2 12.624 ms/op
oneByOne 1 avgt 2 12.184 ms/op
batch 10 avgt 2 13.433 ms/op
batchRewriting 10 avgt 2 11.835 ms/op
oneByOneInTransaction 10 avgt 2 15.592 ms/op
oneByOne 10 avgt 2 125.161 ms/op
batch 100 avgt 2 29.763 ms/op
batchRewriting 100 avgt 2 22.480 ms/op
oneByOneInTransaction 100 avgt 2 35.664 ms/op
oneByOne 100 avgt 2 1281.417 ms/op
batch 1000 avgt 2 213.938 ms/op
batchRewriting 1000 avgt 2 148.009 ms/op
oneByOneInTransaction 1000 avgt 2 229.646 ms/op
batch 10000 avgt 2 2027.138 ms/op
batchRewriting 10000 avgt 2 1497.429 ms/op
oneByOneInTransaction 10000 avgt 2 2321.587 ms/op

DELETE

In case of DELETE the option rewriteBatchedStatements shouldn't affect anything. However, it's still better than regular batching.

Benchmark  (numberOfEntities)  Mode  Cnt     Score   Error  Units
batch 1 avgt 2 21.636 ms/op
batchRewriting 1 avgt 2 15.237 ms/op
inClause 1 avgt 2 13.483 ms/op
oneByOneInTransaction 1 avgt 2 10.938 ms/op
oneByOne 1 avgt 2 12.273 ms/op
batch 10 avgt 2 16.328 ms/op
batchRewriting 10 avgt 2 14.396 ms/op
inClause 10 avgt 2 11.184 ms/op
oneByOneInTransaction 10 avgt 2 13.085 ms/op
oneByOne 10 avgt 2 124.575 ms/op
batch 100 avgt 2 21.893 ms/op
batchRewriting 100 avgt 2 17.696 ms/op
inClause 100 avgt 2 13.029 ms/op
oneByOneInTransaction 100 avgt 2 24.492 ms/op
oneByOne 100 avgt 2 1181.656 ms/op
batch 1000 avgt 2 104.244 ms/op
batchRewriting 1000 avgt 2 83.070 ms/op
inClause 1000 avgt 2 25.444 ms/op
oneByOneInTransaction 1000 avgt 2 130.383 ms/op
batch 10000 avgt 2 925.338 ms/op
batchRewriting 10000 avgt 2 854.990 ms/op
inClause 10000 avgt 2 167.237 ms/op
oneByOneInTransaction 10000 avgt 2 1254.676 ms/op

Conclusion

The way to achieve the best performance with a database is to use the least amount of queries. In case of INSERT it's a multi-value query, in case of DELETE it's an IN clause with multiple identifiers specified. For DELETE queries it's on us to write it properly, and for INSERT queries there is a very nice driver option that converts your batch query into multi-value query and boosts the performance auto-magically!

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Dmitry Komanov

Dmitry Komanov

251 Followers

Software developer, moved to Israel from Russia, trying to be aware of things.