The Cost of Streaming Data from MySQL

What is streaming

By default, MySQL Java drivers retrieve the data from the database all at once, and you get a ResultSet with all rows inside. On the MySQL documentation, it mentions the possibility of not retrieving data all at once — this is called streaming.

Here’s how it works:

// conn is java.sql.Connection
val conn = getConnection(driver)
val st: java.sql.Statement = conn.createStatement(
java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY
)
st.setFetchSize(Integer.MIN_VALUE)

val rs: java.sql.ResultSet = st.executeQuery("SELECT * FROM table WHERE id = 123")

When is it applicable?

There are many use cases where you can use streaming besides data streaming: data migrations, data export, batch processing (when you need to provide rows one by one), and one particularly interesting to me — Event Sourcing.

So… Let’s test!

Considering all the benefits, if we could determine that the cost isn’t very high, streaming would be a good solution for querying data. But in order to test the performance of streaming, we need to have something to compare it with.

How to test?

Let’s start with the actual test code. Have a look at the test code for default querying (selecting everything at once):

def selectAtOnce(driver: MysqlDriver, limit: Int): List[TestTableRow] = {
val st = getConnection(driver).createStatement()
val rs = st.executeQuery(s"SELECT id, name FROM $TableName LIMIT $limit")
// at this point in rs we have all rows!
val result = mutable.ListBuffer[TestTableRow]()
while (rs.next()) {
result += mapRow(rs)
}
result.toList
}
def selectViaStreaming(driver: MysqlDriver): List[TestTableRow] = {
val st = getConnection(driver).createStatement()
st.setFetchSize(Int.MinValue)
val rs = st.executeQuery(s"SELECT id, name FROM $TableName LIMIT $limit")
// in rs we don’t have any data until the next() method call
val result = mutable.ListBuffer[TestTableRow]()
while (rs.next()) {
result += mapRow(rs)
}
result.toList
}

What To Test

Taking into account we have more than one driver for MySQL server, results may vary. After reading another performance test from MariaDB blog, I decided to test all three available drivers: ConnectorJ, MariaDB, and Drizzle, but when I started testing, I discovered that Drizzle doesn’t support streaming — so only ConnectorJ and MariaDB remained relevant.

Results

Let’s take a look at the benchmark results.

Local

Times grows linearly with rows count. Apparently, there are no network problems — traffic on localhost is very fast.

Local, microseconds

Wi-Fi (slow network with latency)

Since the network is unpredictable, it’s hard to see any reasonable correlation between streaming and non-streaming. However, in general, times grows linearly as well.

Wi-Fi (slow network simulation), microseconds

Wire

Here we see a stable difference between streaming and non-streaming. The performance of ConnectorJ’s atOnce is the best. The difference between streaming and non-streaming doesn’t seem to be significant. ConnectorJ’s difference for streaming decreases with more rows: from two times degrade with small amount of rows, to ten percent for hundreds of rows. Here we see a substantial difference between streaming and non-streaming.

Wire, microseconds

Conclusion

As expected, streaming performance has a strong correlation with network bandwidth and latency. In the case of a local MySQL installation, the difference is insignificant, and times grows linearly with the row count for both drivers. For slow networks, it’s harder to distinguish between streaming and non-streaming, since network latency is unpredictable. In fast networks, the difference could indeed reach up to twice as long — but it decreases with the row count.

Testing Environment (or, how I tested it)

If you, like me, are interested in reviewing the test environment, here are a few words on the matter: I used JMH tool for running benchmarks, and all the results are pretty reproducible. I ran JMH benchmarks from a laptop with i7 2.6 GHz (2 cores) and 16GB of RAM. For wire configuration, I used AWS cloud with two micro instances (one for the server, and one for JHM benchmark).

Extra

What I decided not to put in the original post is one annoyingly strange graph that I got several times on a wire (like 2 or 3 times in a row). But then I failed to reproduce it (I tried many times), but without success. JFYI…

weird case on wire, microseconds

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Dmitry Komanov

Dmitry Komanov

251 Followers

Software developer, moved to Israel from Russia, trying to be aware of things.