I am using MySQL and want to make use of the setFetchSize property. The default MySQL JDBC implementation does not really respect it: if you set the fetch size to Integer.MIN_VALUE, it will fetch each row individually, but since the reason I want to use fetchSize is that I have enough data to push my memory usage into the 2 GB range, having to do one query per row is going to take forever.
I would instead like to plug in a JDBC implementation that works with MySQL and properly respects the fetch size, allowing me to set a fetch size of 10,000 or some other higher limit. Can anyone point me to a jar that provides such an implementation? Failing that, is there any other way to run a query returning tens of thousands of rows that is efficient both in memory use and in the number of SQL queries required?
Technically, questions asking for libraries are off-topic. That said, as far as I know there is no alternative JDBC driver for MySQL. You have the choice between fetching all rows at once, which can lead to out-of-memory situations, or having the driver fetch them on demand by setting setFetchSize(Integer.MIN_VALUE).
The reason for this - as I understand from the Connector/J implementation notes - is that the MySQL protocol cannot have more than one cursor open per connection, therefore it defaults to streaming all rows to the client on execute.
The other option is that rows are retrieved one by one, but this comes with the problem that you cannot execute other statements on the same connection while processing the ResultSet:
There are some caveats with this approach. You must read all of the rows in the result set (or close it) before you can issue any other queries on the connection, or an exception will be thrown.
So MySQL only has the option to fetch everything or to fetch rows one at a time. This means there is no way for the driver to respect a different fetch size. And because of the caveats of one-at-a-time retrieval, they opted to use Integer.MIN_VALUE (instead of simply 1) as a signal that you should really think before doing this.
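For reference, a minimal sketch of the streaming mode and the caveat quoted above might look like this (table and column names are only placeholders):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

void streamBigTable(Connection conn) throws SQLException {
    Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    // signals Connector/J to stream rows instead of buffering the entire result client-side
    stmt.setFetchSize(Integer.MIN_VALUE);
    try (ResultSet rs = stmt.executeQuery("SELECT id, payload FROM big_table")) {
        while (rs.next()) {
            // process the row here, but do NOT issue other statements on 'conn'
            // until this ResultSet has been fully read (or closed)
        }
    }
    // only now is the connection free for further queries
}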
A possible 'in-between' solution would require you to program this yourself, repeatedly executing queries with LIMIT and OFFSET, as sketched below.
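A rough sketch of that approach, assuming an indexed id column and made-up table/column names, could look like this:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

void readInPages(Connection conn) throws SQLException {
    final int pageSize = 10000;
    long offset = 0;
    int rowsInPage;
    do {
        rowsInPage = 0;
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT id, payload FROM big_table ORDER BY id LIMIT ? OFFSET ?")) {
            ps.setInt(1, pageSize);
            ps.setLong(2, offset);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rowsInPage++;
                    // process the row
                }
            }
        }
        offset += pageSize;
    } while (rowsInPage == pageSize); // a short page means the end was reached
}

Note that large OFFSET values become progressively slower because the server still has to skip over all preceding rows; if the table has an indexed key, paging with WHERE id > lastSeenId instead of OFFSET scales better.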
If you enable the MySQL JDBC option useCursorFetch, fetchSize will indeed be respected by the driver.
However, there is one disadvantage to this approach: it uses server-side cursors, which in MySQL are implemented using temporary tables. This means that results will not arrive until the query has completed on the server, and that additional memory will be used server-side.
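A minimal sketch of that setup (connection details and the table name are placeholders; older Connector/J versions may also require useServerPrepStmts=true) could look like:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

String url = "jdbc:mysql://localhost:3306/mydb?useCursorFetch=true";
try (Connection conn = DriverManager.getConnection(url, "user", "password");
     PreparedStatement ps = conn.prepareStatement("SELECT id, payload FROM big_table")) {
    ps.setFetchSize(10000); // honoured when useCursorFetch is enabled; rows arrive in batches of 10,000
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // process the row; the server keeps a cursor (temporary table) behind the scenes
        }
    }
}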
If you just want result streaming and don't care about the exact fetch size, the overhead of setFetchSize(Integer.MIN_VALUE) is not as bad as the docs might imply. It simply disables client-side caching of the entire response and gives you rows as they arrive, so there is no round trip per row.
This is not really an answer to the above question. As I could not fit it in a comment, I am providing it as an answer instead. It may prove helpful to others facing a similar issue.
For a batch job I needed to switch on streaming mode, as my result set was too large. At first, as shown in the MySQL docs, I set my statement up this way:
Statement extrapackStreamingQuery = dbExtrapackConnection.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
extrapackStreamingQuery.setFetchSize(Integer.MIN_VALUE);
But it would systematically give me the error:
Application was streaming results when the connection failed. Consider raising value of 'net_write_timeout' on the server.
I tried a few configuration options, such as max_allowed_packet = 128M, max_connect_errors = 9999 and net_write_timeout = 180, but none of them helped.
Wrongly thinking the TCP connection might have been closed for being idle too long, I even tried changing the TCP keepalive interval by setting net.ipv4.tcp_keepalive_time=60 in the /proc/sys/net/ipv4/tcp_keepalive_time and /etc/sysctl.conf files.
Indeed, if a database connection is opened but no TCP packets are sent for long enough, then the database connection will be lost as the TCP connection is closed. Sending TCP packets more often to keep the TCP connection alive may solve the issue.
But this didn't help either.
Then, after reading this piece, I changed my connection setup to:
protected static final int DB_STREAMING_FETCH_AMOUNT = 50;
...
Statement extrapackStreamingQuery = dbExtrapackConnection.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
extrapackStreamingQuery.setFetchSize(DB_STREAMING_FETCH_AMOUNT);
with my URL using a trailing option:
String fullUrl = url + host + ":" + port + "/" + dbName;
if (streaming) {
    fullUrl += "?useCursorFetch=true";
}
My batch job is now working fine; it completes and even runs faster.