How To Cancel A Blocking Read In Java
Update: The ZombieInputStream I've talked about, now renamed to RevivableInputStream, is available in this Java library I've created: com.hypirion/io. It should be fairly straightforward to use if you came here looking for a solution on how to cancel blocking reads.
Working with IO sounds easy enough, and many people use the interface Java has defined every single day. When doing "normal IO" in Java, this interface works very well, except for some minor issues. But a minor issue may turn into a gigantic rabbit hole if your program has unusual requirements, and I'm about to show you one of them.
In the project manager Leiningen, we must spawn subprocesses to keep Leiningen itself from messing up the Java classpath you use for your Clojure/Java project. This would happen whenever someone's working on a project which depends on a library Leiningen uses, but uses a different version of it. Either Leiningen would complain, or the project itself would complain, because one of them is now using the wrong version.
To create subprocesses in Java, you use the Runtime's exec method (for example, this one). This returns a Process, on which you can call specific methods to get its sysin, sysout and syserr. Piping the sysout and syserr to the main process' sysout and syserr is very straightforward: set up a new thread, do blocking reads, and pipe the data to System.out/err until we get EOF from the InputStream.
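To make the piping step concrete, here is a minimal sketch of such a thread. The class name StreamPiper and its methods are made up for illustration and are not Leiningen's actual code; the sketch also leaves out the synchronization between the two output threads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not taken from Leiningen.
final class StreamPiper {
    /** Copies everything from `from` to `to` until EOF, blocking while waiting for data. */
    static void pipe(InputStream from, OutputStream to) throws IOException {
        byte[] buffer = new byte[1024];
        int n;
        while ((n = from.read(buffer)) != -1) { // read blocks until data or EOF
            to.write(buffer, 0, n);
            to.flush();
        }
    }

    /** Starts a daemon thread that runs pipe(from, to) and returns that thread. */
    static Thread pipeInBackground(InputStream from, OutputStream to) {
        Thread t = new Thread(() -> {
            try {
                pipe(from, to);
            } catch (IOException e) {
                // The stream broke mid-copy (e.g. the subprocess died); stop piping.
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Given a Process p, you would call StreamPiper.pipeInBackground(p.getInputStream(), System.out) and the same with p.getErrorStream() and System.err. Somewhat confusingly, the subprocess' output is an InputStream from the parent's point of view.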
Reading Without Blocking
However, as I've found out, the System.in part is not as straightforward. We initially did this the same way as the System.out/err part: fire up a thread, and while System.in had data and the subprocess was alive, pipe the data to the subprocess. However, when we had multiple subprocesses in sequence where both subprocesses needed to read from System.in, we got an issue: the thread serving the first subprocess would eat some of the input meant for the second one, because it would do a blocking read, check if the process was still alive, and then push the data onto the subprocess' input stream if it was. If it wasn't, it just threw away the data. And when Googling doesn't turn up any apparent solution on how to cancel a blocking read, you have to experiment with different techniques instead.
One possible solution to this is to just do busy-waiting on the data instead, using the available method from InputStream on System.in to check if there's any data, and sleeping some milliseconds before retrying if there isn't any. This works well, and solves the problem…
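A rough sketch of that polling loop could look like the following. PollingPiper, the alive callback and the sleep interval are my own placeholders for illustration, not what Leiningen actually shipped.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.BooleanSupplier;

// Hypothetical helper illustrating busy-waiting on available().
final class PollingPiper {
    /**
     * Forwards bytes from `in` to `out` for as long as `alive` reports true.
     * It only calls read() when available() says data is ready, so it never
     * blocks inside read(). Returns the number of bytes forwarded.
     */
    static long pipeWhileAlive(InputStream in, OutputStream out,
                               BooleanSupplier alive, long sleepMillis)
            throws IOException, InterruptedException {
        byte[] buffer = new byte[1024];
        long total = 0;
        while (alive.getAsBoolean()) {
            int ready = in.available();
            if (ready > 0) {
                int n = in.read(buffer, 0, Math.min(ready, buffer.length));
                if (n > 0) {
                    out.write(buffer, 0, n);
                    out.flush();
                    total += n;
                }
            } else {
                Thread.sleep(sleepMillis); // nothing to read yet; retry shortly
            }
        }
        return total;
    }
}
```

With a Process p, the callback could be p::isAlive. Since the loop never sits inside a blocking read, it can notice that the subprocess has ended before it consumes input meant for the next one.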
EOF Is Not A Byte
…if we don't have to send EOF (also known as -1) to the process. System.in is/wraps a FileInputStream using FileDescriptor.in, and FileInputStream's available "[r]eturns an estimate of the number of remaining bytes that can be read (or skipped over) from this input stream without blocking". The EOF is not accounted for in that estimate, so whenever we have a closed or empty stream, available will return 0. It makes sense, as EOF is not really a byte. However, now we have to do a blocking read in order to close the subprocess' System.in. So the issue is that we cannot busy-read and actually read EOF: we have to block in order to read it.
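The effect is easy to reproduce. In this small sketch, an exhausted ByteArrayInputStream stands in for a System.in at EOF (an assumption made to keep the example self-contained; the real case involves a FileInputStream), and EofProbe is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class: shows that available() cannot tell us about EOF.
final class EofProbe {
    /** Reports what available() and a single read() return for the given stream. */
    static String probe(InputStream in) throws IOException {
        int available = in.available(); // never blocks
        int firstByte = in.read();      // may block; -1 means EOF
        return "available=" + available + " read=" + firstByte;
    }

    public static void main(String[] args) throws IOException {
        // Exhausted stream: nothing is "available", yet read() reports EOF.
        System.out.println(probe(new ByteArrayInputStream(new byte[0])));
        // prints "available=0 read=-1"
    }
}
```

A loop that only reads when available() is positive would treat this stream exactly like one that simply has no data yet, and so would never see the -1.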
We can resolve this through a PushbackInputStream and a blocking queue: We make a blocking queue where we put the output stream we should pipe to, along with a Clojure atom which contains true if the process has ended, and false if it hasn't. In the thread, we read, check if the process is alive, and if it is alive, pipe the data out to it. Otherwise, we push the data back into the stream, move on to the next subprocess in the blocking queue and send the data to that one. For sequential subprocesses, this turns out to work just fine…
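The read-then-maybe-push-back step can be sketched in plain Java like this. PushbackDemo and forwardOrPushBack are hypothetical names, and the blocking queue and the Clojure atom are reduced to a plain boolean to keep the example short.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PushbackInputStream;

// Hypothetical helper showing a single read / push-back step.
final class PushbackDemo {
    /**
     * Reads one chunk from `in`. If `alive` is true the chunk is written to
     * `out`; otherwise it is pushed back so the next read sees it again.
     * Returns the number of bytes read, or -1 at EOF.
     * Assumes `in` was created with a pushback buffer of at least buffer.length bytes.
     */
    static int forwardOrPushBack(PushbackInputStream in, OutputStream out,
                                 boolean alive, byte[] buffer) throws IOException {
        int n = in.read(buffer); // still a blocking read
        if (n > 0) {
            if (alive) {
                out.write(buffer, 0, n);
                out.flush();
            } else {
                in.unread(buffer, 0, n); // hand the bytes back for the next subprocess
            }
        }
        return n;
    }
}
```

Note the size argument when constructing the stream, e.g. new PushbackInputStream(System.in, 1024): the default pushback buffer holds a single byte, and unread throws an IOException when it overflows.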
The REPL Is Calling, She Wants Her Bytes Back
…until you figure out that
System.Console.readPassword or some Clojure REPL
like REPL-y is suddenly not working properly. You see, they're not running in a
subprocess. So, whenever they read from sysin, they don't read from our
convenient PushbackInputStream. Now you have a PushbackInputStream with some
bytes in it that should have been at System.in.
Java has a convenient method called System.setIn though, which allows you to rebind System.in. But of course, neither System.Console.readPassword nor REPL-y uses System.in. Oh no, they use a FileInputStream wrapping FileDescriptor.in. And even if they did read from System.in, we would have the issue that two streams are trying to read from the same source: This is ugly because now thread S (subprocess) and thread R (REPL) both want to read from System.in. Since S reads first, it will get the first byte. We then get a race condition: Either R reads a byte and then S pushes back its byte, or the order of those actions is reversed. This means that sometimes the first two bytes will be swapped, and sometimes they won't.
Interrupting A Thread?
There are some more tricks we could try to apply here. If we just create a new FileInputStream on top of FileDescriptor.in, and just close the InputStream in the main thread when the process has finished, we should certainly get this to work, right? Well, it turns out that closing a FileInputStream wrapping FileDescriptor.in closes FileDescriptor.in as well. So that option's not working.
Another way of solving this would be to just interrupt/kill the thread reading from standard in. Well, there's this 12-year-old unresolved bug report which explains that it is impossible to kill or interrupt a thread blocked on a read without closing the stream it tries to read from.
And even if this was working, how could we be certain that we're interrupting the thread when it's blocking on a read? If we had a larger thread with some environment we would like to keep, how could we save that environment and e.g. restart or continue? Interrupting a thread just to cancel a blocking read is not exactly graceful.
Revivable Input Streams
One actual, possible solution is to make a "zombie" input stream which you can "close" and "revive". While it's dead, it spits out -1. Otherwise it tries to read from an underlying stream, blocking until the underlying stream provides some data, is closed, or someone kills this stream-wrapper. It will not immediately solve those REPL-y and readPassword issues, but if we're allowed to specify what stream they read from, then this solution works.
I will not go into the implementation details of the ZombieInputStream (mainly because I've not finished implementing it), but trust me that it's not entirely straightforward, and that it requires the use of locks/monitors/threads. All this, just because I need to be able to cancel a read from System.in gracefully.
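To give a rough idea of the shape of such a stream, here is a deliberately small sketch. It is not the com.hypirion/io implementation: the class name KillableInputStream and the methods kill and revive are invented for this example. It assumes a single background "pump" thread is the only one that ever blocks on the underlying stream, and it only handles single-byte reads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;

// Hypothetical sketch of a stream whose blocking read() can be cancelled.
final class KillableInputStream extends InputStream {
    private static final int NONE = -2;   // marker: no byte waiting for a reader
    private final InputStream source;
    private final Object lock = new Object();
    private int pending = NONE;           // byte fetched by the pump, not yet consumed
    private boolean killed = false;
    private boolean sourceDone = false;   // underlying stream hit EOF or failed

    KillableInputStream(InputStream source) {
        this.source = source;
        Thread pump = new Thread(this::pump, "killable-input-pump");
        pump.setDaemon(true);
        pump.start();
    }

    // Only this thread blocks on the underlying stream; readers wait on the lock.
    private void pump() {
        try {
            while (true) {
                int b = source.read();
                synchronized (lock) {
                    if (b == -1) break;
                    pending = b;
                    lock.notifyAll();
                    while (pending != NONE) lock.wait(); // wait until a reader takes it
                }
            }
        } catch (IOException | InterruptedException e) {
            // Treat a broken source like EOF in this sketch.
        }
        synchronized (lock) {
            sourceDone = true;
            lock.notifyAll();
        }
    }

    @Override
    public int read() throws IOException {
        synchronized (lock) {
            try {
                while (!killed && pending == NONE && !sourceDone) lock.wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException();
            }
            if (killed) return -1;        // dead: pretend we are at EOF
            if (pending == NONE) return -1; // the source really is finished
            int b = pending;
            pending = NONE;
            lock.notifyAll();             // let the pump fetch the next byte
            return b;
        }
    }

    /** Makes current and future reads return -1 until revive() is called. */
    void kill() { synchronized (lock) { killed = true; lock.notifyAll(); } }

    /** Lets reads reach the underlying data again. */
    void revive() { synchronized (lock) { killed = false; } }
}
```

A byte the pump has already fetched stays in pending while the stream is dead, so it is delivered to the first reader after revive rather than lost. The pump still reads one byte ahead of its consumers, which is why this does nothing for code that bypasses the wrapper and reads the file descriptor directly. A fuller version would also override the bulk read(byte[], int, int), since the inherited one keeps calling read() until the buffer is full.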
Good Design, Good API
So with this little anecdote on how one handles interruption of blocking reads in Java, this raises the question: How should you do nonblocking reads properly? Should available return 1 when we're at EOF? Should it return -1 when we're at EOF? Should there be a method so that you can check whether the InputStream is closed or not? Or should Java "simply" have a nonblocking read, returning -2 when it's not possible to read a value immediately, and -1 at EOF?
I could probably go on with ideas on how to make it easier to solve my specific problem. However, that's not really what one should consider when creating APIs and designs for input and output: The most important part is that the components are composable and that they make programs simple, and not necessarily easy.
My solution to the problem does not compose with other parts. However, in this case REPL-y did not (initially) support the option to specify an InputStream, and System.Console.readPassword, while it makes it easy to read passwords from the terminal, is tightly coupled with the stream it is reading from. As such, one may question the composability of those parts, and one may question whether Java makes it easy to create such composability.
That's not to say that Java's IO API is necessarily bad, but there is at least one edge case which is a bit tricky to get right. Maybe it wasn't designed for the liberal way we use it, and perhaps design tradeoffs make that choice justifiable. Maybe it's just bad design. Someone more proficient in IO API design should answer those questions, as I have neither designed an IO API nor studied Java's in detail.
 There's some synchronization between those threads so that printing doesn't get garbled in a terminal or if the file descriptors are merged into one.
 Busy-reading is in general bad design, but for Leiningen it was okay as the bottleneck wasn't there.
 This is assuming we're reading one byte at a time. If we're reading into buffers, then we may get larger chunks moved around.