Linux Yoke driver CVS Tree
This is a WWW interface to my (1999) Linux Yoke driver CVS tree.
You can download the latest source tar from ftp://oboe.it.uc3m.es/pub/Programs/yoke-current.tgz.
The yoke driver is intended to do hot-swappable, transparent binding of one Linux block
device to another. So if you write into /dev/hda1 (for example), you also
write into /dev/hda2 (for example).
This differs from RAID-style mirroring. RAID
presents one mirrored device, whereas the yoke device presents two or more.
You can hot-swap components in and out of a yoke device because it isn't
really there ... it's just a binding. If you still say "use RAID", well,
I'm willing to steal the code from there too.
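As a rough illustration of the binding idea, here is a small user-space model in C. It is not driver code: the "devices" are memory buffers, and every name in it (toy_dev, toy_yoke, yoke_bind, yoke_write) is hypothetical. It only shows that each device stays separately visible, that a write to one yoked device also lands on the others, and that swapping a component in or out changes nothing but the binding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEV_SIZE 64   /* bytes per toy device (arbitrary for the sketch) */
#define MAX_DEVS 4

/* Hypothetical stand-in for a block device: just a buffer and a flag. */
struct toy_dev {
    unsigned char data[DEV_SIZE];
    int yoked;                    /* nonzero while bound into the yoke */
};

struct toy_yoke {
    struct toy_dev dev[MAX_DEVS];
};

/* Hot-swap a component in (on = 1) or out (on = 0): only the binding changes. */
void yoke_bind(struct toy_yoke *y, int idx, int on)
{
    assert(idx >= 0 && idx < MAX_DEVS);
    y->dev[idx].yoked = on;
}

/*
 * Write to ONE device. If that device is yoked, the same data is also
 * written to every other yoked device. Returns the number of devices
 * written, or 0 if the arguments are out of range.
 */
int yoke_write(struct toy_yoke *y, int idx, size_t off,
               const void *buf, size_t len)
{
    int i, written;

    if (idx < 0 || idx >= MAX_DEVS || off > DEV_SIZE || len > DEV_SIZE - off)
        return 0;

    memcpy(y->dev[idx].data + off, buf, len);
    written = 1;
    if (!y->dev[idx].yoked)
        return written;           /* unbound device: plain write only */

    for (i = 0; i < MAX_DEVS; i++) {
        if (i != idx && y->dev[i].yoked) {
            memcpy(y->dev[i].data + off, buf, len);
            written++;
        }
    }
    return written;
}
```

With devices 0 and 1 bound, yoke_write on device 0 updates both; after yoke_bind(&y, 1, 0) the same call touches only device 0 and device 1 keeps its old contents.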
The driver is experimental.
Read the notes at the head of yoke.c for instructions on how to use it,
and read the Makefile for more clues. The mode of use has changed from
version to version at times, but it is presently stable.
Hints and history
-
I conceived this device as a node device in a topology of network 1-1 block device
connections.
-
I tried at first to use a trick that I believe has been used on Solaris before
now: the reuse without copying of a kernel level block request. Take a
nearly completed request and retarget it just before the kernel thinks
it ought to destroy the request, after carrying out the request. Repeat
as necessary. Using this trick ought to enable one to get mirroring virtually
for free.
-
I've had trouble making it work in 2.2.* because of what seems to be
a rather varied manner of implementing block devices in the kernel. Had
it working on IDE, then on SCSI, but not at the same time. Currently using
loop. Kernel mechanisms also changed from 2.0.*, where I did have it going.
-
Under 2.2.* I've just (0.3.1) managed to find a suitable alternative mechanism.
I've hijacked the code from the loop driver. It copies instead of moves, but
in principle the expense is not too great since it's all asynchronous. The
code works in the block cache layer. It gets a buffer_head for the target
device and target zone and then copies the incoming request data into the new
buffer_head's buffer. It does end_io on the original request and leaves the kernel
to it.
-
I have hijacked and modified the real loop.c code (see the loop.c here) for
trialling so that I can control the environment better.
-
I moved the generic kernel end_request() code into loop.c via LOCAL_END_REQUEST.
-
I stopped end_request() changing CURRENT as a side effect, since it was driving
me crazy trying to remember whether it had moved or not. CURRENT is now
changed once, at the beginning of the loop.c request loop.
-
I haven't made any other changes in loop.c. If Ted Ts'o reads this: please
explain to me why you originally moved CURRENT once at the beginning of
the request loop, then restored it at the end of the loop, then moved it
again in end_request() just before the loop repeats.
-
The existing code demonstrates that stealing the yoked block device's request_fn
works. The original (2.0.*) scheme was as follows:
-
We substitute our own request_fn, like a cuckoo. When a block device tries to treat
its accumulating requests, it executes our cuckoo function ...
-
Our function marks the individual buffer heads by putting a special
end_io() function into their b_end_io fields. That's another cuckoo.
-
Then it runs the original device request_fn so that nobody will spot anything.
-
When the kernel has treated the requests, the default kernel end_request
function (and, hopefully, any special local versions too) runs the end_io function
in the buffer_head, and thus hands control back to us.
-
Instead of freeing up the buffer, we retarget the buffer head as necessary
and resubmit it as a new request, thus doing mirroring.
-
When we get the buffer head back for the umpteenth time, then we
finish up i/o as originally intended.
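The 2.0.* steps above can be modelled in user space roughly as follows. This is a sketch under heavy assumptions, not kernel code: struct toy_bh, the disk array, toy_init, yoke_install and submit are invented for the illustration, only the names request_fn and b_end_io come from the text, and the real thing is asynchronous where this model simply recurses.

```c
#include <assert.h>
#include <string.h>

#define NDEV  3     /* number of yoked toy devices (arbitrary) */
#define NBLK  4
#define BLKSZ 16

struct toy_bh;
typedef void (*end_io_fn)(struct toy_bh *bh);
typedef void (*request_fn_t)(struct toy_bh *bh);

/* Toy stand-in for a buffer head; only b_end_io is named after the text. */
struct toy_bh {
    int dev, block;
    unsigned char data[BLKSZ];
    end_io_fn b_end_io;       /* completion hook */
    end_io_fn saved_end_io;   /* original hook, kept by the cuckoo */
    int orig_dev;             /* device the write was first aimed at */
    int hops;                 /* devices written so far */
    int done;                 /* how often the original completion ran */
};

unsigned char disk[NDEV][NBLK][BLKSZ];
request_fn_t request_fn[NDEV];        /* what the "kernel" calls */
request_fn_t orig_request_fn[NDEV];   /* the drivers' own functions */

/* The real driver: carry out the write, then run the completion hook. */
void real_request_fn(struct toy_bh *bh)
{
    memcpy(disk[bh->dev][bh->block], bh->data, BLKSZ);
    bh->b_end_io(bh);
}

/* The completion the submitter originally asked for. */
void plain_end_io(struct toy_bh *bh) { bh->done++; }

/* Second cuckoo: instead of finishing, retarget and resubmit. */
void cuckoo_end_io(struct toy_bh *bh)
{
    if (++bh->hops < NDEV) {
        bh->dev = (bh->orig_dev + bh->hops) % NDEV;   /* retarget */
        orig_request_fn[bh->dev](bh);                 /* resubmit */
        return;
    }
    /* Back for the last time: finish up as originally intended. */
    bh->dev = bh->orig_dev;
    bh->b_end_io = bh->saved_end_io;
    bh->b_end_io(bh);
}

/* First cuckoo: mark the buffer head, then run the original request_fn. */
void cuckoo_request_fn(struct toy_bh *bh)
{
    bh->saved_end_io = bh->b_end_io;
    bh->b_end_io = cuckoo_end_io;
    bh->orig_dev = bh->dev;
    bh->hops = 0;
    orig_request_fn[bh->dev](bh);
}

void toy_init(void)
{
    int d;
    memset(disk, 0, sizeof disk);
    for (d = 0; d < NDEV; d++)
        request_fn[d] = real_request_fn;
}

/* Steal every device's request_fn, remembering the original. */
void yoke_install(void)
{
    int d;
    for (d = 0; d < NDEV; d++) {
        orig_request_fn[d] = request_fn[d];
        request_fn[d] = cuckoo_request_fn;
    }
}

void submit(struct toy_bh *bh)
{
    assert(bh->dev >= 0 && bh->dev < NDEV);
    assert(bh->block >= 0 && bh->block < NBLK);
    request_fn[bh->dev](bh);
}
```

After toy_init() and yoke_install(), a single submit() of a write aimed at one device leaves the same block on all NDEV devices, with no copy of the data being made and the original completion run exactly once at the end.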
-
The new 2.2.* scheme takes the first part of the 2.0.* scheme and finishes
up in a different way, like this:
-
We substitute our own request_fn, like a cuckoo. When a block device tries to treat
its accumulating requests, it executes our cuckoo function ...
- Our function passes unmarked write requests to our special queue
- Then it runs the original device request_fn so that nobody will spot
anything.
- We run through our queue and copy the requests to the yoked devices,
using the block-device layer (function getblk) and adding a marker.
- We discard the original request and leave the new requests to drop
through to the real devices.
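A user-space model of the 2.2.* steps might look like the sketch below. It follows one reading of the list, which is an assumption: unmarked writes are diverted to the special queue, everything else goes straight to the original request_fn, and the marked copies go to every yoked device, including the one first addressed. The types and names (toy_req, disk, physical_writes) are hypothetical, and a plain memcpy stands in for the getblk/buffer_head machinery.

```c
#include <assert.h>
#include <string.h>

#define NDEV  2     /* number of yoked toy devices (arbitrary) */
#define NBLK  4
#define BLKSZ 16
#define QLEN  8     /* capacity of the special queue */

/* Toy stand-in for a block request. */
struct toy_req {
    int dev, block;
    int is_write;
    int marked;        /* set on copies made by the yoke */
    int completed;     /* set when end_io would have been run */
    unsigned char data[BLKSZ];
};

unsigned char disk[NDEV][NBLK][BLKSZ];
int physical_writes[NDEV];   /* writes that really reached each device */

/* The device's own request function: do the I/O and complete it. */
void orig_request_fn(struct toy_req *r)
{
    assert(r->dev >= 0 && r->dev < NDEV);
    assert(r->block >= 0 && r->block < NBLK);
    if (r->is_write) {
        memcpy(disk[r->dev][r->block], r->data, BLKSZ);
        physical_writes[r->dev]++;
    } else {
        memcpy(r->data, disk[r->dev][r->block], BLKSZ);
    }
    r->completed = 1;
}

/* The cuckoo, handed the n requests a device has accumulated. */
void cuckoo_request_fn(struct toy_req *reqs, int n)
{
    struct toy_req *pending[QLEN];   /* the special queue */
    int npending = 0, i, d;

    assert(n >= 0 && n <= QLEN);

    /* 1. Pass unmarked write requests to the special queue. */
    for (i = 0; i < n; i++)
        if (reqs[i].is_write && !reqs[i].marked)
            pending[npending++] = &reqs[i];

    /* 2. Run the original request_fn on everything else. */
    for (i = 0; i < n; i++)
        if (!reqs[i].is_write || reqs[i].marked)
            orig_request_fn(&reqs[i]);

    /* 3. Copy each queued request to the yoked devices, with a marker. */
    for (i = 0; i < npending; i++) {
        for (d = 0; d < NDEV; d++) {
            struct toy_req copy = *pending[i];
            copy.dev = d;
            copy.marked = 1;               /* stops it being mirrored again */
            cuckoo_request_fn(&copy, 1);   /* drops through to the device */
        }
        /* 4. Discard the original: complete it without doing its I/O. */
        pending[i]->completed = 1;
    }
}
```

The marker is what keeps this from looping: a marked copy re-enters the cuckoo but is never queued again, so each device sees exactly one physical write per original request.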
Outstanding doubts of mine
-
I don't really know what is needed to make the buffers commit themselves
just before I retarget the buffer heads. Should I do a wait_on_buffer to
keep them marked? What does Dirty really mean to the kernel? What is a brelse?
Which incantations should I be using on buffers?
-
I don't know how to handle devices that do plugging. At one time I thought
I had it figured, but now I'm not so sure, so I'm not touching it.
If you know any answers, send them to the address below. This page
has taken a thousand hits in three months and hardly anybody has
written.
Testing
My development used the loop device, my SCSI Jaz drive, and the NBD
device (note: I'll add links later). It works fine on my development
devices (0.3.6 as I write). I didn't have a spare IDE disk, so I'd like
reports. There is a possible problem with IDE, as the driver code
plainly holds on to the head request while processing it, unlike
SCSI, which takes it off the queue and then processes it. I step
over the first request, just in case, when writing to device request
queues, but I am worried about races. SCSI and my loop device code
are fine. Even H.J.'s loop code seems fine, although it too messes
with requests while they're still on the queue, so that's an encouraging
prognostic. Tell me about IDE.
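The "step over the first request" precaution can be pictured with a toy queue in user space. This is only a sketch: struct toy_req and enqueue_skipping_head are hypothetical names, and ordering by sector is an assumption made for the example. The one point it illustrates is that a new request is never placed in front of the current head, which an IDE-style driver may still be working on.

```c
#include <assert.h>
#include <stddef.h>

/* Toy request: a sector number and a link to the next request. */
struct toy_req {
    unsigned long sector;
    struct toy_req *next;
};

/*
 * Insert req in sector order, but never before the current head:
 * the head may be in use by the driver, so it is stepped over.
 */
void enqueue_skipping_head(struct toy_req **queue, struct toy_req *req)
{
    struct toy_req *p;

    assert(queue != NULL && req != NULL);
    req->next = NULL;
    if (*queue == NULL) {          /* empty queue: req becomes the head */
        *queue = req;
        return;
    }
    p = *queue;                    /* the head itself is never displaced */
    while (p->next != NULL && p->next->sector <= req->sector)
        p = p->next;
    req->next = p->next;
    p->next = req;
}
```

With a head at sector 50, adding sectors 10, 30 and 5 in that order gives the queue 50, 5, 10, 30: everything behind the head is sorted, and the head stays where the driver left it. The sketch does nothing about the races mentioned above.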
To Do
Add slow-background syncing of a new facet. Is there any advantage
to using checksums? Only if they are held elsewhere. Maybe in a
device header. Get rid of the /dev/yd0 thing altogether or make
it a char device.
CVS
You can browse the cvs file hierarchy by selecting directories. If you
select a file, you will see the revision history for that file.
Selecting a revision number will download that revision of the file.
There is a link in each revision that displays diffs between that revision
and the previous revision, and a form at the bottom of the page that allows
you to display the diffs between arbitrary revisions.
Please send any suggestions, comments, etc. to
Peter
Breuer <ptb@it.uc3m.es>
Quick HOWTO
You should find a .tgz at ftp://oboe.it.uc3m.es/pub/Programs/.
Patch from that with diffs from the CVS tree if you want the latest. Build
with make. Run with make test. Stop the test with make
stop. The test is against two looped-back files of 1MB each mounted
from /tmp, so it should work universally. Read the makefile for more details.
Once running, you should be able to
echo hello >! /dev/loop0
and read it back from /dev/loop1 with
head -10c /dev/loop1.
Created by cvsweb 1.0
Pages administered by: ptb@it.uc3m.es