Post by Robert Millan Post by Javier Martín Post by Robert Millan
A --ignore-incompatible flag doesn't sound like a nice thing to do; it means
we're passing our own problem to the user instead of solving it.
We don't have any "problem" to pass to users: ext4 is not supported
We don't have an urge to support ext4, but that doesn't mean we shouldn't
consider it a problem.
I think adding an interface for the user to choose in which way to deal with
our limitations is a nasty thing. I strongly object to that.
May I ask why? Is it thus better to impose our own way of dealing with
our limitations, unchangeable and possibly wrong in some instances (see
below for a possible scenario)? Sensible defaults are good, but choice
is one of the bases of freedom.
Post by Robert Millan Post by Javier Martín
However, given Pavel's and
others' objections, I suggested the addition of an user override to it.
Thus, the user will have to knowingly force the system to interpret the
filesystem with its current code, and accept any failures he might get,
instead of the current behaviour of having the FS mounted automatically
without checking incompatibilities (and then getting the errors anyway).
I don't think this is necessary. First, let's take for granted that our code
is in every situation smart enough not to crash when a filesystem isn't
readable (this should always be the case, since we might occasionaly be asked
to read corrupt filesystems). Then, what do flags mean?
If a flag means "GRUB won't be able to access this filesystem at all", we could
explicitly refuse to probe it, but then again our code must be graceful enough
to cope with it without crashing anyway (see above), so maybe it's not worth to
(depends on the time/size trade-off).
If a flag means "access to the filesystem isn't deterministic, and grub-probe
might be able to do things that real GRUB won't", then we're in a situation in
which we'd like grub-probe to be conservative _but_ real GRUB to be
best-effort. I think this means an internal switch to tell fs probes whether
to be conservative or not. We could even use #ifdef GRUB_UTIL so the flag
checking stuff doesn't make real GRUB fatter.
The problem with INCOMPAT_* flags is that we cannot always know what
they mean, and thus, a "best-effort" reader risks not just bumping into
metadata it does not understand (and thus crashing or spitting thousands
of errors if it's not robust enough), but also ignoring it and reading
wrong data to memory. In some circumstances, this can lead to nasty
bugs, and this is the reason I want the "override-incompatible-flags"
loophole in the driver to require explicit user activation.
We can decide what to do with flags we currently know, like ext4 flags
(extents and such), and that's the purpose of the IGNORE_INCOMPAT macro
in the patch; but with future flags we have no clue. Consider a
hypothetical ext5 file system with a new INCOMPAT_BLOCKCOMP feature flag
set; and without any other "unimplemented" flags like extents and such.
This new flag will mean that _some_ blocks of a file are LZMA-compressed
and some are not (maybe to the criterion of the writing driver, like
more than 5% space savings or such). The info on which blocks are
compressed and which are not is saved on a bitmap linked to from an
extended attribute in the inode (I'm making this out right now, so this
might be impossible as described).
If our current driver found this filesystem and tried to read a
multiboot kernel from it, it might read the whole file as it if were
uncompressed, thus putting a "corrupt" image into memory, possibly even
reading past the end of the file in the disk, or less data than the
stated file size. If the first blocks (with the multiboot headers) were
among the uncompressed ones, GRUB would happily boot from it, thus
leaving it to the criterion of the booted "kernel" what to do. The end
result might be just a triple fault when the CPU runs into compressed
code; or may come to the "kernel" doing something nasty to the computer
due to a corrupt HD driver commands structure, for example.
This scenario would be completely "transparent" to the user because GRUB
would mount the FS without even warning the user. Thus, I think that
"best-effort" is not always preferable when we're dealing with unknown
INCOMPAT_* flags. Those flags mean, by default, "you can't read this
filesystem if you don't know how to interpret this". Thus, I think GRUB
should initially reject to mount any such filesystem, but provide an
override for two cases:
1) The user explicitly requests it. Maybe he's trying to boot an older
kernel which was not compressed in a filesystem that just got
accidentally converted, or just check the source of the error.
2) GRUB is trying to boot from such a partition This can be just as
risky as the scenario I depicted before, but in this case we might
really not have another way out, since the user override would be
unworkable here as there is no prompt.
Of course, there are INCOMPAT_* flags that we know are not really such
(like needs_recovery). We can add those flags to the ignore list and
just forget about them, or implement some handling of them, like
replaying the journal in memory; but we cannot transparently ignore all
INCOMPAT_* flags because we don't know the future!
Thus, I agree with you that grub-probe should adopt the most
conservative stance possible, but reject the idea that the bootloader
proper should automatically activate "best-effort" reading, since this
can lead to very strange and untraceable bugs in the future (I'm just
imagining the puzzled look on the osdev's face when his perfect
multiboot kernel on the ext5 filesystem triple-faults without apparent
motive). Thus, the default should be conservative reading, keeping
best-effort reading to be enabled through an user override.