RAID Re-do for Anaconda

snake-raid
Remix of Cartoon Rattlesnake by Sirrob01, Spray Paint in Action by Guillaume W., and Tango Drive Hard Disk by Warszawianka on OpenClipArt.
Yes, that is a snake spraying RAID onto a pile of disks.
Of all the feedback we got about the Anaconda UI redesign, I think the one piece of the UI that’s received the most negative feedback is the RAID configuration piece of the custom partitioning UI. The designs for how this UI ended up getting implemented in Fedora 18 were posted to this blog in December 2011. I really wish we’d received back then the level of feedback we received post F18-Beta and post F18-GA, so the design could have been modified before it was implemented! That being said – I’m not placing blame with anybody but myself – I got this design wrong, and for that I am sincerely sorry.

The initial design

The idea behind the initial design was to not have a simple dropdown with a list of RAID numbers, because we found in our research even trained sysadmins don’t tend to remember what every single RAID level means or which one is best for which situations. When you don’t have RAID on the brain, and you’re trying to make a decision (not just looking for a specific RAID number because someone told you to, which is a fair use case), such a dropdown as shown in the bottom left might read like the one on the bottom right:
bleh-raid
So I wanted the design to have some more contextual guidance as to why you’d turn on a particular option or not for a given situation, for folks who have a lot on their minds and don’t have reference materials handy to look up which level they need. This is what that initial design looked like:

This design didn’t work on a number of levels. The first issue is that the most common nomenclature for RAID configuration is the RAID number, and while there is an indicator on the screen that updates depending on which RAID ‘features’ you check on or off, it’s not very visible and needed more prominence. The other issue with the design is that in trying to keep the text sparse (and not being able to make changes past string freeze), some of the terminology ended up confusing. For example, many folks missed that the ‘distributed’ and ‘redundant’ checkboxes referred to how parity was handled. Higher up there is a ‘redundancy’ checkbox meant to refer to mirroring – having both a ‘redundancy’ and a ‘redundant’ checkbox ended up causing confusion.
The design seemed to fail for two important types of users – junior sysadmins who were told (perhaps by a superior or by a policy document) to set a certain RAID level by number (e.g., RAID 10) and who didn’t totally understand all that meant and just wanted to set ‘RAID 10’, and RAID experts who knew what they wanted but found a lot of cognitive dissonance in the way the checkboxes were arranged and the terminology used for them.

First redesign

Stephanie and I first decided to think about redesigning this by providing a way to select a specific RAID level in addition to the feature-based checkbox system. Stephanie put together some mockups to show what that might look like.
Here’s her mockup showing how it would look for the by-feature part:
RAID-doom_by-feature
And here’s her mockup showing how it would look for the by-RAID level part:
RAID-doom_by-level
We sent these mockups with a bit of a background of what we were trying to do to Doug Ledford, who is Red Hat’s resident RAID expert. He came back to us with extremely useful and detailed feedback as to how the feature-based portion of the UI conflicted with how RAID experts think about RAID, and gave us some suggestions for a better hierarchy and contextual information to organize the features by in the UI. I took a lot of the points he made in his feedback and transcribed them to a working wiki page for the redesign effort here:
https://fedoraproject.org/wiki/Anaconda/UX_Redesign/RAID_Redesign

Second redesign

Last week Dave Lehman and I went through Doug’s feedback and walked through how it might translate to actual UI. Doug suggested the following tree structure for organizing the RAID levels:

  • Redundancy-Based Fault Tolerance
    • Drive Mirroring (RAID1.)
    • Drive Striping with Redundant Copies (RAID10.)
  • Parity-Based Fault Tolerance
    • Single Disk Failure Tolerance (RAID4 or RAID5.)
    • Two Disk Failure Tolerance (RAID6.)
  • Non-Fault Tolerant Array (RAID0, or striping.)
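
To make the hierarchy above a bit more concrete, here is a minimal sketch of it as a plain Python data structure. This is purely illustrative: the names `RAID_TREE`, `find_level`, and `levels_in` are made up for this example and are not taken from Anaconda's code.

```python
# Illustrative only: a plain-Python encoding of the tree Doug suggested.
# RAID_TREE, find_level, and levels_in are hypothetical names for this
# sketch; they are not part of Anaconda.
RAID_TREE = {
    "Redundancy-Based Fault Tolerance": {
        "Drive Mirroring": ["RAID1"],
        "Drive Striping with Redundant Copies": ["RAID10"],
    },
    "Parity-Based Fault Tolerance": {
        "Single Disk Failure Tolerance": ["RAID4", "RAID5"],
        "Two Disk Failure Tolerance": ["RAID6"],
    },
    "Non-Fault Tolerant Array": {
        "Striping": ["RAID0"],
    },
}


def find_level(level):
    """Return (category, option name) for a RAID level such as 'RAID5'."""
    for category, options in RAID_TREE.items():
        for option_name, levels in options.items():
            if level in levels:
                return category, option_name
    raise KeyError(f"unknown RAID level: {level}")


def levels_in(category):
    """Return every RAID level listed under one fault-tolerance category."""
    return [lvl for levels in RAID_TREE[category].values() for lvl in levels]
```

A lookup like `find_level("RAID6")` goes from the number a sysadmin was told to use back to the category and descriptive name, which is roughly the mapping the UI has to make visible.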

He also suggested that we should include options for the following essential things:

  • Drive selection – we actually already did this; you have to press the ‘settings’ icon when you have the RAID device selected on the left-hand-side of the screen to select disks. Since selecting disks is important to RAID configuration though, we decided to provide a place to click on right in the RAID UI as well that would bring up that same disk-selection screen.
  • Spare allocation – this had been possible in previous versions of Anaconda and had been left out of the new design, so we agreed it needed to be added back and we added it back in the mockups.
  • Array name – this actually was already possible to set; there is a ‘name’ field on the right-hand side of the screen where you work on the RAID device. Our assumption was that the mockups we sent Doug didn’t have this field (Dave had added it and we forgot to update the mockups before sending them to Doug.) So no change on this one.

So I came up with a set of new mockups based on Doug’s feedback. I’ll walk you through each:

Redundancy-Based Fault Tolerance

01-RAID1
Let’s walk through this mockup:

  • So the main pivot point in this UI is a drop down where you pick the type of fault-tolerance (if any) you want in your RAID array.
  • The mockup above shows what the UI looks like if you pick redundancy-based fault-tolerance. There are two RAID configurations under this category – RAID 1, which is simple drive mirroring; and RAID 10, which is mirrored striped disks. Each configuration has a name to indicate what it actually is, with the RAID level in parentheses following the name. It also has short (as short as I could make it, but we’ll probably try to tighten it up further for our German-speaking friends :)) descriptions of the pros and cons of the individual RAID levels, and a note showing how many disks are required for the setup.
  • You’ll notice in this first mockup, we have 2 disks selected for the array, so we can’t select RAID 10 – it’s greyed out because it requires a minimum of 4 disks. The 4 disk requirement is also marked in red so it grabs your attention and explains why the selection is greyed out.
  • The notification that you have two disks selected for the array is a link, and if you click on it you’ll get a summary of the disks that are part of your array with the option to add or remove other disks.
  • The lower right corner has an area to set a particular number of spares. Since this mockup shows a scenario where you have only 2 disks, this area is greyed out because you don’t have any extra disks to configure as spares.
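
The greying-out behaviour described in the walkthrough above could be sketched roughly like this. It is a hypothetical illustration, not Anaconda code: `MIN_DISKS` and `option_states` are names invented here, and the minimum disk counts are simply the ones mentioned in this post (2 for RAID 1, 4 for RAID 10).

```python
# Hypothetical sketch of the enable/grey-out logic for the two
# redundancy-based options. The minimums come from the post itself.
MIN_DISKS = {"RAID1": 2, "RAID10": 4}


def option_states(selected_disks):
    """Map each RAID level to (enabled, note) for a given disk count.

    The note is the text that would be highlighted in red when the option
    is greyed out; it is empty when the option can be selected.
    """
    states = {}
    for level, needed in MIN_DISKS.items():
        enabled = selected_disks >= needed
        note = "" if enabled else f"requires at least {needed} disks"
        states[level] = (enabled, note)
    return states
```

With 2 disks selected, `option_states(2)` leaves RAID 1 enabled and marks RAID 10 as disabled with its requirement note, which matches the first mockup.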

Parity-Based Fault Tolerance

02-RAID5
Let’s talk a bit about this next mockup:

  • Here in that dropdown you can see ‘Parity-Based’ fault tolerance is selected. There are three RAID levels that fall under this category – RAID 4, RAID 5, and RAID 6.
  • In this scenario, you can see from the link in the bottom left corner that we have four disks selected for the array. Since the 3 RAID level options we have all require 3 disks, the spare configuration widget is now lit up: we can configure up to 1 spare using it.
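
The spare arithmetic in that last bullet is simple enough to sketch. This is an assumption-laden illustration rather than how Anaconda computes it: `max_spares` is a made-up helper, and the required disk count is passed in as a parameter instead of being hard-coded per RAID level.

```python
def max_spares(selected_disks, required_disks, supports_spares=True):
    """Return how many spares the widget could offer.

    selected_disks  -- disks the user picked for the array
    required_disks  -- minimum disks the chosen RAID level needs
    supports_spares -- False for levels with no notion of spares
                       (the post notes RAID 0 is one such case)
    """
    if not supports_spares:
        return 0
    # Any disks beyond the minimum could be set aside as spares;
    # never report a negative count.
    return max(0, selected_disks - required_disks)
```

For the scenario in this mockup, `max_spares(4, 3)` gives 1, and for the earlier two-disk RAID 1 mockup `max_spares(2, 2)` gives 0, which is why the widget was greyed out there.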

No Fault Tolerance

03-RAID0
One more RAID configuration UI mockup here:

  • This mockup shows what the screen looks like if you decide to not have any fault tolerance. This gives you basically one option: RAID 0. One thing Dave and I talked about with this one is whether or not it should have a radio button. Dave felt that showing the radio button there, as vestigial as it may be, made it clearer that you were making a selection with the dropdown. I’m a bit on the fence – my initial mockup of this screen didn’t have the radio button. I think we’ll try it in practice and see what kind of feedback we get and go from there.
  • RAID 0 doesn’t support the concept of spares (you’re just striping data, there’s no mirroring) so instead of showing the spare configuration widget here, there’s just a note saying that you can’t do that. 🙂 The note is where the spare widget normally is so hopefully that’s where the user would look if they, for some reason, found the need to set spares on RAID 0 and didn’t know you can’t do that.

Disk Selection Screen

04-DiskSelection
This is the same screen you get today in anaconda if you click on the ‘settings’ icon in the left-hand pane of the custom partitioning screen. Just showing it here so you can see what it should look like when you click on the link in the lower left corner of the RAID UI – basically, that same old disk selection screen.

Feedback please?

What do you think of this latest redesign of the RAID screens? The one negative side I will put out there on it is that it is an awful lot of text. We do have a lot of room on the anaconda screens, though (although we’ve been talking about potentially capping the screen resolution since travelling miles from one side to the other on nice graphic hardware isn’t fun.) I’m worried about translations for this text too since it’s pretty technical and it may balloon a lot once translated.
Other than that, I don’t see any huge downside to it. What do you think? Is this an interface you think would work well for when you want to configure RAID?

26 Comments

  1. Stephen Smoogen

    I am thinking it is a LOT of text also. RAID is one of those things that if you know what it is, you know the terminology more than the description. If you don’t know it, then you are going to need help.. and that is better for a separate document than short descriptions. Is there a way to get someone to the documentation easily? “Don’t know what this is? Click Help.” or something? Because there are a lot of ways that you can screw up RAID without knowing you did so and Anaconda can’t stop you from doing it. [Say put a slow drive in a RAID-5 array.. and then watch everything go into D.] And sometimes what looks like you are “screwing” up is actually you trying to do something good. [Have an SSD and special harddrive paired in RAID-1/RAID-10 configuration. [But not a normal harddrive as you go into D.]] All of that requires you to know what you are doing or have good detailed help documentation.
    By the way I hope this makes sense and does not come across as what you are trying to do is bad.. just that I agree that to do it right needs a lot of text and you may even need more than you have currently.

    • mairin

      Sweet 🙂
      The technology dropdown, if you pick a RAID partition type, would let you pick LVM or RAID – if you pick LVM there then you could use that to have LVM running on top of RAID. It isn’t really fully-fleshed out right now though.

  2. andrew89

    Mairin,
    As a more casual user of Fedora, I will have to chime in, in favor of “less text”. I also did not find it very easy to follow this post. By all means, have enough features to suit the advanced sysadmin people, but don’t forget that far from everybody even understands what “RAID”/level means. Don’t swamp us with options (and stick with the coziest possible defaults).
    I was already quite impressed with how easy and pretty the process for installing F18 was. There were a couple of chinks here or there, maybe, but I found it much less arcane than the old installer. Moving forward, the installer should continue to be an app that can fit into the GNOME philosophy … even a first-time GNU/Linux user should be able to get it.

      • andrew89

        I hardly know what RAID is in the first place (and until a little while ago, I thought LVM was a successor to ext4 filesystem or something). It’s a technology that has to do with organizing the file system on the hard disk. That’s as well as I understand it. So, when I would install a desktop OS (which is always Fedora these days), instead of being presented with multiple options of that sort, and expecting to know the difference between one thing and another, I far prefer that very sensible defaults have been arranged ahead of time. The hardcore can change those if they want or need to.
        Someone who has only used Windows before would have the same quandary about choosing ext4 vs whatever else. “What is this stuff? What’s a file system?” So a key part of the UI design needs to be about still having advanced options, but not putting them so front-and-center that they confuse people.

        • mairin

          Hi Andrew,
          “So, when I would install a desktop OS (which is always Fedora these days), instead of being presented with multiple options of that sort, and expecting to know the difference between one thing and another, I far prefer that very sensible defaults have been arranged ahead of time. ”
          That is actually the case, and you never would see any of these screens unless you explicitly opted into the more advanced custom partitioning UI. Users who choose to go with the defaults simply select which disk they’d like to install to and everything is automatically configured to install to the free space on the selected disk with all defaults in place.
          I’m guessing you haven’t used the Fedora 18 installer – you might want to take a look if you’re curious.

          • andrew89

            Hi mizmo,
            I have used the F18 installer (quoted from first comment: “I was already quite impressed with how easy and pretty the process for installing F18 was.”). It is fantastic. It definitely takes some of the scary guesswork out of the installation process.
            But since you’ve confirmed that these mockups only apply to the advanced partitioning process that desktop users like me try to avoid, there’s no worries.
            Keep it up.

          • mairin

            Hi Andrew,
            Oh okay sorry about that. I only see the latest comment when I post a reply so I missed that in your first comment. I only said that because I thought if you’d gone through the UI already you’d have the experience of not having to see this screen if you didn’t opt into it, but thinking about it I can totally understand reading the blog post not having seen the screen before being confused about where it was coming from! Sorry about the confusion!

  3. Mairin: awesome job, this looks like a huge improvement. My wordsmithing muse is running rampant again, though:
    “This is the highest-performance RAID option; it increases the risk of data loss, however.”
    That semi-colon is a bit precious (and I say this as the world’s biggest semi-colon fan!) and the trailing ‘, however’ seems clumsy. How about:
    “This is the highest-performance RAID option, but it increases the risk of data loss”.
    Or you could put in a bit more detail, like there is for the other cases:
    “This is the highest-performance RAID option, but it increases the risk of data loss. If any disk fails, the array will be unusable, and you may not be able to recover any or all of the data from it.”

    • mairin

      LOL when I posted this and looked through the mockups I had the same thought about the ‘however.’ I like your shorter variation on it, and I updated the mockup with your suggested text already. 🙂

  4. Nick Coghlan

    Looks great!
    One comment is that for the “junior sysadmin doing what they’re told” case, it may be helpful if the dropdown also included the corresponding raid levels. Appending “(RAID 1, 10)”, “(RAID 4, 5, 6)” and “(RAID 0)” respectively would probably suffice.

  5. Mairin, I think Andrew’s comments support the notion that you designers have a very, very difficult assignment with Anaconda because you need to make one piece of software handle multiple installation scenarios across an unknowable range of hardware and an unknowable range of user expertise and needs.
    That said, I’d argue for trying to segregate RAID and other features that can be fairly considered the realm of knowledgeable or expert users. An installer does not need to make the concept of RAID or LVM understandable by someone who doesn’t know what they are.
    I think you have, roughly, three kinds of users: One, someone with little partitioning skill or knowledge who just wants to use or create some free space on a Windows machine to install Fedora. Their main concern is going to be not to screw up the Windows partition or make it invisible to the bootloader.
    At the other extreme is the user who is well versed in LVM, RAID, etc., knows what they are doing, and is using a graphical installer more or less as a convenience. I’d put the routines for them behind a button that says “Expert” and I would try to stay out of their way.
    In the middle are those with intermediate knowledge and experience. I think the current “LVM/Standard/BTRFS” dropdown, the “Let me do it my way...” and “Reclaim Space” area are adequate for them. However, reclaiming space is not what I think of when I think of repartitioning an existing filesystem. Could that button’s label change to something like “Repartition” if the user selects the “I Wanna Do It my Way…” option?
    I’d caution against trying to build an installer that makes complex partitioning activity easily do-able by inexperienced and inexpert users. E.g., if someone wants to create multiple RAID partitions across multiple drives, I think you have to assume they know what they are doing. An installer can’t guide a clueless person through that kind of effort. Design can go only so far to make up for lack of skill.

    • mairin

      “I’d argue for trying to segregate RAID and other features that can be fairly considered the realm of knowledgeable or expert users.”
      I think maybe I need to make it clear in this blog post that this UI is part of the opt-in custom partitioning UI that does exactly this – segregates knowledgeable/expert users away from less-knowledgeable users. Andrew would never see this screen because he would not opt into those detailed advanced controls.
      “In the middle are those with intermediate knowledge and experience. I think the current “LVM/Standard/BTRFS” dropdown, the “Let me do it my way...” and “Reclaim Space” area are adequate for them.”
      Unfortunately, those very same users are really kicking and screaming to be able to spin more knobs and poke at more features. I know because I have discussions with them in bugzilla frequently. :-/ While these RAID explanations will hopefully benefit them, they are also aimed at the more expert users who don’t necessarily configure a RAID array every day (does anybody?) who might need a little reminder. I think I mentioned in the blog post – we talked to several storage administrators when researching this feature and they definitely tended to treat some of the RAID levels as ‘trivia’ – some they obviously knew and could explain by rote, but the ones they didn’t use often themselves they had to refer to reference material to be able to explain exactly what they did.
      “I’d caution against trying to build an installer that makes complex partitioning activity easily do-able by inexperienced and inexpert users.”
      We’re definitely not trying to do that here – I’m curious as to what made you feel as if we were trying to do that?

    • mairin

      The intent is/was to let you choose which system to RAID your disks with (eg if you picked btrfs you could pick between btrfs’ built-in functionality or layer it on top of mdraid, similarly for LVM you could pick LVM’s native stuff or use mdraid under it.) Since a lot of work is going into expanding the functionality for both LVM and BTRFS’ built-in solutions, I’m not sure moving forward what makes the most sense but that’s definitely a conversation we should have.

  6. Pingback: Jak bude vypadat nastavení RAID v Anacondě? | Fedora.cz

  7. Tomasz

    Looks nice! I would drop RAID4 entirely – is anybody using it?
    What I’d really want to see is an ability to select additional device for metadata; mdraid supports it: https://raid.wiki.kernel.org/index.php/Write-intent_bitmap ; btrfs will support it in future (as SSD caching layer).
    Last thing: raid level selection is shared between mdraid and btrfs, I suppose? It would be good to prepare for the moment when single btrfs pool supports different RAID level for each subvolume; it is coming, too.

  8. blambert555

    First of all I think what you are proposing is definitely an improvement. I think that organizing the options by description rather than by the RAID level, however, is a huge step backwards. For the middle group, it would be good to try to at least educate them a bit on what the RAID levels are called. For the advanced group, the cognitive dissonance is really huge. Not only do I have to remember what the description of the RAID level is, I have to match it to what terminology you are using to describe it. I really question whether the payoff of reorganizing the menus by description is worth the pain it would cause them. Why not simply have a drop down at the top with the RAID level tags, then have a text box below them that lists all the details of the option chosen. This would change when you select a different RAID level. That way the curious middle group could have active descriptions, and advanced users could quickly skip them, or read them only if they choose to do so.
    Also I would discourage having a greyed out spare box. For intermediate users who may not understand that they have not met the minimum drive requirement, it only frustrates them to be offered something they can not have. Even with the notification close by, I would still discourage it. Why not have a notice saying that not enough drives are available to configure a spare, and have it change into a control when the requirement is met.
    One other suggestion is to be very careful in your description of the RAID levels. For example, if you have a SSD and a Spinning HD, choosing RAID 0 will result in less performance than using the SSD only. I think that the bottom line is that you really can’t simplify the process of setting up a RAID too much without falling into the trap of misinforming users as the number of edge cases rise. This will only become more complicated as new types of storage technology are introduced. For example, what would you suggest if the user has a Hybrid drive, an SSD, or a SAS drive, or all three in the same system?

  9. Chip Turner

    What is the goal of the design? What kind of user? I fully realize I may not be its targeted audience, but it seems to go overboard with text and guidance.
    And why is the installer trying to teach people what RAID is? RAID is a power-user setting. People who don’t know what RAID is shouldn’t be setting up RAID. RAID and LVM are complex spaces; to me, the installer should default to something simple and direct, then let me express my complex setup, not help me decide then-and-there what the complex setup should be. Nor should it somehow be a tool to educate a junior sysadmin; that’s what their coworkers and bosses are there for.

  10. Robin Norwood

    I agree with Chip (Don’t tell him, though). I think the latest design is getting to the point where it’s trying to do too much teaching.
    My basic mental model for RAID is that it’s a process of taking a number of disks/block devices and turning them into one. *then* you might add other sauce like a file system, or even LVM, etc. So at least according to my mental model, the RAID part of the UI is coming too late…the screens above imply that I’ve already decided on a partition structure. What about something like this.
    Starting from the very first screen after you go into the disk configuration section:
    Screen 1: Disks

    Select the disks you'd like to install Febuntu on:
    [x] 1TB WD10000k
    [x] 1TB WD10000k
    [x] 1TB WD10000k
    [x] 64gb Samsung SSD {Note: Contains existing ext/linux filesystem}
    [Continue and select partitions] [Configure RAID]

    Anything anaconda thinks is a “normal” disk (not a CD drive or a USB stick, for instance) is checked by default. The “Configure RAID” option is available if there’s more than one disk. Maybe add a “What is RAID?” help button.
    (Assuming all four disks are left checked, and the Configure RAID button is clicked)
    Screen 2: Configure RAID

    Select disks for first RAID array:
    [x] 1TB WD10000k
    [x] 1TB WD10000k
    [x] 1TB WD10000k
    [ ] 64gb Samsung SSD {Note: Contains existing ext/linux filesystem}
    Select RAID level:
    ( ) RAID 0: Striped, no redundancy (3TB)
    ( ) RAID 1: Mirrored pair (2TB)
    [...]
    (x) RAID 5: Parity, single-disk fault tolerance (2TB)
    [...]
    [Continue] [Add another RAID array] [Go Back]

    So a little bit of magic here. First, while you *can* use different sized disks, you usually only configure RAID with identical disks. So the default selection is all “identical” disks available, and whatever RAID level is most “common” for that number of disks – probably RAID 1 for 2 disks, RAID 5 for >2. Then simply list the RAID levels in numerical order, followed by the shortest possible description you can come up with, and finally the space the user will end up with after that. If the user selects RAID 1 from here, unselect one of the disks, and pop a little note that says “RAID 1 requires an even number of disks” (or is it just 2? I can’t remember). Same sort of notification if the RAID level selection requires a change in the # of selected disks above – i.e., with two disks of three identical ones selected, if you click on “RAID 5”, add the third and tell the user why.
    More detailed help/descriptions of RAID can be hidden behind a “Help” button. Or just link to wikipedia. 😉
    (Assuming RAID 5)
    Screen 3: Configure Partitions

    Partitions:
    HOME
    /home 2.9 TB on raid5array
    ROOT
    / 64 GB on raid5array
    SWAP
    16GB on 64gb Samsung SSD
    BOOT
    48GB on 64GB Samsung SSD
    FREE SPACE
    0GB

    Selecting any of the partitions gives you a UI pretty similar to what you have above the RAID box in your mockups, except maybe instead of “Desired capacity”, I would just show the current capacity, and have grow/shrink buttons near that.
    What do you think? Does this make sense?

  11. Eric

    Great job on the analysis & redesign Máirín! I just had one thought – in my mind, the word ‘suffer’ that’s used repeatedly in the RAID level descriptions reads very strangely to me… What about something like ‘survive’, ‘withstand’, ‘handle’, etc? To me, ‘suffer’ has a negative connotation, which I think is the opposite of the intent. Maybe it’s just that my command of English is poor, I am an engineer after all 🙂
