Designing interfaces to deal with storage technologies is not only hard, it’s terrifying. This is especially true if you aren’t familiar with the storage technologies involved and have to learn how they work on-the-fly, even if you don’t have easy or any access to work with some of these (typically quite expensive) technologies first-hand.
After we redesigned the storage UI for Anaconda around Fedora 12 or so, I gave a short talk at the Linux Plumbers’ Conference in 2010 to share my storage UX ‘war stories.’ We are very happy to have an interaction design intern, Stephanie Manuel, who will be working on putting together a usability test plan for the new Anaconda UI, courtesy of the Outreach Program for Women. Since I need to get Stephanie up to speed on how some of the storage technologies Anaconda deals with work, I decided to provide a summary of that Linux Plumbers’ talk to make it a bit easier to access.
Storage for Desktop Users
So if you’re a general desktop user, there are a few kinds of storage devices you’re pretty familiar with. Your laptop has a hard drive, perhaps an SSD (solid state disk), you likely have at least a couple of USB pen drives, maybe an external USB drive, and maybe a camera or a phone with an SD card you can mount on your computer. These are all pretty standard and common consumer-level storage devices.
At home, desktop users typically own and maintain their own systems and equipment. (Well, sometimes parents maintain systems for their kids. Or, vice-versa! 🙂 ) If there is any shared storage in the home, it’s likely to be a low-end USB external storage device, or perhaps a consumer-grade NAS mounted to the home network. The storage is likely used to house family photos, music collections, laptop backups, and maybe large video files. More and more these days, these folks also have stored data ‘in the cloud;’ perhaps they use the Amazon MP3 system or Google Music to hold their music collection, and it’s quite likely they use a web application to store their email.
Desktop users at work do not typically own their own system; their system is likely maintained/administered centrally and they might not have permissions to change or configure much about their systems. Their files may be stored locally, perhaps shared over samba/Windows file sharing. The office may also have centrally-managed network storage that the user doesn’t know much about the details of – they may just know it as a ‘network drive’ that is mounted to their system that they can access to read and perhaps store files. Again, increasingly, desktop users in this environment may be accessing data from ‘cloud’ storage, typically from business-centric applications such as Salesforce.com or perhaps from internal web applications such as a content management system or other intranet application. The details of the disks that hold this data ‘in the cloud’ or ‘on the network’ are pretty much obscured to these users.
Storage for Enterprise Users
By enterprise users here, I mean folks who work for large companies that have in-house high-end storage equipment and are working in some technical capacity, whether they are in a system administration, development, or operations type of role. These kinds of complex setups are necessary for large-scale and often pretty cool usages of storage technology. For example, think about the web app Pandora – how much disk space do you think they need to hold all of the songs they have at the ready for you to queue up instantly? If you’re in Maine and their main data server is in Southern California, how do they give you as good a performance and speed in queueing up another song to play as someone who is on the West Coast of the US? (I’m using American examples, because I’m not 100% sure Pandora is available outside of the US. Stupid music licensing silliness!)
Sometimes less-sophisticated users encounter the enterprise storage world, though, usually (I think) in confusing and terrifying ways. I think their most positive interactions with it are really as the happy end users of applications that use advanced storage technologies in the background without their awareness of it.
Climbing the abstraction layer ladder to the moon
One of the complications of these high-end storage devices is that they involve layer upon layer of abstraction – both in hardware and software – that I think makes it difficult to identify what it is you’re actually working with and how to access what you need to access. Take the concept of a LUN, for example. LUN stands for “Logical Unit Number.” A logical unit is basically a chunk of storage space, but it’s logical – it’s not physical, it’s not “real,” if that makes sense. A 1 GB logical unit may map directly, 1-to-1, with a single physical 1 GB disk drive, or it may map to a set of five 200 MB disk drives, or it may map to varying chunks of space across multiple physical disk drives – let’s say 100 MB here on drive #1, 200 MB there on drive #2, so on and so forth until you’ve cobbled together 1 GB. To the end user, however, no matter what the 1 GB logical unit is made up of, it’s just a 1 GB disk. It appears to be a single contiguous unit of storage to them even though it’s really not a single physical disk sitting in a machine somewhere in a server room, but an amalgamation of bits and pieces of many physical disks.
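To make the ‘cobbled together’ idea a little more concrete, here is a minimal sketch in Python of how a logical unit might translate a logical position into a real spot on a physical disk. Everything in it is made up for illustration: the `Extent` and `LogicalUnit` names, the drive labels, and the chunk sizes are my assumptions, not how any particular storage product does it.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class Extent:
    disk: str    # hypothetical label for a physical disk
    start: int   # byte offset on that disk where this chunk begins
    length: int  # size of the chunk in bytes


class LogicalUnit:
    """A logical unit stitched together from chunks of physical disks."""

    def __init__(self, extents):
        self.extents = list(extents)

    @property
    def size(self):
        # The user only ever sees this one total.
        return sum(e.length for e in self.extents)

    def locate(self, offset):
        """Translate a logical byte offset into (disk, physical offset)."""
        if not 0 <= offset < self.size:
            raise ValueError("offset is outside the logical unit")
        for e in self.extents:
            if offset < e.length:
                return e.disk, e.start + offset
            offset -= e.length


# 100 MB + 200 MB + 724 MB adds up to a 1 GB logical unit.
lun = LogicalUnit([
    Extent("drive1", 0, 100 * MB),
    Extent("drive2", 500 * MB, 200 * MB),
    Extent("drive3", 0, 724 * MB),
])
```

Calling `lun.locate(150 * MB)` lands 50 MB into the chunk on the second drive, but the person using the logical unit never has to know that.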
Making a beefy storage patty
If this doesn’t make much sense, think about McDonald’s hamburgers, okay? We all figure, when we sit down to eat our McDonald’s quarter-pounder hamburger, that the meat in the burger we’re eating likely came from one cow. The meat industry doesn’t exactly work this way, though. Your one hamburger may have ground up bits and pieces of a dozen or even hundreds of cows from all over the world mixed in there. So, if you get 1 GB of networked storage, don’t always assume it’s a hamburger from a single cow. 🙂
There are various technologies that are used at various levels to construct the 1 GB chunk of storage from many smaller chunks; LVM is one such technology, RAID is another. The Btrfs file system is yet another. (Yep, file systems themselves can give you features pure storage technologies provide.) These technologies don’t just enable you to stitch together disparate pieces of storage to make larger, logically atomic units of storage. They also give you features such as redundancy, so if one of the many cows, er, physical disks that make up the storage piece loses power or otherwise goes down, there’s another physical disk with a backup copy of the same data so your data isn’t lost and you can continue to access it. So this is one type of technology that comes into play in enterprise storage scenarios. Home consumer devices don’t typically give you these higher-level features: if your USB key or drive fails, tough luck buddy!
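Here is a rough sketch of the redundancy idea, in the style of simple mirroring: every write goes to every disk that is up, and a read is served by whichever copy is still reachable. The `Disk` and `Mirror` classes are hypothetical toys of my own, and they skip everything a real implementation has to worry about (such as catching a disk up again after it comes back).

```python
class Disk:
    """A toy physical disk: a dict of block number -> data."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.online = True

    def write(self, block_no, data):
        if not self.online:
            raise OSError(f"{self.name} is offline")
        self.blocks[block_no] = data

    def read(self, block_no):
        if not self.online:
            raise OSError(f"{self.name} is offline")
        return self.blocks[block_no]


class Mirror:
    """Keeps a full copy of the data on every member disk."""

    def __init__(self, *disks):
        self.disks = disks

    def write(self, block_no, data):
        # Write to every disk that is currently up.
        targets = [d for d in self.disks if d.online]
        if not targets:
            raise OSError("no disks available")
        for d in targets:
            d.write(block_no, data)

    def read(self, block_no):
        # Serve the read from the first disk that is still up.
        for d in self.disks:
            if d.online:
                return d.read(block_no)
        raise OSError("no disks available")


cow_a, cow_b = Disk("cow_a"), Disk("cow_b")
mirror = Mirror(cow_a, cow_b)
mirror.write(0, b"family photos")
cow_a.online = False       # one disk goes down...
data = mirror.read(0)      # ...and the other copy still answers
```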
Protocol / Interconnect / Paths
The data/network paths through which you move your data to/from a chunk of storage are another consideration. Oftentimes storage is classified via the type of connection you access it from or the communication protocol you use to talk to it. For example, Fibre Channel is a protocol (and a type of cable) used for some storage devices; other storage devices might use SCSI, a different protocol. Confusingly, according to Wikipedia at least, you can speak the SCSI protocol over a Fibre Channel network. There are other ways to transport the data to and from devices, including FCoE (Fibre Channel over Ethernet) – which I think is essentially using the Fibre Channel protocol over an Ethernet line – and AoE (ATA over Ethernet), which uses the ATA protocol over an Ethernet line.
Sometimes, multiple paths are available between a storage device and the computer system(s) accessing it, in which case you need to manage which path you’re accessing the storage over and why. Having multiple paths gives you another form of redundancy – if a line gets cut, there’s still another way to get at the data, or if there is a lot of traffic going over the wire between the storage device and the computer system you could load balance it amongst the multiple paths.
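As a sketch of what ‘managing which path you use’ could look like, here is a tiny round-robin path selector in Python that spreads requests across paths and skips any path that has been marked as failed. The `Multipath` class and the path names are hypothetical; this is only meant to show the failover and load-balancing idea, not how real multipath software is written.

```python
from itertools import cycle


class Multipath:
    """Round-robin over the paths to one storage device, skipping dead ones."""

    def __init__(self, paths):
        self.healthy = {p: True for p in paths}
        self._order = cycle(paths)
        self._count = len(paths)

    def mark_failed(self, path):
        self.healthy[path] = False

    def next_path(self):
        # Look at each path at most once per call.
        for _ in range(self._count):
            path = next(self._order)
            if self.healthy[path]:
                return path
        raise RuntimeError("no healthy path to the storage device")


mp = Multipath(["path_a", "path_b"])
first = mp.next_path()     # requests alternate between the two paths
second = mp.next_path()
mp.mark_failed("path_a")   # somebody cuts a cable
third = mp.next_path()     # traffic keeps flowing over the other path
```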
Human Complexity in Dealing with Storage
In dealing with storage interfaces, there’s a level of human complexity you have to consider, too. For example, the team managing the storage systems may not be the same as or even co-located with the teams that need to use the storage, so communication about where the storage in question is located, how to identify it, and how to access it may become complicated. Depending on how the storage is administered, it may range from being open for anybody to access particular portions to being very tightly-controlled at a fine-grained level. This is sometimes not due to any limitation of the technology itself, but the policies of the organization administering the technology. This can come into play for interfaces that involve the usage of storage, though. If you assume that a user could not possibly get write access to a piece of storage that houses critical data because it doesn’t belong to them, you may be assuming too much and might make it possible for random users to arbitrarily destroy data they don’t own in an open system.
Enabling Comfortable Usage of Technology Regardless of Technical Requirements
For Anaconda’s UI design in particular, this was a significant challenge: we wanted to enable users who don’t understand fine-grained or even mid-to-high level details about storage technologies to be able to successfully and painlessly install the operating system. At the same time, we didn’t want to disable functionality that technical folks setting up the next cool large-scale web application for file sharing need. And all of this needs to be done without presenting the less technically-oriented users with options that don’t make sense to them.
Musical Storage Chairs
This doesn’t always happen or even happen often, but it’s possible in certain cases that when you reboot a system, your disks will be assigned different names by the system. This is a small piece of a larger issue of the complexity of actually identifying and addressing chunks of storage. It’s hard enough to remember a piece of storage by the goofy, computer-centric name it may be assigned – it’s even more challenging when a system thoughtfully decides to play musical chairs with those names.
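A small sketch of why this matters for anything that remembers a disk by name: below, the same two disks swap names between two boots, so looking a disk up by an identifier that stays put is safer than trusting the name. The names and identifiers are made up for illustration. (On Linux, the symlinks under /dev/disk/by-id and /dev/disk/by-uuid exist to give you this kind of stable handle.)

```python
def find_by_id(devices, stable_id):
    """Return the name a device currently has, given its stable identifier.

    `devices` maps the name assigned at boot to a stable identifier.
    Returns None if no attached device has that identifier.
    """
    for name, dev_id in devices.items():
        if dev_id == stable_id:
            return name
    return None


# Hypothetical view of the same two disks across two boots.
boot_1 = {"sda": "id-AAAA", "sdb": "id-BBBB"}
boot_2 = {"sda": "id-BBBB", "sdb": "id-AAAA"}  # musical chairs!

before = find_by_id(boot_1, "id-AAAA")
after = find_by_id(boot_2, "id-AAAA")
```

Here `before` is `sda` and `after` is `sdb`: the disk is the same one, and only its name has moved.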
A storage slice, by any other name
It’s hard to name storage. At a certain point, if you have a one-to-one mapping between physical disk and logical storage, I suppose you could name your chunk of storage after the make and model of the disk. For example, if I stick one of my USB keys into my laptop, it identifies itself as ‘SanDisk Cruzer 4GB.’ This doesn’t scale very well, though. It is probably pretty common that when a storage device manufacturer puts physical drives into a storage device, they are all the same make, model, and capacity. In that case, your storage naming scheme would resemble that of George Foreman’s family, with 6 family members sharing the name “George.” Also, when you’re primarily using storage at an abstracted and logical level in a network, the real-world physical labels on individual pieces of equipment quickly become useless.
While there are several ways that are used to refer to chunks of storage, a common one with advanced storage technologies is the WWN (World Wide Name) / WWID (World Wide IDentifier). These identification numbers may be used to refer to the chunks of storage or devices along the path which you must access to get to the storage (for example, a particular network switch may have an ID you’ll need to use.) The numbers themselves are quite long, but there is a reason for this – it’s to help ensure uniqueness so that when you call up a particular storage slice, you know you’re talking to the one you meant to address. (Again, avoiding the George Foreman issue. 🙂 ) Across vendors, the methods of displaying these numbers can vary and don’t seem to be standardized; some vendors use numbers that are longer than others’, some split the number into pairs using colons in-between, some split the numbers into 4-number chunks that are hyphenated together, so on and so forth.
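Since the same identifier can show up written in several different ways, an interface probably wants to boil them all down to one form before comparing or displaying them. Here is a minimal sketch of that, assuming the identifier is a 64-bit or 128-bit value written in hexadecimal; the function names and the sample values are made up for illustration.

```python
import re


def normalize_wwn(text):
    """Reduce a WWN written in any common style to bare lowercase hex."""
    text = text.strip()
    if text.lower().startswith("0x"):
        text = text[2:]
    # Drop the separators vendors like to sprinkle in.
    cleaned = re.sub(r"[\s:\-]", "", text)
    if not re.fullmatch(r"[0-9a-fA-F]+", cleaned):
        raise ValueError(f"not a hexadecimal identifier: {text!r}")
    if len(cleaned) not in (16, 32):
        raise ValueError(f"unexpected length for a WWN: {len(cleaned)}")
    return cleaned.lower()


def pretty_wwn(text):
    """Show a WWN as colon-separated pairs, whatever style it came in."""
    digits = normalize_wwn(text)
    return ":".join(digits[i:i + 2] for i in range(0, len(digits), 2))


# Three spellings of the same made-up identifier.
same = {
    normalize_wwn("50:06:01:60:3B:A0:12:34"),
    normalize_wwn("5006-0160-3ba0-1234"),
    normalize_wwn("0x500601603BA01234"),
}
```

All three spellings collapse into a single entry in `same`, which is what you want when deciding whether two people on the phone are talking about the same slice.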
Depending on the technology, different systems might be used to identify chunks of storage. For example, IBM Z-series storage sometimes uses CCW (Channel Command Word) channel numbers as a way to refer to storage.
Have you ever ended up in a conversation with someone, and you weren’t sure of their name? Then later on, you wanted to ask them a question, but you had no idea what their name was to try to find their contact information? This is the same kind of frustration, and it reaches a whole new level when you need access to a piece of storage and aren’t quite sure what it’s called or how to refer to it. 🙂
The long, non-human friendly strings used to identify storage are difficult-to-impossible for people to remember and are also a challenge to transcribe into various application interfaces. Can you imagine being a storage admin and having to help troubleshoot a developer’s issues trying to connect to a slice of storage, reading over the phone a 16+ character WWID number?
There are other ways that people might use to refer to chunks of storage that are more intuitive and friendly for them. A good storage interface could provide clues when representing storage devices and technology that relate to these more human-friendly references. For example, of the 5,000 slices of storage on a specific storage server, you’re likely more interested in the 5 slices you’ve connected to in the past than the 4,995 ones you’ve never connected to.
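One way an interface could act on that clue is to float the slices you’ve used before to the top of the list, most recent first, and leave the rest underneath. A minimal sketch, where the slice names and the ‘last used’ numbers are hypothetical stand-ins for whatever history the system actually keeps:

```python
def rank_slices(slice_ids, last_used):
    """Order slices so previously used ones come first, most recent first.

    `last_used` maps a slice ID to a number that grows with recency
    (a timestamp, say). Slices with no history follow, sorted by name.
    """
    used = sorted(
        (s for s in slice_ids if s in last_used),
        key=lambda s: last_used[s],
        reverse=True,
    )
    unused = sorted(s for s in slice_ids if s not in last_used)
    return used + unused


ranked = rank_slices(
    ["lun3", "lun1", "lun2", "lun4"],
    {"lun2": 10, "lun4": 20},
)
```

Here `lun4` and `lun2` come out on top because they have been connected to before, and the slices with no history follow in name order.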
We don’t want users of storage technology accessing and destroying data unintentionally because of confusion about how to access the storage they’re really trying to get at. In an interface, we also don’t want to present users with lots of noise about storage devices that aren’t suitable for their needs or useful to them.
Conclusion / Disclaimer
I am nowhere near the universe of being any kind of storage expert. This is a rough attempt at a high-level explanation of different storage terminology and how some storage technologies work and why, based on my understanding from researching and designing UI to interact with some of them. I may have gotten some or many details wrong here. If you have a better explanation for some of these concepts or if I’ve gotten specific things wrong that should be corrected, please let me know in the comments.
Note: All of the graphics I drew for this are CC-BY-3.0; please feel free to use them however you’d like if you find them helpful.