Veritas Vision, Day two!
Apr. 26th, 2005 06:48 pm
At about 3:30 today I tumbled numbly out of a Veritas File System session thinking "Mister Berkowitz, may I please be excused? My brain's full." Much learning was had, at a cost.
I missed something when I was writing yesterday: the afternoon keynote. Jonathan Schwartz, COO and general faceman for Sun Microsystems, jumped up on stage and gave a well-spoken pitch about utility computing. He talked a good line, made a lot of strong allusions to Sun being in serious competition only with Microsoft and Red Hat, and generally made utility computing seem plausible, sort of. The peak of his pitch was the Sun Grid - basically a big distributed supercomputer that Sun runs, vending out the CPU time and disk space for a buck a CPU-hour (cheap, when you think about it) and a buck a GB per month. Only intended for big batch-type jobs, but an interesting concept. The old guy from yesterday was pretty harshly mocking of the entire concept. I take a more balanced view: I can't use it now, but I can imagine a use for it later, in some ways.
So this morning I dragged myself out of bed (since I couldn't sleep last night) and went in to hear a pitch about Veritas Volume Replicator. It was interesting to hear the pros and cons and the whys and why-nots. Cool piece of tech for the day: Peribit Networks' network accelerators, which lessen the bandwidth used by things like VVR so it doesn't cost a billion dollars to run.
There was a question I keyed in on real quick in the midst of it all. A guy mentioned that he had a database that grew by _maybe_ 300 megabytes a week, and in that time there was something like 400 gigabytes of traffic solely from VVR. VVR replicates at the block level, underneath the filesystem. Yesterday, the Oracle guy mentioned that any active, read-write database file gets constant updates to its atime and mtime as Oracle pores over it. So the solution is to mount the volumes noatime - which I believe is an option - and organize your DBFs so that you can mount a bunch of them read-only. Also, ffs, don't replicate redo logs and such.
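Just to put a number on how out of whack that is, here's a back-of-envelope sketch using only the figures quoted above:

```python
# Back-of-envelope: how badly the replication traffic is amplified
# relative to actual data growth, using the numbers quoted above.

weekly_growth_mb = 300        # ~300 MB of real database growth per week
weekly_vvr_traffic_gb = 400   # ~400 GB of VVR traffic in the same week

amplification = (weekly_vvr_traffic_gb * 1024) / weekly_growth_mb
print(f"Replication traffic is roughly {amplification:.0f}x the actual growth")
# -> roughly 1365x, which is why chasing atime updates and redo logs matters
```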
The big question that sprang up in my mind is this: How the hell does Oracle understand VVR results? The failover's going to be pretty weird, I think. It's certainly _not_ something that can be done bidirectionally - very much has to be a DR thing.
I suppose I should take a moment to mention where I'm doing all these little incremental writeups. Scattered in various places around the hall are these shaped-beanbag recliner thingies in a big herd. They're sorta curious, but very functional.
Anyway. Morning keynote. CEO of Network Appliance, one of Veritas' hardware/software solution partners. Topics of note included Tiered Storage and Disk-based backup. Specifically, what NetApp is doing is putting out unified hardware solutions for tiered storage - i.e. one big SAN solution which does Fibre Channel, Serial Attached SCSI, and SATA, a series of tiers of both disk technology and disk communication tech - the idea being that you should be able to manage all your disk in one big beast of a device regardless of the technologies used to connect to it.
Disk-based backup is a big thing because tape is damn expensive, and if you have a fucking huge database - say, 10TB with 1TB of change a week - you want to be able to back it up as cheaply as possible. Disk doesn't fail as often and is online, random-access storage, so you can back things up more efficiently by diffing the current snapshot data against what's already on the backup disk. Instead of keeping a stack of level 0 backups to make sure your backup is reliable, you need about a tenth as much disk space, either mirrored or, as NetApp pushes, running dual-parity RAID (same as RAID 5, but with two parity blocks per stripe instead of one, so you lose two disks' worth of capacity per volume to parity instead of one - no big deal when you're going to have 30 disks in a volume).
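To see why losing two disks out of thirty to parity really is no big deal, here's a quick sketch - the per-disk size is just an illustrative assumption:

```python
# Rough capacity math for a 30-disk volume: single parity (RAID 5 style)
# vs dual parity (RAID 6 / NetApp's dual-parity approach). The disk size
# is a made-up number purely for illustration.

disks = 30
disk_size_gb = 144   # hypothetical per-disk capacity

for parity_disks, name in [(1, "single parity"), (2, "dual parity")]:
    usable_gb = (disks - parity_disks) * disk_size_gb
    overhead = parity_disks / disks
    print(f"{name}: {usable_gb} GB usable, {overhead:.1%} lost to parity")

# single parity: 4176 GB usable, 3.3% lost to parity
# dual parity:   4032 GB usable, 6.7% lost to parity
```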
He started gibbering in business-ese at the end, so I wandered off to do this writeup.
At the end of this convention, I'm going to have a list of bad habits I noted in speakers, since I'm listening to a bunch of presentations. This guy had several, which I found interesting, since Jon Schwartz was pretty much flawless, and so was Gary Bloom (CEO of Veritas). The list of bad habits is going to be a short snark that I'm saving up since I can't walk up to these folks and hit them in the head with a trout. Bad speaking habits make my teeth ache.
Speaking of bad speaking habits. Session two for the day was on Veritas Dynamic Multipathing. Technically speaking, this is a layer between the IO subsystem and the disk drivers that handles situations where you have more than one connection to a disk, so you can load balance across the connections and handle connection failures without interruption of service. Very common in SANs and high-availability situations.
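A toy sketch of the idea, for the non-storage folks - this is NOT how DMP is actually implemented, just the concept of balancing I/O across the healthy paths to a disk and carrying on when one of them fails (the path names are made-up examples):

```python
# Toy illustration of multipathing: round-robin I/O across the healthy
# paths to a disk, and keep going when a path fails. Purely conceptual.

class MultipathDisk:
    def __init__(self, paths):
        self.paths = list(paths)      # e.g. two HBA connections to one disk
        self.healthy = set(self.paths)
        self.next = 0

    def mark_failed(self, path):
        self.healthy.discard(path)    # path failover: stop using it

    def pick_path(self):
        # Simple round-robin over whatever paths are still healthy.
        if not self.healthy:
            raise IOError("all paths to disk have failed")
        for _ in range(len(self.paths)):
            path = self.paths[self.next % len(self.paths)]
            self.next += 1
            if path in self.healthy:
                return path

disk = MultipathDisk(["c1t0d0", "c2t0d0"])
print(disk.pick_path())    # c1t0d0
print(disk.pick_path())    # c2t0d0
disk.mark_failed("c1t0d0")
print(disk.pick_path())    # c2t0d0 - service continues on the surviving path
```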
The engineer presenter was relatively sharp, knew what he was talking about, though he was nervous. He had, however, the _most_ unusual speech-twitch I've run into at the conference. Imagine, if you will, a heavy, formal-sounding Indian-accented English. (I've figured out that I have no problem understanding an Indian accent in person, strangely.) Now, if you will, imagine someone with that accent with the Valleyspeak twitch "like, y'know". My head quietly exploded when I realized what he was saying. Funny as hell.
Anyway. DMP. Here's the thing about it, and about I/O optimization in general: every layer of the I/O stack has its own parameters intended to optimize how it handles I/O, and the key element in all of them is the size of the I/O operations.
Another step back. When your program asks the computer to read from disk, it's calling a library function. The library function has access to a cache of sorts in memory of information from the disk. If it finds what it needs in cache, easy-peasy, it hands it back. If it has to go to disk, that takes more work. That's the basic idea.
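A minimal sketch of that read path - purely illustrative, and the read_from_disk helper is a made-up stand-in for the slow part:

```python
# Minimal read-through cache sketch. read_from_disk() is a made-up
# stand-in for an actual (slow) device read.

block_cache = {}

def read_from_disk(block_no):
    # Pretend this is expensive: seek, rotate, transfer, etc.
    return f"<contents of block {block_no}>"

def read_block(block_no):
    if block_no in block_cache:       # cache hit: cheap, hand it back
        return block_cache[block_no]
    data = read_from_disk(block_no)   # cache miss: do the slow disk read
    block_cache[block_no] = data      # remember it for next time
    return data

read_block(42)   # slow: goes to "disk"
read_block(42)   # fast: served from the in-memory cache
```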
Disks store information in blocks - generally, in the unix world, blocks are 512 bytes. (Bytes are pretty much synonymous with characters for most purposes, so think of 512 characters from a text file.) File systems are sets of rules for how blocks get read and written to disk. Generally, filesystems let you set the size of the clusters of blocks that get allocated on disk as a group. Since the filesystem has to keep track of each cluster individually - which means space wasted just on tracking them - the larger the disk, generally, the larger the cluster. On a large disk, let's say 8 kilobyte clusters - 16 of those 512-byte blocks.
Disk drivers can be set to read a particular number of blocks each time they have to access the disk; this facilitates caching, so if you _know_ you're probably going to be reading more info soon, you have it in memory already. Sorta like knowing you're going to read all the volumes of a comic book you have, so you pull them all off the shelf and put them on your coffee table in front of you, rather than getting them one at a time.
Okay, so we've got a bunch of layers: disks, disk drivers, dynamic multipathing load-balancing stuff (how big a chunk of data to put over each path before switching over), filesystems, and applications. Each of these has its own size of reads and writes. Some layers aren't configurable, some are. We have to match things up as best we can. DMP is a layer I hadn't previously been considering in this mess. The ideal situation is that each layer, as we get closer to the end user, uses either the same I/O size as the layer below it or one that divides evenly into it, so you're not bottlenecking anywhere or introducing inefficiencies.
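A quick way to picture the alignment rule, with hypothetical sizes for each layer - the only numbers taken from above are the 512-byte disk block and the 8 KB cluster; the driver, DMP, and application sizes are made up:

```python
# Hypothetical I/O sizes at each layer of the stack described above. The
# 512-byte disk block and the 8 KB filesystem cluster come from the text;
# the driver read-ahead, DMP chunk, and application read sizes are made up.

layers = [
    ("disk block",          512),
    ("driver read-ahead",   256 * 1024),
    ("dmp path chunk",      256 * 1024),
    ("filesystem cluster",  8 * 1024),
    ("application read",    8 * 1024),
]

# The alignment rule: whichever of two adjacent layers uses the bigger
# size, it should be a clean multiple of the smaller one, so no request
# ever straddles a boundary in the neighboring layer.
for (name_a, a), (name_b, b) in zip(layers, layers[1:]):
    big, small = max(a, b), min(a, b)
    ok = big % small == 0
    print(f"{name_a} ({a}) <-> {name_b} ({b}): {'aligned' if ok else 'MISALIGNED'}")
```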
Arright, next session. Lunch over, I went to a hands-on session for a product called CommandCentral. It was sort of a bastard-child product, a bunch of old tools rolled together into something coherent. The tool is intended for detailed reporting on storage resource usage. Where a lot of other tools base their information on CPU usage or other silly statistics, this one centers on helping you manage your SAN and local storage. It keeps a shitload of statistics (unfortunately in a Sybase database - WTF, folks, I'd rather not have _another_ database tool installed), and it not only does reporting and alerting but also lets you manage access, so you can add and remove access to disk from the web-based interface. Not quite sure how robust it is.
Eveningtime, went to House of Nanking instead of giving a damn about a Netbackup keynote.
Got back to the con and did a quick checkup on some questions I had about the CommandCentral feature set - basically, I was curious to see how much control it had over SAN management, and the answer is... not much. LUN mapping, and that's about it... though that's quite a bit.
Now I'm going to go back to the hotel and relax (and finish digesting this food, damn). Day two, conquered. I declare more victory. And Chinese food.
no subject
Date: 2005-04-27 05:33 am (UTC)
Turns out my company bought Peribit today.
no subject
Date: 2005-04-27 05:57 am (UTC)
And happy to report. I'm trying to keep track of the _useful_ tidbits I get. I'll probably toss off something a little more concrete about CommandCentral when I get it down in my mind. Seems like an awesome tool for medium or larger SAN and backup environments, since it's designed to make coherent and useful business-oriented reports - the stuff, hypothetically, one's boss wants anyway.
no subject
Date: 2005-04-27 04:22 pm (UTC)
I'd love to hear more about CommandCentral, especially the parts about backup. I manage a decent-sized NetBackup environment, which has its own unique set of joys and frustrations -- mostly frustrations. :-)
What I recall of CC: Backup
Date: 2005-04-27 04:43 pm (UTC)
What we primarily covered on CC Backup was the reporting views. You can specify based on NetBackup server, host, date, whatever, and generate a visual report of which backups succeeded and failed, and the error codes. It ties into a local knowledge base to get what the codes mean and, if you like, what your local SOP is for resolving the problem.
From a business standpoint, you can assign dollar values to your backups and assign various backup volumes to different departments, so you can produce a report saying 'X department is costing us such-and-so for backups'.
You can do backup trending so you can see growth rates.
CC:Backup keeps its own database, so what it polls from NetBackup can be safely purged from NetBackup's logs and kept in CC for historical purposes.
You can, like the rest of CC, set up alerts so that if certain backups fail, you get an email, and IIRC, there's some form of automated response you can do, like rekickoffs, or something.
Unfortunately - and I may have mentioned this up top, I can't recall - we weren't really able to do any of the active stuff in an hour's time, so we ended up focusing on the reporting. Still very cool.