[personal profile] issaferret
No 'net tonight, can't be arsed to register again. Besides, mebbe it'll encourage me to sleep sooner. Thus, my con report is a little early.

After [livejournal.com profile] karenbynight mentioned she was reading the report, I expected to get audience-phobia. Good thing I'm forgetful.



Day three started with even more dragging - barely made it into session one.

First session: Veritas OpForce. OpForce is a 'provisioning solution' - which comes out to mean an enterprise-class version of Solaris's JumpStart, for multiple platforms. It's not only capable of doing initial server installs with all the customization you want, it can do live-server provisioning - anything from installing your patch clusters automatically from your OpForce console to throwing a package onto a remote server while it's still running. I'm not suggesting all this is magically part of the initial package - to be honest, I don't know - but being able to do it would be very nice no matter the initial cost. I once built something to do the automated patch-install thing myself, and it took a lot of work and was very hackish. I'd rather someone else do the work.
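
For flavor, my hackish version boiled down to something like this - a sketch with invented hostnames and patch IDs, and nothing like OpForce's actual interface:

    #!/usr/bin/env python
    # Sketch of the hackish patch-push idea: stage a Solaris patch on each
    # running server over ssh, then apply it with patchadd. Hostnames and
    # the patch ID are invented; OpForce's real mechanism is nothing like
    # this.
    import subprocess

    HOSTS = ["app01", "app02", "db01"]   # servers to patch (hypothetical)
    PATCH_ID = "118822-30"               # example Solaris patch ID
    STAGING = "/var/tmp/patches"         # staging dir on each host

    def push_and_apply(host):
        """Copy the patch bundle over, then apply it on the live server."""
        subprocess.check_call(["scp", "-r", PATCH_ID,
                               "%s:%s/" % (host, STAGING)])
        subprocess.check_call(["ssh", host,
                               "patchadd %s/%s" % (STAGING, PATCH_ID)])

    for host in HOSTS:
        try:
            push_and_apply(host)
            print("%s: patched" % host)
        except subprocess.CalledProcessError as err:
            print("%s: FAILED (%s)" % (host, err))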

OpForce understands network hardware as well as your server hardware, so you can set it up to provision your system, then go into your load balancer and/or switch and change the VLAN the server's on, or whatever you like.

The case study offered was a big DR solution. The company in question - Grainger - set up a site in Chicago as production and a site in New York for failover in case of disaster. The New York site, for daily purposes, ran the Test and DR networks, so the hardware wasn't wasted. Their allowable disaster-recovery time was a relatively lax 8 hours. Veritas claimed they could get 200 servers installed in 120 minutes with the right configuration. That'd give 6 hours for the app folks to pore over the results and make sure everything was ready for prod.

They set things up to snapshot the production systems regularly and use VVR or something similar to replicate the snapshots over to the OpForce servers in New York. All in all, an intriguing chunk of software. With that and CommandCentral Storage you could do some cool centralization of management. Good for huge systems. Depending on cost, overkill for us. Still bloody awesome.

The morning keynotes (fuck, two of them in a row? What stoner thought _that_ shit was a good idea? My ears would start bleeding from the business-speak) were of no interest to me, really. I stopped in to see what the Symantec CEO wanted to say about the direction the company was going, and he was blathering about how important a wakeup call the Slammer worm was. ... Given my interest in Windows problems caused by shitty firewalling and huge security flaws, I wandered off to read my webcomics, figuring that was a more productive use of my time.

On a side note, I *love* how all the sessions have names that give absolutely no indication of what products they're pushing. Makes it real easy to learn more about particular tools.

Okay, next session. I started out in one that turned out to be a 'Duh, what's clustering?' session for techies, then moved over to a 'What's the next-gen clustering app going to do?' session.

One of the big themes at the conference is under-used servers. On average, apparently, utilization sits at around 15%, because Unix and Windows suck ass at making sure applications play in their own sandboxes without crapping all over everyone else.

Well, most Unices are working on getting better at that. Solaris 10 in particular has attempted to implement strict resource-consumption controls, as well as something called 'Containers' - I'm not sure of their explicit capabilities, but they're supposed to bring mainframe-style zoning to Unix. Assuming it all works, one should be able to run multiple applications on a single system, which implies you should be able to _fill_ any given system relatively easily.

If you're planning on getting really good use out of many small applications, which _may_ have incompatibilities, you're going to need some very intelligent failover on your cluster. The next-gen beast is intended to understand zoning tools like the Solaris containers and resource managers, and to take a few parameters from you so the cluster manager can decide what to do when a failure occurs - and, if sacrifices need be made, which services to let stay down.
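
Pure speculation on my part, but I imagine the decision logic ends up shaped something like this - made-up service names, priorities, and capacities, nothing Veritas actually showed:

    # Speculative sketch of priority-based failover: when a node dies, move
    # its services to the survivors in priority order, and leave the cheap
    # ones down if capacity runs out. (Invented names and numbers; this is
    # my guess at the policy shape, not Veritas's actual logic.)

    services = [
        # (name, priority, cpu_share) - higher priority = more important
        ("billing-db",   10, 0.50),
        ("web-frontend",  8, 0.30),
        ("reporting",     3, 0.40),
    ]

    def failover(failed_services, free_capacity):
        """Greedily place services on surviving nodes, best priority first."""
        placed, sacrificed = [], []
        for name, prio, cpu in sorted(failed_services, key=lambda s: -s[1]):
            for node, free in sorted(free_capacity.items(),
                                     key=lambda kv: -kv[1]):
                if free >= cpu:
                    free_capacity[node] = free - cpu
                    placed.append((name, node))
                    break
            else:
                sacrificed.append(name)  # nowhere to put it: stays down
        return placed, sacrificed

    placed, down = failover(services, {"node2": 0.6, "node3": 0.4})
    print("placed:", placed)          # billing-db and web-frontend survive
    print("staying down:", down)      # reporting gets sacrificed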

So, all in all, it sounds good on paper - and looks good in the JAXP web interface they were demoing, which is apparently a stable rev of the beta of the simulator they're releasing sometime next quarter. My questions and doubts lie more in the realm of the Solaris containers; I'm going to have to look into them and see what the hell they do and are capable of, so I can tell how much to trust them.

Guess it's time to get my coworker to jumpstart a Solaris 10 test box, so we can play.

Lunch, then a keynote on application management. Veritas I^3 is the product du jour - a nifty kit of tools for observation and debugging. With several different modules plugged into the parts of your application, you can have it inspect your webserver, middleware, and database backend to see which webpages, which servlets/server pages, and which SQL transactions are taking up most of your resources, and why.

The demo he had an engineer do (the engineer was kept carefully on a leash) walked all the way down to the SQL-statement level and showed that the statements in question were bogging heavily on I/O wait. Then he stepped one further and showed that it was _just_ that partition having the problem - indicating that if the load were spread across more disks, say a round-robin access mirror or somesuch, the performance problem could go away. So, a very nice, polished-looking tool. Supposedly it does all this without a significant performance hit, and keeps historical logs and suchlike. Assuming it works like the demo does, it's what I usually wish I had as a programmer, except with a GUI to make it suck even less.
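
The shape of that drill-down, as I understood it, is just correlating timings across tiers - something like this toy version, with fabricated numbers and none of I^3's actual internals:

    # Toy version of the drill-down: per-tier timings for one slow page,
    # walked down to the hot SQL and its I/O wait. (Fabricated numbers;
    # just the shape of the analysis, not I^3's internals.)

    request = {
        "page": "/checkout",
        "total_ms": 4200,
        "tiers": {
            "webserver":  {"self_ms": 150},
            "middleware": {"self_ms": 300},
            "database":   {"self_ms": 3750,
                           "sql": {"SELECT ... FROM orders": 3600},
                           "io_wait_ms": 3400},
        },
    }

    total = request["total_ms"]
    for tier, info in request["tiers"].items():
        print("%-10s %5d ms (%4.1f%% of the page)"
              % (tier, info["self_ms"], 100.0 * info["self_ms"] / total))

    db = request["tiers"]["database"]
    worst_sql = max(db["sql"], key=db["sql"].get)
    print("hottest SQL: %s at %d ms, %d ms of it I/O wait"
          % (worst_sql, db["sql"][worst_sql], db["io_wait_ms"]))
    # conclusion from the demo: one partition is the bottleneck, so
    # spreading the load across more disks should make the problem go away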

San Francisco has decided to get cold and rainy on me. The psychological effect of the rain outside has sent me running back to my room to get my trenchcoat.

Second-to-last session of the day - went to one on J2EE performance pitfalls and how Indepth (part of the Inform, Insight, Indepth I^3 trio) for J2EE handles them. It comes out to an 'adaptive' instrumentation kit: it watches the Java application for a period, figures out what needs to be watched closely, and jams instrumentation in based on a set of rules. The goal is to diagnose problems in a live production environment without disrupting service or degrading response time. Not perfect yet, apparently, but it does all it does without a restart of the application, and it can reconfigure itself quite a bit based on the survey results. All in all, seems like a good visualization tool.
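
In spirit, the adaptive bit works something like this crude sketch - hypothetical threshold, and in a language the product certainly isn't built on; the real thing injects bytecode instrumentation into a live JVM, which is a much hairier trick:

    # Crude sketch of "adaptive" instrumentation: time every call cheaply,
    # then keep detailed samples only for calls that breach a rule.
    # (Hypothetical threshold; Indepth does this against a live JVM, with
    # no application restart.)
    import time
    import functools

    SLOW_MS = 100      # rule: anything slower than this gets watched closely
    watched = {}       # function name -> detailed latency samples

    def adaptive(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = fn(*args, **kwargs)
            elapsed_ms = (time.time() - start) * 1000
            if elapsed_ms > SLOW_MS or fn.__name__ in watched:
                watched.setdefault(fn.__name__, []).append(elapsed_ms)
            return result
        return wrapper

    @adaptive
    def slow_query():
        time.sleep(0.15)   # stand-in for a pokey SQL call

    slow_query()
    slow_query()
    print(watched)         # {'slow_query': [~150.0, ~150.0]}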

I've spent a fair amount of my time doing the pie-in-the-sky speculation thing - 'is this or that new tool something we should pursue?' kinds of questions. I don't think it's a waste of time at all, but since half or more of my time went to those investigations, my learning of the existing tools got somewhat shortchanged.

Of course, it doesn't help that there's little about the existing tools I can't understand intuitively, and few pitfalls in my understanding that I'll find without production experience with them.

Last session of the day, and second-to-last meaningful session of the conference: a tech talk about Veritas Volume Replicator (VVR). I can't miss that. Not even my coworker going to the same one (it's that stupid con-culture thing where you're supposed to split up to cover more ground) can make me retreat.

Good thing, too. Either my coworker went to an earlier one, missed this one, or I just didn't see him. VVR is pretty cool, but there are some big questions, and the speaker answered the big ones as best he could in general terms. The biggest - his favorite question, apparently - was how much bandwidth it takes to replicate an Oracle DB. His catchphrase: "How long is a piece of string?" The answer being: no idea, till we measure it. So Veritas has a tool for modeling bandwidth and replication-log requirements based on live application data. Good. We need that.
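
The modeling itself is just "measure the real write rate, then size the link and the log from that" - roughly this arithmetic, with invented numbers; their tool works from live volume statistics rather than guesses:

    # Back-of-the-napkin replication sizing: the link has to keep up with
    # the peak write rate, and the SRL has to absorb writes while the link
    # is down. (Invented numbers; Veritas's tool models this from live
    # volume statistics instead.)

    peak_write_mb_s = 12.0     # measured peak application write rate
    avg_write_mb_s = 4.0       # measured average write rate
    outage_hours = 6.0         # longest link outage we want to survive

    # Link: peak write rate plus some protocol overhead.
    link_mb_s = peak_write_mb_s * 1.1
    print("need ~%.1f MB/s of replication bandwidth" % link_mb_s)

    # SRL: every write during an outage lands in the log. Size it for the
    # worst outage at the average write rate, with 50% headroom.
    srl_gb = avg_write_mb_s * outage_hours * 3600 / 1024 * 1.5
    print("need an SRL of ~%.0f GB to ride out a %g-hour outage"
          % (srl_gb, outage_hours))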

VVR runs over UDP, primarily, for the lower overhead. Its replication groups - each of which shares a single replication log among its volumes - are designed to be allocated per application. Trusting any estimate of data usage other than their modeling tool is a Bad Idea: your DBA may quote you this or that many writes, but they're pulling it from Oracle's own numbers, and Oracle lies. Or rather, it doesn't know the whole truth.

VVR initial setup was the other Big Item. You can create it from scratch - hand your replication setup a bunch of blank copies - but if you have a 10TB database, you'll be waiting a long-ass time for consistency... assuming it can ever finish. So, "never underestimate the bandwidth of a station wagon full of tape," he says. You can configure your SRL, checkpoint when you start and finish making your (_raw-disk only_) backup of the volume, then truck the tape over to the new site. When the restore finishes, attach the volumes to the replication scheme and let 'er rip, and the changes since your backup started will be synchronized in.

Obviously, once the backup starts, you seriously need to watch the SRL: if it overflows, you're probably going to have to restart the whole exercise, and you'll feel pretty stupid when you do.
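
The sanity check is simple enough: will the writes that pile up between the starting checkpoint and the re-attach actually fit in the SRL? Made-up numbers again:

    # Will the SRL survive the tape-seeding window? Every write from the
    # starting checkpoint until the far site re-attaches piles up in the
    # SRL; if it overflows, you start over. (Made-up numbers.)

    srl_gb = 200.0            # configured SRL size
    avg_write_mb_s = 4.0      # measured average write rate
    backup_hours = 10.0       # cutting the raw-disk backup
    truck_hours = 24.0        # station wagon full of tape, in transit
    restore_hours = 12.0      # restore and attach at the far site

    window_hours = backup_hours + truck_hours + restore_hours
    accumulated_gb = avg_write_mb_s * window_hours * 3600 / 1024

    print("window: %g h, ~%.0f GB piles up against a %g GB SRL"
          % (window_hours, accumulated_gb, srl_gb))
    if accumulated_gb > srl_gb * 0.8:   # leave some headroom
        print("too tight: grow the SRL, or move faster")
    else:
        print("should fit, but watch it anyway")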

Day three, over. I may be going to HoNK (hm, that's entirely too degrading an abbreviation, I won't use it again) with my coworker as the token 'doing things in the same room' thing for the event. I declare an exhausted victory.

Date: 2005-04-29 12:58 am (UTC)
From: [identity profile] triss.livejournal.com
I can easily believe that a user's desktop can get it without any problems. What I have severe problems with is production applications succumbing. In particular, he was focusing on Slammer, which was a worm affecting SQL Server specifically, IIRC. I believe one should be able to construct a network architecture that disallows wacky shit hitting your SQL Server like that; focusing on virus-scanning to solve Slammer is treating a symptom. I may be wrong, but that's my take.

Date: 2005-04-29 05:47 pm (UTC)
From: [identity profile] spaghettisquash.livejournal.com
That's why we have DMZs. Gah. (I don't actually know how Slammer works, so I might be wrong here.)
