Wednesday, April 21, 2010

Lefthand SAN + HP = :(

My workplace was an early adopter of iSCSI SAN technology. We started with Lefthand Networks back in 2005, were very pleased with their products, and continued to grow our cluster by adding a node or two every year, which increased performance and storage capacity. This worked out great since our budget didn't allow us to spend 100K on the other SAN technologies available at the time. We could get the small 1 TB raw capacity NSM 150/160 SATA units for ~13,000 at the time. We used it for simple stuff like file servers but eventually moved our Exchange 2003 DBs to it after our mail servers' local storage got low (2 DBs supporting about 2500 mailboxes, and it ran like a dream). Thin provisioning and dynamically expanding drive size is pure win (if you've never used it you really don't know what you're missing).

Eventually we deployed VMware ESX and our SAN storage was suddenly looking pretty full. We started another cluster of larger storage units (the NSM 2060 3 TB SATA) for about the same cost per unit, ~13,000. These also changed from the custom Lefthand chassis to basically a Dell 2950 chassis. We're primarily a Dell server shop so we were thrilled and thought maybe Dell would end up buying Lefthand.

Fast forward to fall 2008 and HP's purchase of Lefthand. I'm personally not a huge fan of HP servers so I had some worries about what was going to happen, but I was optimistic. With any transition there are bound to be some bumps in the road, but HP made some exceptionally stupid moves, like taking down the Lefthand Networks customer portal (where I got all my tech notes and patches) before they had an equivalent site up on HP.com. This came at a particularly bad time for us as we needed some patches to complete an upgrade to our cluster. Lefthand tech support was always excellent and you usually talked to the same 4 or 5 people when you called in. They were very helpful during the transition period and still got me the files and info I needed.

I got the upgrade completed and forgot about the HP situation until it came time for us to buy another node for the cluster. Since Dell is a competitor, HP quickly changed the chassis for the node we were using to a DL380. The price of these units continued to drop so this wasn't a real big deal; a 3 TB raw iSCSI SAN for ~10K was a pretty sweet deal. Now we have a P4500 as well as the 2060s. We also purchased a cluster for our DR site to get some remote snapshots going: 3 P4300 6 TB SATA nodes, still very affordable at about 12K each. While in the testing phase we had a hard disk go bad (maybe the 2nd disk out of all the nodes in our clusters since we've had the things, which is impressive for SATA disks, and also a bit sad for HP starting out).

We also came to find that snapshots would cause performance issues in the cluster if you had too many per volume (I think we had about 4 per volume on about 30 volumes). We started having managers go down and I/O issues. Luckily we had enough redundant managers that nothing went unavailable, but performance was a real issue until we got the snapshots cleaned up. This was also the first time we had to get in contact with support since HP changed things. We were waiting for callbacks from engineers instead of getting right through to tech support like in the good ole days. The same staff seemed to be there in the end though, so our problems were resolved. Once we got a daily snapshot schedule set, things seem to run like clockwork.

Most recently we got struck by a firmware problem with the RAID controller in the HP chassis. Again the Lefthand redundancy saved the day when a node went down. A RAID restripe took place for about a day after we flashed the firmware and got the bad unit back up. We also recently noticed we were bumping the performance ceiling on I/O and had to shuffle some stuff around between our two clusters. This brought us to the point of ordering another node this year.

Welcome to confusionville, population me and probably every other customer of Lefthand SANs. Lefthand has always had a weird reseller program called preferred vendor pricing. This basically gives the reseller you first opened your Lefthand account through better pricing than other resellers (unless another reseller applies to move the preferred pricing to their company, which then causes your initial vendor to question why they lost it). This creates problems if you ever try to get competitive quotes from multiple vendors.

On top of this I came to find out that effective 3/31/2010 HP is discontinuing all SATA SAN models and also the 3 TB capacity that our NEW cluster is based on. I had a meeting with our HP rep and our local SAN engineer where they basically told me they're going to offer slower speed SAS drives that are equivalent to the SATA disk drives, but since the capacity chassis I use is no longer offered, the best I can do is buy the larger 9 TB chassis and just not use the excess capacity. Grrrrr. Option B is to start another new cluster. This kind of hits at a good time because I'd been considering starting a cluster of SAS storage for our higher I/O apps like VMware and Exchange 2007 (now running 4 DBs for the 2500 mailboxes), plus we're looking into VDI, which I hear can beat the hell out of SAN I/O.

I'm quite frustrated with HP that they've kind of screwed me with my mid-level cheap storage, so I started looking at EqualLogic from Dell. There is some chatter out on the internet comparing these two competing SAN vendors. This site is collecting all the info together and is very helpful if you're trying to make this decision yourself. I think the management and monitoring tools of EQ look better than what Lefthand currently has to offer. EQ has software to collect and report on performance history, whereas Lefthand basically just has current performance metrics available in the console but no history. I'd give my left hand (lol) to get HP to rewrite the console in something other than Java. Maybe make a web interface that is clientless? It takes like 3 minutes to open and log into my management console as it goes out and collects config info from all 14 of my nodes in one management group. I can't imagine how long it would take in a really large environment.

Price-wise the EQ and Lefthand are looking about the same for performance, but you may be able to eke out more usable space by changing RAID types in the EQ. As a long-time Lefthand user I'm pretty comfortable with the network RAID they use and the redundancy it gives if an entire node goes down. I don't know how I feel about all my SAN being in one box (even though EQ claims everything is redundant). I will say that the maintenance on all my Lefthand nodes is getting kind of ridiculously expensive, but those 9 NSM 160s are reaching end of life so it might be time to drop them and consolidate to a bigger, higher-performing disk unit.

Below is some pricing I've gotten recently and the supposed I/O that the units provide. Hope this helps someone. I'm just kind of brain dumping here to help me make a decision on sticking with the enemy I know or moving to the enemy I don't, plus having to migrate all the data to a new SAN and buy another EQ unit for our DR site as well... this is sounding extra expensive. Ah, if only this had happened before we established our DR site.

This pricing may vary as our Dell rep was jumping through some hoops to get us pricing to fit what we had budgeted for (which was not EQ).

EQ PS4000XV 15K SAS 16x600 GB - ~38 K  ~ 1800 I/O*

EQ PS6000XV 15K SAS 16x600 GB - ~50 K ~ 1800 I/O*

* I'm not quite clear on the differences between the 4000 and 6000 series and didn't get a real clear answer about the I/O provided by the 6000, but logic would dictate that the same number and speed of disks in each unit would provide about the same max I/O.

Lefthand Virtual SAN bundle

2 Nodes P4500 G2 12 x 450 GB SAS - ~52K ~ 3400 I/O

You can buy a single P4500 to add to this cluster to increase capacity and I/O. HP really seems to stick it to you on the 3-year support cost for the single node, though: almost 6K vs about 3K for the bundle package listed above.

1 P4500 G2 12x 450 GB SAS - ~34K ~1700 I/O

From a storage perspective it looks like EQ would be the way to go but from an I/O perspective I can get more I/O for my dollar from Lefthand which is not what I was expecting.  The tools and optional RAID configs that EQ offers may make up the difference but my established Lefthand environment and previous investments may overcome my current loathing of HP and keep me as their customer.
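
A rough sketch of the arithmetic behind that comparison, in Python. The prices and I/O figures are the approximate rep-supplied numbers listed above. The `Quote` class and its field names are made up for illustration. Two assumptions to flag: the bundle is taken to be 12 disks in each of the 2 nodes, and "raw TB" is simply disks times disk size, ignoring RAID and network RAID overhead and the support costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """One quoted configuration (hypothetical helper, figures from the post)."""
    name: str
    price: float          # approximate quoted price
    nodes: int            # number of chassis in the quote
    disks_per_node: int   # ASSUMPTION: disk count is per node
    disk_gb: int          # size of each disk in GB
    iops: float           # rep-supplied I/O figure for the whole quote

    @property
    def raw_tb(self) -> float:
        # Raw capacity only: no RAID or network RAID overhead applied.
        return self.nodes * self.disks_per_node * self.disk_gb / 1000

    @property
    def cost_per_iops(self) -> float:
        return self.price / self.iops

    @property
    def cost_per_raw_tb(self) -> float:
        return self.price / self.raw_tb


QUOTES = [
    Quote("EQ PS4000XV", 38_000, 1, 16, 600, 1800),
    Quote("EQ PS6000XV", 50_000, 1, 16, 600, 1800),
    Quote("LH P4500 G2 bundle (2 nodes)", 52_000, 2, 12, 450, 3400),
    Quote("LH P4500 G2 single node", 34_000, 1, 12, 450, 1700),
]

if __name__ == "__main__":
    for q in QUOTES:
        print(f"{q.name:30s} {q.raw_tb:5.1f} TB raw  "
              f"{q.cost_per_iops:6.2f} per I/O  "
              f"{q.cost_per_raw_tb:8.2f} per raw TB")
```

With these inputs the Lefthand bundle works out cheapest per I/O (roughly 15 per I/O vs roughly 21 for the PS4000XV), while the PS4000XV is cheapest per raw TB. If the rep's 1800 I/O figure for the EQ units is low, the per-I/O ranking would change accordingly.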

12 comments:

Darkwynn said...

I don't know where you got the 1800 numbers, but I have seen at least our EQL SANs do close to 6000 I/O in the XV form factor.

nate said...

All info was given to me by our Dell/EQ Rep. I understand that EQ's RAID options can drastically change performance per volume but I had to go with the numbers the rep gave me.

Unknown said...

It was scary to read your post as we are in almost the exact same situation! We currently have 6 NSM 160s and are looking at both EQL and HP. I'm very interested in your final decision and any further insight you may have.

Thanks
Kevin

Anonymous said...

What's happening with those 160s when they go EOL? I might be interested in buying up your sloppy seconds since we're even poorer than you guys.

Unknown said...

Read your post with interest. We've been running LH since mid 2008, just before the HP takeover, and have experienced exactly the same issues you have.

We've re-evaluated the Dell EQL kit, and we've decided not to buy any more LH for current or future expansion. Instead we're migrating the whole primary SAN to EQL. We'll take the LH and run it in DR site only, until its time to retire it completely.

Dell EQL are a lot better on price these days. LH TCO, and cost per TB of storage, has been much higher than we imagined.

The LH scalability model of adding more storage nodes into your SAN cluster for 'pay as you grow' scalability, as a way of achieving linear growth (increased capacity, availability and IO), has some serious caveats also. Maybe we were naive, but LH don't make that terribly clear, and only when we went to expand did we really begin to realise the implications of that.

nate said...

We stayed with HP/LH mostly due to the total investment we're already into them for ( primary site plus our DR site). Further poor support may prompt us to forklift everything to another vendor though. Storage industry has expanded a ton in the last 5 years since we started with LH. It's a shame that HP is driving a decent product into the ground. If we change storage vendors HP will not be in the list of companies to evaluate based on the way they've handled LH.

@amal, we generally don't resell equipment but salvage it due to privacy concerns of data that once resided on it.

vsphere5 said...

Hi
We are also experiencing I/O performance issues with nearly the exact same setup as yours. We also have 4/5 snaps per volume on about 30 volumes. Can I ask what snapshot procedures you implemented to limit the performance impact you described?

Any help is much appreciated

regards
Neil

Unknown said...

We are in the process of moving from LH to EQ. We have been an LH customer since 2004/2005. The HP acquisition has brought poor support and fewer value-added services, like the LH Health Check tool that was supposed to allow support to check my logs daily and call me if patches were needed. Dell calls me immediately if there is a known bug or a parts issue. Heck, I had two power supplies show up once without even asking for them to be ordered, because Dell's service is proactive rather than reactive. I would be willing to sell my extra NSM 160s. I have upgraded them to 4 TB NSMs.

nate said...

@delrop LH support told me that keeping many snaps on many volumes can create high I/O and overload the storage (great to know after you've bought their product right?). They recommended keeping the minimum snaps per volume. We keep 2 per vol in the primary cluster, I suppose you could keep more in a remote cluster.

One thing I was impressed about when equallogic was showing us their demo unit was the perf logging and the ability to go back to a point in time to see what might have been going on in the unit that caused a performance issue. This feature would be very valuable. LH event log is only viewable to me if I leave the console open all the time. Support can pull up past logs but perf logs aren't part of that I believe unless they generate some sort of event log.

Pete said...

You might want to take a look at Scale Computing. Cluster-based storage like Lefthand with much better performance and much more affordable. Many of the old Lefthand support personnel have moved to Scale.

scottyextreme said...

I was an LHN partner who used IBM X series deployments. It was nice of HP to discontinue all support for these nodes. They don't even have a dedicated website for LHN. Makes you wonder if they purchased the company for some of the software technology and to buy out competition.

Try calling pre-sales some day. They have no idea. Best thing going is VSA so I can choose my own hardware, too bad it's only for small deployments...