Tuesday, March 8, 2011

VMware announces vCenter Operations

"VMware to Integrate Performance, Capacity and Configuration Analytics to Streamline Management of Intelligent Virtual Infrastructure."


-Yahoo! Finance


What does this mean to you, VMware admin?

We can all admit, after many years of working with VirtualCenter and vCenter, that there are some things left to be desired: too often we depend on third parties to come in and tell us things VMware should already be able to tell us about "their/our" environment.


vCenter Operations appears to be bridging that gap.



From Yahoo! Finance, here are the main points of the coverage:



  • Proactively ensure service levels in dynamic cloud environments
    Real-time performance dashboards with patented analytics and powerful visualization of the health of the environment will allow IT to proactively pinpoint performance issues and risks before they become problems and impact SLAs.
  • Get to the root cause of performance problems faster
    The combination of patented analytics and infrastructure-awareness will allow vCenter Operations to more accurately and rapidly determine symptoms so that infrastructure and operations teams quickly get to the root cause of performance problems. By enabling a more collaborative approach, vCenter Operations can speed problem resolution and change management cycles and reduce manual efforts by 40 percent.
  • Optimize deployments in "real-time" to enable self-service provisioning
    vCenter Operations will provide real-time analysis of performance and capacity to help teams make fast, informed decisions on deployment. This capability will be critical to enabling rapid and reliable provisioning needed in self-service environments.
  • Maintain compliance in the face of constant change
    Automated provisioning and configuration analysis will ensure optimal configuration by automatically detecting configuration changes and enabling rollback to help IT maintain continuous compliance with operational best practices and industry or regulatory compliance requirements.


Further:


vCenter Operations Pricing and Availability

vCenter Operations is designed as a set of products and solutions that will bring together the performance, capacity and configuration management capabilities VMware has developed and acquired, including VMware vCenter CapacityIQ™, VMware vCenter Configuration Manager and Integrien Alive™. vCenter Operations will be available in three editions to meet the needs of customers managing both VMware vSphere-virtualized and physical environments.
  • vCenter Operations Standard offers performance management with capacity and change awareness for VMware vSphere-virtualized and cloud environments.
  • vCenter Operations Advanced adds more advanced capacity analytics and planning to vCenter Operations Standard's performance management for VMware vSphere-virtualized and cloud environments.
  • vCenter Operations Enterprise offers performance, capacity and configuration management capabilities for both virtual and physical environments and includes customizable dashboards, smart alerting and application awareness.
The first versions of these editions will be available in late Q1 with prices starting at $50 per VM. vCenter Operations will be available through VMware sales and via VMware's more than 25,000 channel partners.

Friday, March 4, 2011

Oracle on VMware...YES AGAIN!

So, here we are, early 2011, and we're still having conversations about whether or not VMware is capable of handling high-end, mission-critical Oracle workloads.  Really?!  *sigh*

I read this post from Chad Sakac yesterday (Chad, not calling you out, just using it as a reference to show vendors are still "selling" the "idea" of virtualizing Oracle!) and realized that this battle of ideals still feels new and fresh, even though it's anything but.

I'd like to think I've become one of the bigger offensive players in the virtualization of Oracle on VMware, but that's most likely my own fantasy.  Realistically, maybe I'm a bit more like Michael Moore....a mostly unknown, annoying fat man running around poking everyone with a stick publicly.  Yea.  That sounds more like it.

Let's review what has taken place the past couple of years in this space (and please understand I'm glossing over here...)

In the beginning, when we first truly started talking about virtualizing tier 1 workloads, SQL was the easy kill (which should have been a sign, but people chose to ignore the correlations between it and Oracle) because, being a Microsoft OS/app combination, it was fairly easy to P2V and run in a virtual environment.  For the same reasons, and with the performance increases we saw in vSphere 4, Exchange was also a pretty easy victory because, again, it was essentially a Microsoft app: easy to P2V, configure for multiple drives, external storage (iSCSI/RDM LUNs), etc.  What was mostly overlooked was Microsoft's generally open, accepting stance toward virtualization and how quickly they defined standards and "supported configurations" their apps could run on.

Then we came to Oracle.  SCREEEEECH!   The brakes got put on.  What happened?!  Father Larry and the Oracle Marketing Machine went to town.  And it worked.  Fear mongering, support statements (or ambiguously dancing around them), as well as their complex multi-core multiplier licensing format, did not bode well for running software in ANY sort of virtual environment.  With very open eyes, a few of us saw through this as they introduced their own product, OracleVM, and launched what turned into one of the biggest fear campaigns I've ever been a part of.

What that has ultimately led to is YEARS of customers asking the same two questions:

1) "What about support? Oracle doesn't support VMware."  (truth: Yes, they do.)
2) "Yea but if we use VMware, we have to license the whole host!"  (truth: This is a GOOD thing!)

...and those questions ultimately lived on because Oracle just sat back and let them spread like a virus without directly answering them.  Way to go, Oracle sales & marketing.  Well, maybe I'm saying that prematurely, because technically, wouldn't you have seen explosive growth in OracleVM as a product?  Yea.  We haven't.  Oops.  Inadvertently, what you have done is scare everyone off of virtualizing the Oracle software stack altogether!  Regardless of hypervisor, soft or hard provisioning...you've killed it, buried it, and those of us activists out here are left with the task of resurrecting the idea and bringing it back to the masses.  So, against the grain, the rumor mill, peer pressure, doubting co-workers, and laughing long-time DBAs, I have found that I am typically the one with the last laugh.

Why?

Example: I attended Oracle OOW2010 on behalf of VMware to speak in their booth in the Expo, specifically as a customer reference.  It was amazing to me how LITTLE we talked about virtualizing the Oracle stack, and how much Virtualization 101 I did.  It was a shocking revelation.  None of these DBAs know about virtualization?!**  Holy crap!

**I say "none" loosely, but it was definitely a majority of conversations.

Self:  "Well, if they don't know about virtualization, that means they don't know about all the "built-in, bolt-on" things like HA, vMotion, Storage vMotion, Fault Tolerance.  Why are we even talking about support and licensing?!  We need to be EDUCATING so that they KNOW the benefits, and don't think they are just swapping one complex framework/infrastructure for another one for no gain!  Holy underhanded sales tactic, Batman!"


And therein lies the point of this post.

*Let us stop focusing on what Oracle has spread as the fear mongering.  (support/licensing)
*Let us all stand up and start focusing on the GAINS you get automagically just by simply virtualizing a workload, Oracle or not.
*Let us step back and re-focus on core fundamentals and features when talking about Tier 1 workloads, because they (for the most part) have somehow been lost in the FUD/fodder/dustcloud of new releases of add-ons and "the next big thing," whatever it might be that week.

Sell the PRODUCT, and the idea will sell itself. You won't have to have the arguments anymore.

For the record:

1)  Oracle DOES support their software stack running on VMware. PERIOD.  They do not "CERTIFY" it.  They also do not CERTIFY it on most of the hardware platforms it runs on in the physical realm either.  This is not game changing at all.  It is pure fear mongering in an effort to steer you to their OracleVM product.

At worst, and I mean WORST case, you will be asked to reproduce your problem on a physical platform.  Don't you already reproduce your environment regularly for test/dev anyway?  This isn't as huge of a task as most people make it out to be.

2)  Oracle requiring you to license the entire host is a GOOD thing!  Because guess what?!  Once you do, you can run an unlimited number of virtual machines on that host (or hosts)!  And if you only run ONE virtual machine on a cluster of two hosts, well, that's exactly what you would be doing in the physical world as well, right?  Think about it.  No really.  Think about it.  You're getting WAY more bang for your buck.

As a side effect of this, you also get, OUT-OF-THE-BOX, VMware HA, vMotion, Storage vMotion, and Fault Tolerance, doing away with all of that complicated software (*couDATAGUARDgh*) for your backups and redundancy.

I'm not in sales.  I don't have to sell this, because this sells itself, if the customer is educated properly by the sales teams.

Vendors, resellers, et al.  Please help steer the customers away from the Support/Licensing discussions that are non-issues, and get back to core fundamentals of virtualization, and what they get just by simply virtualizing a workload.  Throw in the idea of image-level backups, snapshot-based DB backups that take seconds or minutes instead of hours, and you'll see their eyes light up and ear-to-ear grins emerge on their faces.

How do I know?

I saw it countless times at Oracle OpenWorld, and have continued to see it and be thanked for "showing them the light."




"RAC on VMware dramatically optimizes every aspect of the workload's product life cycle. It is the Cadillac of high availability solutions."


- Dave Welch, House of Brick's CTO & Chief Evangelist  source

Tuesday, February 15, 2011

Mount all Datastores with Powershell

A couple of weeks ago, I ran into a problem.  We were re-designing our networking in the datacenter so that all ESX hosts exclusively used 10GbE interfaces.  Since our underlying foundation is Layer 2 Nexus 5k switches, we have become huge fans of twinax cabling and not having to buy additional SFPs to make the interfaces compatible with other gear.

When we cabled the first one up, I figured I would just reconfigure the networking manually, since the vmnic# assignments would be different, and then use Host Profiles to reconfigure the rest of the host: mount all NFS datastores, set CPU/memory limits and reservations, and configure firewall rules.



What I learned were some hard lessons about Host Profiles.  They've got a long way to go.  If you're adding an identically cabled host to an already existing cluster, it works like a champ.  (Of course, it won't do some discrete things like set the VM swapfile location.)  It would not allow me to NOT configure networking.  And what I mean by that is, it would not give me any sort of advanced route to get the host online, configure networking properly, and then tell the host profile to completely disregard networking.  I even went as far as attempting to create the Host Profile and then manually removing all networking configuration from within the profile.  When I tried to save it...

"Sorry, you must configure networking for this profile..."


For those of you that play WoW, this was a /facepalm moment.


So, I was out of ideas.  How in the world could I get all my datastores mounted?  The rest of my host config was simple enough, but I didn't want to sit there and manually mount 50+ different datastores.


I reached out to a couple of people, and ultimately Erick Moore (@erickMoore) came up with the solution.


Powershell.


If you don't know Erick, you need to be following his stuff.  He is one of the people who have taken a lead in Powershell scripting for NetApp and virtual environments, and threw a script together quickly for me to accomplish this task.  Here is the script:



------------------------------------------------------------------------------------------------------------


$vcenter = "vcentername"
$dataCenter = "Your Datacenter"
$vSphereHost = "FQDNHostName"

Connect-VIServer $vcenter
$vSphereHost = Get-VMHost $vSphereHost

#----{ Get all NFS mounts in vSphere datacenter
#

$nfsDS = get-datastore -Datacenter $dataCenter | where {$_.Type -eq "NFS"} | get-view | select Name,@{n="url";e={$_.summary.url}}


#----{ Parse NFS mount info and mount all datastores to specified vSphere host
#

Foreach ($ds in $nfsDS) {

 $nfsPath = $null
 $i = 4
 $nfsInfo = $ds.url.split("/")
 $dsName = $ds.Name
 $nfsHost = $nfsInfo[2]
 Do { $nfsPath = $nfsPath + "/" + $nfsInfo[$i] ; $i ++} Until ( $i -eq ($nfsInfo.count - 1) )
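 # Note: the -WhatIf switch on the New-Datastore call below only previews each mount; remove it to actually create the datastores.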
 
 New-Datastore -Nfs -VMHost $vSphereHost -Name $dsName -Path $nfsPath -NfsHost $nfsHost -WhatIf

}

----------------------------------------------------------------------------------

Basically what happens here is the script grabs all datastores mounted at the "datacenter" level (this could even be modified to the "cluster" level; a sketch of that is below) and attempts to mount each of them to your new host.

Simple. 
Brilliant.
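
If you'd rather scope the query to a single cluster instead of the whole datacenter, here is a minimal, untested sketch of that change.  The cluster name is a placeholder, and it leans on Get-Datastore's -VMHost parameter; the Foreach loop from the original script stays the same.

------------------------------------------------------------------------------------------------------------

#----{ Hypothetical variant: gather the NFS datastores seen by an existing cluster instead of the whole datacenter
#

$cluster = "Your Cluster"

$nfsDS = Get-Datastore -VMHost (Get-Cluster $cluster | Get-VMHost) | where {$_.Type -eq "NFS"} |
         Sort-Object Name -Unique | get-view | select Name,@{n="url";e={$_.summary.url}}

------------------------------------------------------------------------------------------------------------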

Erick, thanks a million for getting me over this hump.  Worked like a champ.

-Nick

Thursday, February 10, 2011

ESX/i 4.1 + HP NC522SFP+ = PSoD's.

We made a decision recently to exclusively cable all of our ESX hosts via 10GbE.  We had a ton of extra cables lying around, but would still need to order some to complete the 10GbE package.  So we ordered them, and off we went rebuilding the environment.

The first two, aside from some issues with Host Profiles (which I could rant about for hours), went flawlessly.  This was because we had enough of the older-style cables.  On the third host, I had to start using the new ones we had ordered, which, while nicer looking, appeared completely different.

(i.e. silicon board inside is red instead of green, pull tabs different.  Aesthetic stuff like that.)

@Cisco: I would love an explanation of what is different between these two.  The only differing numbers I can find are:

Old:    37-0961-01
New:  37-0961-02
Both are SFP-H10GB-CU*M (* = length of cable in meters)

Anyway, I plugged the first new one in: no link lights.  Hrm, maybe a bad cable. I'll try another one.  No link lights.  OK, third time's a charm.  Nope, no link lights.

Started digging and apparently, you have to upgrade the firmware of the HP NC522SFP+ 10GbE adapters in order for these new cables to work.

Again, @Cisco..... why is that?  What changed?

OK, not a huge deal.  I went out to HP's website, grabbed the latest Firmware Maintenance DVD ISO, which was 9.20B, burned it, loaded it, and everything went fine.

"Wait, why does it say QLogic now instead of NexGen..."

POST process goes fine, and ESX starts to load.....it gets to "networking-drivers..." and BANG.  PSoD.

Crap.

Let the research begin.  I hopped onto HP's support site, and initiated a chat session online with one of their techs.  Kudos, because after about 10 minutes, this guy found the issue.

HP Advisory Link

^This is the advisory.  It's pretty long-winded, so I thought I would sum up what the problem is here in a TLDR version...

First, let me be clear:  everything worked famously up until this point.  I had zero problems with the Cisco twinax cables, the Nexus 5010s, or the HP NC522SFP+ 10GbE NICs.   The trigger point was Cisco changing the cables, which required the firmware upgrade.


What the advisory says is:

This occurs due to an incompatibility of the VMware ESX/ESXi 4.1 in-box Qlogic 4.0.550 driver with the Qlogic 4.0.520 (or higher) NIC firmware installed on an NC522m or NC522SFP Gigabit Server Adapter.


I can confirm as of today that this only affects 4.1 and up.  I successfully updated the firmware on some ESX 4.0 boxes today and they worked flawlessly.  There is an obvious mismatch between the in-box driver in ESX 4.1 and the new HP firmware.  I waited to post this until today so I could test that, and also because 4.1 U1 was being released.  Apparently, there's no update to the driver in U1.  Seriously, guys?!  :\
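
If you want to confirm which NICs on a host are bound to the in-box nx_nic driver before (or after) rebuilding it, here is a rough PowerCLI sketch.  The vCenter and host names are placeholders, it assumes the ExtensionData property exposed by PowerCLI 4.1, and it only shows the driver name, not the driver or firmware version.

------------------------------------------------------------------------------------------------------------

# Rough sketch: list each physical NIC on a host and the driver it is bound to.
Connect-VIServer "vcentername"

Get-VMHost "FQDNHostName" | Get-VMHostNetworkAdapter -Physical |
    Select-Object Name, Mac, @{n="Driver";e={$_.ExtensionData.Driver}}

# NICs reporting "nx_nic" are the NC522SFP+ ports affected by this driver/firmware mismatch.

------------------------------------------------------------------------------------------------------------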

So, how do we fix this?

You need to download a copy of the HP Firmware DVD Bundle 9.20B (iso).  Download Link
You need to download the USB utility to create a bootable USB from the above iso.  Download Link
You need the latest custom drivers from QLogic off the VMware site.  Download Link
You need the latest firmware (as of today is 4.0.539 (15Dec2010)) from HP.  Download Link

You will also need ESX/i 4.1 install media.  I'll leave this one to you to acquire based on your license level.
You will also need to burn the QLogic drivers ISO to disc for when we reload the OS.

OK, got it all together?  Good. Let's go through the motions.

1)  Unzip the firmware.9.20B.zip so you can get to the ISO file.
2)  Install the HP USB key creator.
3)  Run the HP USB key creator, and when prompted, point to the ISO file where you unzipped it.
4)  Once the USB key is created, browse the folder structure on the key and look for subfolders "/hp/swpackages"
5)  Once there, paste the .scexe file from the HP firmware download into /hp/swpackages (I believe the exact file name is CP14007.scexe).  This will not overwrite anything; it will just be an additional package.
6)  Put the ESX host in maint mode (at this point, it's useless anyway) and boot it off of the USB key.

I would hope it goes without saying, but at the risk of being pedantic: your NC522SFP+ cards must be installed in the server in order to receive the update.  If you add additional new cards after the fact, you'll need to repeat this process to update them.

7)  Choose INTERACTIVE UPDATE when the load screen appears.
8)  Select the top option titled "ML/DL 300/500" and, at the bottom, check the two "ALLOW NON-BUNDLE" boxes and leave the "FORCE" option unchecked.
9)  It will go through an inventory process determining which packages need to be installed.  If you've placed the .scexe in the right place, you will see an option for which one you would like to install.  By default, it will select the most recent one, which is what we want.
10)  Leave the defaults, and click INSTALL.

Reboot the host when prompted (remove the USB key) and insert your vSphere 4.1 media.  Install as usual.  When you get to the screen where you're asked if you want to load custom drivers, choose YES, and insert the disc with the QLogic drivers.  Click OK.  You should only see one option, for the nx_nic driver.  Select it, and click OK.  Leave the disc in and continue with the install process.  You will be prompted when you need to re-insert the ESX media.

You should be good to go at this point.  Finish your re-install.

HARD-NOSED CUSTOMER OPINION

This is a PAIN IN THE A$$ process, and HP, Cisco, AND VMware are all accountable here.  This simply cannot happen.  You three are some of the biggest (if not THE biggest) players in this arena, and the simple fact that this could slip through the cracks is unacceptable, guys.  QA your stuff.  The fact that this is STILL not fixed after first emerging last summer is very telling of your lack of communication and cooperation.

-Nick

Sunday, February 6, 2011

Onward and Upward

I first wanted to reach out to my readers and extend an apology for not writing recently.  Once you've read through this post, you'll hopefully understand the reasons as to why.

Life is a book, and in every book there are chapters. Sometimes the lines between those chapters are a bit blurry, and sometimes they are abrupt stops and starts.  In the analogy of life, I suppose you could chapter certain things, such as milestones or firsts, or turning points that steered one's life in a certain direction.  Remember those "choose your path" books?

Well, I'm turning over to a new chapter.

This morning, I am tendering my resignation to my employer, IPC - The Hospitalist Company.  I'd like to state here that my time (~5 years) with them has been superb. I was given an incredible amount of leash to experiment and learn, and was able to work amongst a great team.  I was given wonderful examples of leadership, for better or worse, and have learned much about working for large public enterprises in the IT space.  Hopefully I was able to provide that same level of leadership and example to my peers coming up in IT. 

I've honestly been writing this post for about two weeks now, never knowing exactly what to say.  When you work for someone this long, they become family. It becomes comfortable.  And whether you are working like a well-oiled machine, or at each others' throats over the latest "issue," at the end of the day you all work together to resolution and move on from it.  My time at IPC has been exactly this.  We all have our ways, and we learned to work together through those ways.  It's what makes a team a team.

To my peers, my superiors, and the IPC Executive team, I only have two words:  Thank you.

To Ren, you've been a wonderful teammate, partner-in-crime, and have become a close friend.  You were always patient with me, and together, we grew the enterprise into what it is today.  Take care of our baby.

"Yes, Nick, enough with the pleasantries! Where are you going?!"

In life, we're often thrown opportunities that we either grab on for dear life and hold on for the ride, or allow them to whiz past us.  Over the past 6 months, there have been many offers thrown my way, most of them not worth the time, some of them significant.  I always knew in the back of my mind who I wanted to work for, however.  And I suppose subconsciously I was holding out, even with them, for the exact position that was...well, not so much perfect for ME, but perfect fitment of me for THEM. 

It finally came.

I will be joining the NetApp team in RTP as a Virtualization Solutions Architect.

I'm excited about the opportunity to take my hard-nosed opinions of a customer, and directly put some influence into the product design.  I'm proud to be wearing the NetApp badge on my sleeve (or polo) and representing them anywhere I go.  I'm extremely grateful to them for the opportunity, and yet humbled by the wisdom of the people I will now be working with. 

To those of you in RTP, I'm looking forward to finally meeting and collaborating with all of you.

To Vaughn:  Thank you for your patience, your candor, your persistence, and most of all, your support.

As far as this blog goes, that remains to be determined.

Monday, November 29, 2010

Exchange migration project (Part 1)

One of the bigger projects of my tenure at my current employer is firing up.  In this series of posts over the next couple of months, I'm hoping to highlight and document our methods, as well as go over any hurdles we run into and how we resolved them.  This initial post will serve only as an introduction to the environment and the overall plan of attack.

Our environment's email has always been hosted on an external mail host.  It has been this way for 10+ years, and unfortunately, we have simply outgrown them.

Internally, we have been planning for this, and have turned up a new Exchange 2007 environment (yes, it's virtualized), which is already hosting about 2000 mailboxes for some happy end-users we migrated from a shoddy freeware platform called hMail earlier this year.  For them, OWA 2007 was a night-and-day difference from what they had before.

What we are now targeting is the corporate/regional backoffice staff that, while fewer in number, are the heavy hitters, with mailboxes ranging in size anywhere from 2GB to 40GB.  And while this is the second phase of this migration as a whole, this phase has many, many "sub-phases."

What I'd like to discuss initially is sub-phase 1, and the hurdles, and how we overcame them.

When dealing with Exchange, it's a fairly straightforward process to move email from one place to another.  What most people tend to NOT think about, especially over the course of TEN YEARS, is all of the little granular permissions, delegations, "the EVP's admin assistant can send email on his behalf, and view all calendars," etc., etc.  We'll get to this in a later post, as it deserves its own.

To make things even a little more complicated, our host is still running Exchange 2003, and our current environment is Exchange 2007.  While there are defined upgrade paths from 2003 to 2007, there is no defined way to take a 2003 EDB and mount it on a 2007 mailbox server.  So, we were sold on the idea that tools would need to be leveraged, and trusts between forests would have to be established, in order to migrate the users as if they were on a completely different mail platform altogether.

But wait...how the hell are we going to do all of that between a remote host and our internal domain?

Well, we could do a site-to-site tunnel, but that would require some complex networking and add additional layers of complexity that we weren't interested in, or that the host might not even allow.

After exhausting all options, we settled on the idea of physically relocating the Exch2k3 server, as well as a Domain Controller from that domain, into a private VLAN inside of our network, essentially hosting the additional domain short-term until we were able to migrate the data off of the mail server completely.  Why?  It seemed much easier than trying to do complex tunneled solutions, the host was willing to let go of the old HP ML370 we're currently running on, and they were willing to replicate our domain information onto a DC that we could then bring in-house.

So, what's required to do this?

1) We need a private VLAN internally.  This means coordination with the networking team to carve out ports and a space to host the new servers, and to avoid any crosstalk between the domains.  All outgoing mail would go out to the internet first and come right back in to the new 2007 server.  Could we get all super-cool with routing groups and SMTP connectors?  Sure, but why bother/overcomplicate things for a server that has a remaining shelf-life of about a month?  We essentially just relocated the hosted solution, the same way they would if they moved datacenters.

2) Public access/interface:  MX records will have to be updated to our IP block, and new NAT/ACL rules will have to be built into our firewalls for this solution.  Again, handled by the networking team, but fairly straightforward, as if it were a new environment.

3) Physical move:  One of our admins is hauling a new box down in the morning that will be the new DC, and we will be taking an outage to relocate the servers.  During this time, the networking team will update Network Solutions, ZIX encryption gateway, and Sprint Spamshark to point to the new VLAN/Public IP.

At this point, we are at a hard cutover.  No more mail will flow to the host, even if we left the server there.  Once we plug in the host's gear and power it up in our datacenter, mail should resume delivery once again.


4) Power up the host's domain controller in our datacenter, and ensure that it is a GC.  Actually, this will be verified before it ever leaves the host's datacenter.

5) AD looks good?  Cool.  Check for Event ID 13516 to ensure the DC can accept authentication (a quick check for it is sketched below), and upon success fire up the Exchange server.
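
Here is a minimal sketch of that check from PowerShell; the DC name is a placeholder, and it simply looks for the most recent 13516 ("SYSVOL ready") entry in the File Replication Service log.

------------------------------------------------------------------------------------------------------------

# Query the relocated DC for FRS Event ID 13516, which indicates SYSVOL is shared
# and the DC is ready to service logons. "DC-NAME" is a placeholder.
Get-EventLog -LogName "File Replication Service" -ComputerName "DC-NAME" |
    Where-Object { $_.EventID -eq 13516 } |
    Sort-Object TimeGenerated -Descending |
    Select-Object TimeGenerated, Message -First 1

------------------------------------------------------------------------------------------------------------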

That's the plan, and we'll see how it goes.

Upon success of this implementation/move, we'll take an additional, longer outage over the weekend to attempt to P2V the box and throw a ton of resources at it.  (It only has a 10/100 NIC, for example.)

Further posts will include:

*Virtualizing the Exch2k3 box with a P2V cold conversion.
*Virtualizing the domain controller with a P2V cold conversion.
*Establishing trusts between the two domains.
*Using 3rd-party tools to move permissions, delegations, and data into the existing domain.

Stay tuned...

Tuesday, November 23, 2010

NetApp SMO snapshots and changing heads...MESSY!

This post is going to be specific to you NetApp customers out there using your systems to host Oracle mount points or storage, but more specifically those using SnapManager for Oracle (SMO) to back up your Oracle databases (which, funnily enough, doesn't require you to be doing so on NetApp storage).

As an aside, you should know that I'm on a cross country flight and making my first honest attempt at writing a post on the virtual keyboard of my iPad.

Some basic layout information on our infrastructure: a primary NetApp FAS3140 cluster hosting a dozen or so volumes, with NFS exports mounted to HP DL380 servers over 10GbE. Pretty straightforward. This is hosting a single instance of Oracle 10g, and is being backed up using SMO 3.0.2 (3.1 is current).

As far as the Oracle layout is concerned, I won't go into gritty detail, but considering this post is about a gotcha I discovered in SMO, we need to establish a few things.

As is typical, Oracle uses a /u01, /u02, etc. format to number the drives/mounts for its structure. Honestly, you can name them whatever you want; SMO just polls Oracle for what the mount points are. I'm just listing what we use in case I refer to them later in the post.

SMO is a great product regardless of whether you use NetApp for your storage or not. It was basically (unofficially) modified by Oracle themselves. Once the Oracle devs got their hands on it, they began collaborating with NetApp to fine-tune it.

OK...now to the meat and potatoes of the post. As part of our resiliency, we keep a secondary archive log location active on a remote piece of storage that is also mounted over NFS. There is a default setting in SMO that specifies including the secondary archive log locations in any SMO snapshots.

Sounds great, doesn't it? Well, it is, unless you are unaware of it, or don't account for it in your capacity planning, or when replacing the hardware that hosts this secondary location.....which is what bit us in the backside this week.

We upgraded our off-site hardware from 2050 controllers to 3140 controllers. Fairly routine. Snaps continued to run fine once the hostnames were updated and the storage remounted. However, we started getting some log chatter in the SMO jobs about not being able to delete old snapshots that had expired (based on the retention policy).

Hmm...after some head scratching and digging, we saw that it was trying to delete snapshots on the secondary archive log location of the old filer we had just replaced. After more head scratching and chatter over coffee, we hypothesized that we were just going to have to ride out the log spam until the retention period had passed. We weren't really breaking anything. Or so we thought.

To compound things, further investigation revealed that not only was it not deleting the snapshots on the storage that no longer exists, it was also not flushing...well, anything. This was brought to our attention when some volumes started filling up because the snapshot space was eating into the usable space.

Feature request for NetApp: I want to be able to tell my snap products to NOT snap if there's no snap reserve space remaining. A snap backup job failing is not nearly as critical as a LUN filling up because snaps ate up all the usable space, crashing whatever application is being hosted and corrupting data. I know there are auto-grow and auto-delete snapshot capabilities, and that's all fine and good, but you can't just delete snapshots randomly in something like an SMO snap, because the whole snap backup becomes invalid if it cannot find or access one sub-snapshot under the whole umbrella of an SMO snapshot.

So, by changing the hostname of the secondary storage, we nullified ALL SMO snaps that included a snapshot of that secondary archive log location. We couldn't delete them gracefully, so we had to go through and force-delete the snaps, as well as traverse every single volume and remove the snapshots related to those jobs manually on all filers.  (A rough sketch of how that kind of cleanup could be scripted is below.)
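
For what it's worth, here is a rough sketch of that kind of cleanup using the NetApp DataONTAP PowerShell Toolkit rather than walking the volumes by hand. The filer name and the "smo_*" snapshot name filter are assumptions (match the filter to your own SMO profile naming), and the cmdlet usage is from memory, so verify it against the toolkit's help and test it on a single volume before turning it loose on everything.

------------------------------------------------------------------------------------------------------------

# Rough sketch (assumptions noted above): walk every volume on a filer and
# remove the orphaned snapshots whose names match the SMO naming pattern.
Import-Module DataONTAP
Connect-NaController "filer01"

Get-NaVol | ForEach-Object {
    Get-NaSnapshot -TargetName $_.Name |
        Where-Object { $_.Name -like "smo_*" } |
        Remove-NaSnapshot
}

------------------------------------------------------------------------------------------------------------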

I did speak with one of the SMO gurus at NetApp this morning, and he confirmed this behavior, and also confirmed that there is a setting in smo.config that can be changed so that the secondary archive log location is not snapped.

Once I get back from Thanksgiving next week, I'll be posting the results of fixing this, with a thorough walkthrough.