virtualeverything

thoughts and musings regarding virtualizing IT

WTF (What The FEX) are you talking about?

Posted by veverything on November 10, 2011

FEX, or Fabric Extender technology, is a core part of Cisco’s DC strategy. There are multiple marketing FEX terms that mean different things, and I’ve seen much confusion from customers and peers alike regarding these terms. There are four main FEX terms: ToR-FEX (also called “Rack-FEX”), Blade-FEX, Adapter-FEX and finally VM-FEX.

Before continuing, it would be helpful to get a background on what FEX actually is… read about FEX here.

ToR-FEX (“Rack-FEX”):

This describes utilizing Nexus 2K FEX at the top of each rack, connected to Nexus 5K/7K upstream. The server adapter port connects to the FEX and the port shows up on the upstream switch as if it were directly connected to it; the FEX is a virtual line card in the switch, extending the fabric.

So, ToR-FEX/“Rack-FEX” = Nexus 5K/7K + Nexus 2K:

One logical (vPC) link connects the 2K to the 5K, and each of the servers appears as if it is directly plugged in, because a veth interface is created for each of the physical adapters.

Blade-FEX:

In the UCS chassis, there is a pair of IOMs that handle the communication from the blades to the fabric interconnects; these IOMs provide FEX capability very similar to that found in the Nexus 2K. The I/O flows from the blade mezz card, through the chassis backplane, to the IOM (FEX), and from there to the fabric interconnects at the top of the rack. Much like ToR-FEX/“Rack-FEX”, the IOM extends the fabric and the adapters on the blades show up on the fabric interconnects as vethernet interfaces, as if the IOM were a line card in the fabric interconnects themselves.

So Blade-FEX = UCS 6K (61xx/62xx) + UCS 2K:

A very similar logical diagram to Rack-FEX, except in a blade chassis. The UCS 2K is contained in the chassis, and the blades have a backplane connection to the FEX (IOM) instead of a wire. One logical (vPC) connection (supported with 62xx HW) extends the fabric up to the 6K, and logging into the 6K you can see the individual Ethernet and FC interfaces of the blades.

Adapter-FEX:

The term Adapter-FEX describes virtualizing a physical adapter on a server (blade or rack) and having those virtualized adapters appear to the upstream Nexus switch as if they are physically connected to it. The “fabric extension” happens from the adapter to the upstream switch, hence the term “Adapter-FEX”.

Now, there are two variants of Adapter-FEX — Adapter-FEX blade, and Adapter-FEX rack, applying to Cisco’s B-series (blade) and C-series (rack mount) servers equipped with the VIC:

So Adapter-FEX rack = VIC card + Nexus5K (one possibility, other combos are possible):

OR

Adapter-FEX Blade = VIC card + UCS2k + UCS6k:

Adapter-FEX allows each server/blade to create multiple vNICs/vHBAs and have them appear on the upstream device as if they are directly connected, by showing up as veth or vfc devices.

VM-FEX:

VM-FEX is built on top of Adapter-FEX and is the ability to have control plane integration between the vSphere networking layer and the server networking. What do we mean by that?

There are two types of virtual interfaces: static and dynamic. Static vNICs are what a vSphere administrator would create (for service console, vMotion, etc). But as virtual machines are created, a dynamic vNIC is also created by UCSM and associated with the proper port group. This vNIC also shows up in the upstream switch as if it’s directly connected. So each virtual machine has a vNIC which is created and shows up on the upstream device, just as if there were a physical server plugged into a physical port. It’s all about providing a unified methodology for managing virtual and physical assets.

So, VM-FEX = Adapter-FEX + vCenter networking control plane integration via UCSM.

In other words: VIC card + UCS2k + UCS 6k + vSphere integration via UCSM (blade). The key is UCSM talking to vCenter.

The above shows the VM-FEX scenario for the blade, but the concept for rack servers is identical. There is control plane integration between UCSM and vCenter such that when a new VM is created, a new veth (for each vNIC) is also created automatically on the upstream device, making it seem like the VM is physically connected to it. This is in addition to any virtual adapters at the hypervisor level (such as vHBAs for storage, or static vNICs for hypervisor networking).

Note: as of UCSM 2.0, VM-FEX is also supported in KVM environments.

There is an analogous rack methodology, but I don’t see it used often, and have never actually seen it implemented. Most customers I see building large VMware environments are doing so with B-series.

As we go further down the virtualization journey, these control plane integrations will become more and more prevalent, and perhaps even table stakes at some point. We have, for example, storage plug-ins for vCenter, and vCenter “awareness” in some storage GUIs, but how about more direct control plane integration for “other” storage-ish? Things that make you go…. hmmmm.

* Note: diagrams are not necessarily physical representations of full deployment scenarios. In most cases, only half the picture is displayed; there would be a second 2K, second 5K, etc.

Posted in Cisco, FEX | Leave a Comment »

Peeling back the onion on HP-FEX

Posted by veverything on October 24, 2011

Recently, HP and Cisco in collaboration released a FEX module for the HP C7000 chassis. See here and here to read about the release from both HP and Cisco’s perspective. This post is not to discuss the business decisions behind this product release, but rather to take a closer look at the HP-FEX architecture from a technology perspective.

First of all, what the heck is a FEX? Read here and here for some background on the term.

Now, with that out of the way, let’s take a look at the networking architecture when deploying HP blade servers.

HP’s leading interconnect architecture is known as Virtual Connect FlexFabric. There are two main components to this:

  • server profile virtualization: Virtual Connect Service profiles allow one to take attributes of a server such as WWNs, MAC addresses, FC boot parameters, etc and store them as a software construct, thus making the hardware itself “stateless”. The Cisco UCS analog to this would be Service Profiles. For a deep dive into the differences, see here.
  • virtualizing the 10Gb adapter port: allowing one to present up to 4x NICs to the host OS with traditional Flex-10, or 3x NICs and 1x FCoE with FlexFabric interconnects. Cisco’s analog to this would be their “VIC” card, which allows one to create up to 256 vNICs and vHBAs and present them to the host. There are some technical differences between Flex-10 and Palo, but that is not the focus of this post either. Plenty of information on that subject is easily available via Google.

First, let’s take a look at what an HP BladeSystem architecture utilizing Virtual Connect FlexFabric could look like:

The components here are 1x C7000 chassis with 16 blades utilizing FlexFabric interconnects and integrated FlexFabric LOMs, which give 2x 10Gb CNAs per blade. The bottom-most diagram represents a logical view from the OS perspective of a single blade. FlexFabric allows the administrator to divide a single 10Gbps CNA port into 4 devices: 3 NICs and 1 HBA, or 4 NICs. In this case, we have chosen 3 NICs and 1 HBA to illustrate the FC/FCoE case. The operating system sees a total of 8 devices, 4 per CNA port. The OS communicates with these devices as if they were traditional NICs and HBAs. The FlexFabric LOM then combines the NIC and HBA traffic into an FCoE stream and sends it through the midplane of the chassis up to the FlexFabric interconnects. The FlexFabric interconnects then split the FCoE traffic into traditional Ethernet and Fibre Channel via separate ports and send them upstream out of the chassis. In this case, a pair of Nexus 5Ks is used, which have the ability to house both LAN and SAN ports. These Nexus switches could also uplink into a “core” LAN/SAN. Many architectures are possible upstream. Note that while the LAN connections are cross connected between switches, the SAN connections are *NOT*. This is because traditional Fibre Channel design relies on this “air-gapped” connectivity to maintain 2 separate fabrics.

Let’s contrast this with an HP BladeSystem deployment utilizing the B22HP-FEX:

This block diagram is very similar. The bottom-most figure represents a logical view of a blade from an OS perspective. Unlike the FlexFabric configuration, when utilizing HP-FEX, the administrator does NOT have the option of creating 4 individual devices per CNA port. It defaults to a “regular” CNA adapter presenting 1 NIC and 1 HBA per port. The administrator will have to use other means of providing QoS, since all the LAN traffic will travel through a single interface on the OS side. The classic example where this matters is creating multiple interfaces for VMware deployments: service console/VMotion, Production VM, backup etc. Another notable difference is that the traffic is FCoE out of the chassis, whereas in the FlexFabric design it was broken out into its LAN/SAN counterparts. In this example I used the same number of ports for the upstream connectivity. The B22HP-FEX talks FCoE to the upstream 5Ks, which can then connect into “core” LAN/SAN infrastructures in larger deployments.

Notable differences between the architectures:

  • in the FlexFabric deployment, you have the option of creating up to 4 interfaces per CNA port. On the FEX design, you do not have this capability.
  • the service profile features offered by Virtual Connect are available in the FlexFabric deployment, but not in the B22HP-FEX deployment. This is a big deal, since one of the major selling points of an HP BladeSystem is the ability to utilize Virtual Connect to abstract away the server hardware.
  • in the FlexFabric deployment, you have to decide up front how many Ethernet and Fibre Channel connections you want upstream of the chassis. In the FEX design, since the traffic leaving the chassis is FCoE, you do not have to make physical wiring changes in order to allocate LAN/SAN bandwidth; it can be done via SW in the upstream Nexus 5Ks.
  • both the FlexFabric interconnects and B22HP-FEX offer 2:1 oversubscription, meaning there are 16 ports downstream (1 per blade) and 8 ports upstream (0.5 per blade). However, the ability to utilize vPC in the FEX on all the links allows MUCH better utilization of the links. Because some (2) of the FlexFabric connections will be chewed up for chassis interconnects to create a single Virtual Connect domain, you actually have a higher (worse) oversubscription ratio in the FlexFabric case.
  • from a point-of-management perspective, the B22HP-FEX interconnects are not managed individually. They act as remote line cards in the 5K (just like the standard Cisco 2000 series FEX). Each FlexFabric interconnect (pair), on the other hand, is a point of management.
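
The oversubscription point is easy to check with a little arithmetic. The sketch below is only an illustration: it assumes the port counts quoted in the list (16 downlinks and 8 uplinks) and that the 2 FlexFabric ports used for chassis interconnects come out of those 8 uplinks. The function name is mine, not anything from HP or Cisco.

```python
from fractions import Fraction


def oversubscription(downlinks: int, uplinks: int, reserved: int = 0) -> Fraction:
    """Ratio of downstream ports to the uplinks actually left for upstream traffic."""
    usable = uplinks - reserved
    if usable <= 0:
        raise ValueError("no usable uplinks left")
    return Fraction(downlinks, usable)


# B22HP-FEX: all 8 uplinks carry traffic upstream.
fex = oversubscription(16, 8)
# FlexFabric: assume 2 of the 8 ports are consumed by the chassis interconnect links.
flexfabric = oversubscription(16, 8, reserved=2)

print(f"B22HP-FEX:  {float(fex):.2f}:1")
print(f"FlexFabric: {float(flexfabric):.2f}:1")
```

Under those assumptions the FEX stays at 2:1 while FlexFabric ends up at 16:6, or roughly 2.67:1.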

The lack of blade profile virtualization is a MAJOR downside to utilizing the FEX in HP BladeSystem. I don’t think anyone will argue that the FEX based network architecture is cleaner and simpler ESPECIALLY at scale; but customers will have to choose between a superior network architecture, or the benefits that come along with blade profile virtualization…. unless they decide to go with Cisco UCS, in which case they can have both. ;)

That being said, there are clear advantages and disadvantages to going with either design, so it’s going to be up to the customer to decide what is more important to them.

Posted in FEX, HP | 8 Comments »

Getting the VMware VSA running in a nested ESXi environment

Posted by veverything on August 17, 2011

In the previous VSA article we took a look at the storage architecture of the appliance, as well as some of the caveats and considerations when deploying it. In this article, we’ll take a look at how to get it up and running in a nested ESXi environment as well as some of the functions the VSA provides.

First, in order to create a nested ESXi 5.0 environment, have a look at this great article.

When creating your environment, my recommendation is to create 4 individual vDS port groups or 4 individual standard vSwitches for the environment. You will assign each to a vNIC of the vESXi host to simulate connecting each pNIC to a physical switch in a real deployment.

Be sure to configure the vSwitches (or vDS port groups) with promiscuous mode enabled and create 2 vESXi VMs with 4 NICs minimum and a SINGLE VMFS volume (this is important or else the VSA will not install). I recommend a thin provisioned volume of about 200GB for testing.

You should end up with something like this:

Same applies if you are using standard vSwitches in your environment.

Now you need a Windows based vCenter 5.0 instance to manage this environment. Install the VSA manager software onto that vCenter which will then expose the VSA manager plug-in/tab on the vCenter client once you click on a vSphere data center:

In normal installations, you would then click on the VSA manager tab and follow the instructions to install. The problem is that since we are installing in a nested vESXi environment, “EVC” does not work with nested vESXi and is a requirement the installer checks for, thus you will not be able to proceed:

Thus far I have not been able to find a workaround for this for the GUI based install. However, after lots of lab time I found there is a way around this problem: in order to install the VSA in nested ESXi and bypass the EVC requirement, we need to tweak a configuration file and then do the installation via command line. Download the full zipfile which includes the command line installer if you haven’t already and unzip that onto your system.

Here is the minimum syntax to get it going:

install.exe -u root -p <password_to_ESX_hosts> -si <start_address_for_VSA_front_end_IPs> -nh

Recall that the VSA has a front-end network and a back-end network. The “-si” switch tells it what public IPs to use for the front-end. You can specify a “-bs” start range for the back-end IPs, but it will default to 192.168.0.1 as the start range if you do not specify anything. You can also specify netmasks and VLANs. See the manual for details.

The “-nh” tells it not to join the hosts into a high availability cluster and this will be important to help bypass the EVC check. If we execute this command this will be the result:

As you can see the automated command line installer runs an audit stage and it fails for the same EVC reason!

Well, after much lab time, I figured a way around this problem. We need to change a parameter in C:\Program Files\VMware\Infrastructure\tomcat\webapps\VSAManager\WEB-INF\classes\dev.properties. Search for this line:

evc.config=true

and change it to

evc.config=false

This will effectively bypass the audit check for EVC. Cool huh?
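
If you rebuild the lab often, the edit can be scripted. This is a minimal sketch, not part of the VSA tooling: it assumes the file path quoted above, assumes the line appears exactly as `evc.config=true`, and the function names are my own. It writes a backup copy before changing anything.

```python
from pathlib import Path

# Path as quoted in this article; adjust if vCenter is installed elsewhere.
DEV_PROPERTIES = Path(
    r"C:\Program Files\VMware\Infrastructure\tomcat\webapps"
    r"\VSAManager\WEB-INF\classes\dev.properties"
)


def disable_evc_check(properties_text: str) -> str:
    """Return the properties text with evc.config=true switched to false."""
    patched = []
    for line in properties_text.splitlines(keepends=True):
        if line.strip() == "evc.config=true":
            line = line.replace("evc.config=true", "evc.config=false")
        patched.append(line)
    return "".join(patched)


def patch_file(path: Path) -> None:
    """Back up the properties file, then rewrite it with the EVC check disabled."""
    original = path.read_text(encoding="latin-1")
    path.with_name(path.name + ".bak").write_text(original, encoding="latin-1")
    path.write_text(disable_evc_check(original), encoding="latin-1")


# Usage on the vCenter server (run with rights to write under Program Files):
#     patch_file(DEV_PROPERTIES)
```

Keeping the text transformation separate from the file I/O makes it easy to try the function on a copy of the file first.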

Now re-run the install.exe command, and it should complete:

And you end up with this in your nested ESXi environment:

The result is 2x 100GB data stores, which correlates with each VSA having 200GB of RAW storage, for a total of 400GB RAW or 200GB usable after RAID10 internal to the VSA.

 

Here is a peek at the networking the VSA installer sets up:

There are front-end and back-end port groups that live on separate vSwitches and pNICs. You are now free to customize the networking however you see fit, but it HAS to have a default configuration starting out or else the install WILL fail.

Now that the VSA is installed, you can continue to manage it through the VSA plug-in in vCenter. We only needed the hack and the command line to get it up and running. Again, it’s important to note that this would not be required in a real installation; it was required due to the limitations of nested ESXi.

End result:

In the next article, we’ll take a look at some administrative tasks, test out some of the failure scenarios, and see how the VSA handles them from a downtime/uptime/reliability perspective.

Posted in storage, storage virtualization, vmware | 5 Comments »

A closer look at VMware’s Virtual Storage Appliance 1.0 (VSA)

Posted by veverything on August 15, 2011

One of the new products which accompanies the vSphere 5.0 release is the Virtual Storage Appliance. The purpose of this product is to allow customers to utilize the local disks on the ESXi hosts in order to create a shared storage environment for their virtual infrastructure, thus being able to take advantage of the advanced features such as HA and VMotion which are reliant on shared storage. The idea behind this is to avoid the costs of a hardware based SAN/NAS system to allow SMB customers to implement vSphere and its advanced features at a more attractive price point.

VSA Cluster Storage Architecture:

vsa_arch

The VSA cluster is two (or three) VMs that run in the ESXi environment. Depicted above is the architecture from a storage perspective, and it’s important to understand the levels of abstraction and how we finally arrive at a shared storage resource.

At the very bottom of the stack is the physical ESXi host (physical server) which houses the local hard disks. Presumably, there is some kind of hardware RAID capability in this server, either as a function of the BIOS or a RAID card, which takes all the disks and combines them using RAID protection to give a local volume. VMware says that RAID10 is a requirement here, but this is not a hard and fast requirement as far as I can tell (more on that below).

You then install ESXi onto this local volume and by doing so format it with the VMFS file system. When you install the VSA, the installer uses the remainder of the disk space not taken up by the ESXi install and the VSA VM itself for the “shared storage” capacity, and presents that to the VSA VM as a series of VMDKs, which the VSA VM combines using an LVM to form a primary and a secondary volume. The VSA VM runs an NFS server and exports this volume back to the ESXi host. Each VSA VM does this, and hence you end up with 2 NFS volumes (in a 2-node cluster): VSADs-1 and VSADs-2. Just a little bit of inception going on here! :) It’s important to note that only half the space is actually exported as an NFS volume due to RAID10 protection.

To elaborate a little on the primary and secondary volumes in the VSA VM: remember that each volume exported by the VSA VM is protected via RAID10. So one half of the VSADs-1 RAID10 mirror lives on VSA1 and the other half lives on VSA2. In this way, the environment can tolerate disk failure as well as node failure and still remain operational thanks to the RAID10 protection. What I haven’t been able to dig into yet is the replication mechanism for keeping the primary/secondary volumes in sync. I suspect it might be something like DRBD (not verified, just a guess).

Now that we better understand how the VSA works under the covers, it’s important to note that there are a number of considerations and caveats to be aware of when deciding to utilize the VSA:
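
To make the mirror placement concrete, here is a small toy model in Python. It is only a sketch of the layout as described above (VSADs-1 has its primary on VSA1 and its replica on VSA2, and vice versa); the ring-style placement for a 3-node cluster is my assumption, and the function names are invented for illustration.

```python
def replica_layout(nodes):
    """Map each datastore to (primary node, replica node).

    Each node holds the primary of its own datastore and the replica of
    the previous node's datastore (ring placement; assumed for 3 nodes).
    """
    layout = {}
    for i, node in enumerate(nodes):
        partner = nodes[(i + 1) % len(nodes)]
        layout[f"VSADs-{i + 1}"] = (node, partner)
    return layout


def surviving_datastores(layout, failed_node):
    """Datastores that still have at least one copy after failed_node goes down."""
    return sorted(
        ds for ds, copies in layout.items()
        if any(node != failed_node for node in copies)
    )


layout = replica_layout(["VSA1", "VSA2"])
print(layout)
print(surviving_datastores(layout, "VSA1"))
```

With two nodes, losing either one still leaves a copy of both VSADs-1 and VSADs-2 on the surviving node, which is the node-failure tolerance described above.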

  • the VSA manager (server side of the plug-in which allows you to manage the VSA) needs to be installed on a Windows based vCenter server. This means you cannot utilize the vCenter Appliance (VCA), as it is Linux based. To me this is definitely a downside, as the VCA is extremely easy to set up (done via OVF and can be up and managing an environment in minutes) and perfectly suited to SMB environments with its internal database. Hopefully this can be addressed in future releases. Looking at the VSA manager, it appears to be all tomcat/java based, so there is no reason it cannot run on the Linux based VCA
  • when setting up the VSA, each ESXi host must be a fresh install with no virtual machines running on it. Furthermore, each ESXi host must have only the default vSphere standard switches or port groups. You cannot create any additional switches or port groups. Once the VSA has been set up, you are then free to modify the networking
  • the ESXi hosts must be on the same subnet as the vCenter server
  • the ESXi hosts must not be in another HA cluster. the VSA setup utility sets up its own HA cluster for the environment
  • maximum supported hard disk capacity per ESXi host is 64TB
  • there are specific requirements around networking: each ESXi host requires 4 NIC ports minimum, and you require 2 VLANs (one for front-end and one for back-end traffic)
  • 72GB of RAM is the maximum supported & tested RAM configuration with the VSA
  • memory overcommit on VMs is not supported when utilizing the VSA. VMware’s reason for this is because if swapping occurs, there could be severe performance slow down. I don’t necessarily agree with this, as if you put enough spindles in the local host, it should not be an issue. But again this is VMware’s official support statement.
  • VMware says you should have 8 or more hard disks in RAID10 in the ESXi hosts. I see no reason why you could not utilize RAID5 or a different hard disk count. In fact, in my testing, I did not utilize any “local RAID” per se as I was running in a nested ESXi environment, and the actual LUN was utilizing RAID5 on the back-end in a FAST-VP pool. I suspect that VMware recommends a minimum of 8-disk RAID10 on the hard disks for performance reasons. But there is no reason why you wouldn’t treat spindle count on the ESXi hosts’ local drives just like you would for sizing a SAN LUN for traditional environments. Not enough spindles = performance issues no matter if they are local disks or SAN disks. But again, this is VMware’s official support statement requiring RAID10 and a minimum of 8 disks.
  • the VSA mirrors the data utilizing RAID10 (a primary and replica volume each on different hosts). this is not configurable, so plan on this from a capacity perspective. If you have 8 disks in your ESXi host doing a RAID10 giving you a volume of 1TB, and you have 2 hosts for a total of 2TB — you will end up with 1TB of usable capacity in your environment. in VSA1 500GB will be primary, 500GB will be secondary, and similarly for VSA2.
  • the VSA exports the volumes as NFS; there is no support for iSCSI
  • if you are running vCenter as a VM, it CANNOT be running on the hosts participating in the VSA cluster
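
The capacity caveat in the list above comes down to simple arithmetic. The sketch below assumes the whole local volume is handed to the VSA (it ignores the space taken by the ESXi install and the VSA VM itself) and that each node splits its space evenly between its primary and the replica it holds for another node. The function name and dictionary keys are mine.

```python
def vsa_capacity(local_volume_gb: int, hosts: int) -> dict:
    """Rough capacity figures for a VSA cluster with mirrored volumes."""
    raw_total = local_volume_gb * hosts
    return {
        "raw_total_gb": raw_total,
        "usable_total_gb": raw_total // 2,          # every volume is mirrored once
        "primary_per_node_gb": local_volume_gb // 2,
        "replica_per_node_gb": local_volume_gb // 2,
        "nfs_datastores": hosts,                    # one exported volume per VSA VM
    }


# Example from the list: 2 hosts with a 1TB (taken as 1000GB) local volume each.
print(vsa_capacity(1000, 2))
# Nested lab sizing: 2 vESXi hosts with about 200GB each.
print(vsa_capacity(200, 2))
```

For the 1TB example this gives 1TB usable, split into 500GB primary and 500GB replica on each VSA.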

Next, we will look at how to get the VSA up and running in a nested ESXi environment and following that some general tasks and see what is/is not possible with the VSA compared to traditional shared storage as well as how it handles some failure scenarios.

Posted in storage, vmware | 3 Comments »

Simplifying SAN management for VMware Boot from SAN, utilizing Cisco UCS and Palo

Posted by veverything on May 31, 2011

One of the great features of the Cisco UCS is the Palo or Virtual Interface Card (VIC). When utilizing this card with UCS, it allows the administrator to create many virtual NICs (vNICs) and virtual HBAs (vHBAs) (up to 128 with some limitations). In a VMware environment, the use of vNICs is well understood: you can create individual vNICs for service console, vMotion, VM network traffic, IP storage traffic, and so on. You can then apply QoS policies to them to guarantee service levels. Additionally, you have the ability to utilize dynamic vNICs and Pass-Through-Switching, which bypasses VMware’s vSwitch and dynamically assigns vNICs to VMs as they are created. The benefit of creating vNICs is clear, but how about vHBAs?

At first glance, it doesn’t seem that useful to create more than 2 vHBAs (one per SAN fabric); and after all this is something that you can do with the standard UCS mezzanine cards from Qlogic and Emulex. There is one use case where the ability to create more than two vHBAs comes in handy — that is boot from SAN in VMware environments. This applies equally to boot from SAN servers in other clustered environments, but I will be using VMware to illustrate this design option, with EMC’s midrange Clariion/VNX storage.

Read the rest of this entry »

Posted in Cisco, EMC, storage, UCS | 3 Comments »

FCoE’s impact on a Storage Administrator

Posted by veverything on May 30, 2011

As FCoE is gaining more traction and moving from a “vision” to a real consideration for many customers, one of the most common questions I get from CxOs is: “I understand the benefits of FCoE in my datacenter, but how will it impact my storage team? Will they need to invest significant amounts of time in new methodologies, commands, concepts, etc when administering the storage network?”

Read the rest of this entry »

Posted in Cisco, FC, FCoE | 1 Comment »

VMAX on a Clariion Planet, Part2: storage layout and provisioning

Posted by veverything on April 27, 2011

In part2 of this series, we’ll take a look at the storage layout and provisioning basics comparison between VMAX and Clariion.

First a look at how storage is composed on the two arrays.

Read the rest of this entry »

Posted in EMC, storage, VMAX | Leave a Comment »

VMAX on a Clariion Planet, Part1: A look at architecture and IO flows

Posted by veverything on April 25, 2011

This article focuses on understanding VMAX from the perspective of users who are familiar with Clariion arrays, terminology and architecture. Put another way, it is a guide to VMAX for Clariion users. We’ll take a look at the architecture similarities/differences, terminology and basic storage administrative tasks. When Clariion is mentioned in this article, it applies equally to VNX arrays, as they are similar for the purposes of this article.

Part1 will focus on architecture and IO flows, and Part2 will discuss some storage design and provisioning concepts.

With that said, let’s examine I/O flow from the host to a back-end disk of each array type.

Read the rest of this entry »

Posted in EMC, storage, VMAX | 3 Comments »

EMC Storage Pool Deep Dive: Design Considerations & Caveats

Posted by veverything on March 5, 2011

This has been a common topic of discussion with my customers and peers for some time. Proper design information has been scarce at best, and some of these details appear to not be well known or understood, so I thought I would conduct my own research and share.

Some time ago, EMC introduced the concept of Virtual Provisioning and Storage Pools in their Clariion line of arrays. The main idea for doing this is to make management for the storage admin simple. The traditional method of managing storage is to take an array full of disks, create discrete RAID groups with a set of disks, and then carve LUNs out of those RAID groups and assign them to hosts. An array could have dozens to hundreds of RAID groups depending on its size, and often times this would result in stranded islands of storage in these RAID groups. Some of this could be alleviated by properly planning the layout of the storage array to avoid the wasted space, but the problem is that for most customers, their storage requirements change and they very rarely can plan how to lay out an entire array on day 1. There was a need for flexible and easy storage management, and hence the concept of Storage Pools was born.

Storage pools, as the name implies, allow the storage admin to create “pools” of storage. You could even, in some cases, create one big pool with all of the disks in the array, which could greatly simplify the management. No more stranded space, no more deep architectural design into RAID group size, layout, etc. Along with this comes a complementary technology called FAST VP, which allows you to place multiple disk tiers into a storage pool and lets the array move the data blocks to the appropriate tier as needed based on performance needs. Simply assign storage from this pool as needed, in a dynamic, flexible fashion, and let the array handle the rest via auto tiering. Sounds great right? Well, that’s what the marketing says anyway. :)

First let’s take a brief look at the difference between the traditional RAID group based architecture and Storage Pools.

Read the rest of this entry »

Posted in EMC, storage | 28 Comments »

SSDs, TRIM and performance drop offs in the real world

Posted by veverything on January 17, 2011

SSDs can dramatically increase the performance in personal computers.

Roughly 9 months ago I installed a Crucial 128GB SSD into my Macbook (Model: CT128M225). It made an enormous difference to my user experience, as I generally tend to run a lot of applications, VMs, etc on my laptop. The wait times for applications loading and disk I/O happening were reduced dramatically, which was no real surprise.

However, as time went on, I noticed that the performance, while still good, felt like it was dropping off. Since I performed some benchmarking when I first installed the SSD, it was easy to verify.

Read the rest of this entry »

Posted in general | 1 Comment »
