Understanding HP Flex-10 Mappings with VMware ESX/vSphere

4 11 2009

This article has moved to my new site:

http://virtualkenneth.com/2009/11/04/understanding-hp-flex-10-mappings-with-vmware/

30 responses

4 11 2009
Kevin

Great job in simplifying the Flex10. I’ve got you linked on my site (BladesMadeSimple.com). I’m curious about a couple of things:

a) how is the performance of 10Gb Ethernet – especially as carved up by Flex10?
b) what value is it to connect to the Cisco Nexus 5000 if the Flex10 NICs only run at 10Gb Ethernet (lossy) speeds?

5 11 2009
Kenneth van Ditmarsch

Hi Kevin,

Thanks, and thanks for your comment 🙂

Regarding your first question about speed: we ran some 100% sequential reads with 4 KB block sizes to max out the performance towards the LeftHand nodes and reached 220 MB/s. Our current bottleneck is the storage nodes, which unfortunately run on 2x 1Gb connections. So in the current setup we cannot max out the complete bandwidth.

Concerning the second question, the Cisco Nexus 5000 was one of the things that was already available for us to connect to.

5 11 2009
Duncan

Maybe I don’t understand the concept, but what happens if mezzanine slot 2 dies or the onboard module dies? Wouldn’t you have an issue because there’s no redundancy?

5 11 2009
Kenneth van Ditmarsch

Hi Duncan,

Completely correct: both are a SPOF, which the customer explicitly chose.
The reason is that the first design drawings were based on the HP BL460 G1, which only had 10Gb on the mezzanine card (thus leaving us with no other option than to team on 1 dual-port NIC).

Since this drawing was presented to all the technical people (and it took them a long time to understand the blade technology), customer constraints demanded that we team the same way, purely for the technical understanding.
In this case it’s rather simple: Interconnect Bay 1 is redundant to Interconnect Bay 2 (horizontal), IB3 is redundant to IB4, etc.

So yes, I would have designed it otherwise, but constraints keep me from doing so. Good point though, since I didn’t mention this in the blog.
(Actually my only target was to define the port mappings, but in my enthusiasm the article kept on growing 😉)

Kenneth

5 11 2009
SpaceDeep

I agree with you, Duncan. This configuration is not really good. If the mezzanine or onboard slot dies you will lose connectivity. I cannot see any redundancy in this concept. My opinion is that you should create the same configuration on both slots (mezzanine and onboard). And there is another mystery for me: why should we use a 7Gb connection for VMotion?
My idea for this design is:
Onboard:
Port 1-1 SC net 1 Gb
Port 1-2 VMotion 1 Gb
Port 1-3 FT 2 Gb
Port 1-4 Other Networks 6 Gb
Port 2-1 Storage 10 Gb
Mezz 2:
Port 1-1 SC net 1 Gb
Port 1-2 VMotion 1 Gb
Port 1-3 FT 2 Gb
Port 1-4 Other Networks 6 Gb
Port 2-1 Storage 10 Gb
What do you think about this config, Duncan?
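
As a quick sanity check on a layout like the one proposed above, the FlexNIC speeds carved out of each physical port can be summed against the 10 Gb the port provides. The snippet below is only a sketch: the dictionary layout and the check_allocation helper are made up for illustration and are not part of any HP or VMware tool.

```python
# Hypothetical helper (not an HP tool): verify that the FlexNIC speeds
# carved out of one physical Flex-10 port fit within that port.
PORT_CAPACITY_GB = 10      # one Flex-10 port is 10 Gb
MAX_FLEXNICS_PER_PORT = 4  # a Flex-10 port presents up to four FlexNICs

# Allocation from the proposal in this comment, in Gb per FlexNIC.
# The same layout is used for the onboard NIC and for Mezz 2.
proposal = {
    "port1": {"SC net": 1, "VMotion": 1, "FT": 2, "Other Networks": 6},
    "port2": {"Storage": 10},
}

def check_allocation(flexnics, capacity=PORT_CAPACITY_GB):
    """Return (total_gb, fits) for the FlexNICs defined on one port."""
    total = sum(flexnics.values())
    fits = total <= capacity and len(flexnics) <= MAX_FLEXNICS_PER_PORT
    return total, fits

for port, flexnics in proposal.items():
    total, fits = check_allocation(flexnics)
    print(f"{port}: {total} Gb allocated, fits: {fits}")
```

Both ports in the proposal come out at exactly 10 Gb, so there is no headroom left to raise one FlexNIC without lowering another on the same port.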

5 11 2009
Kenneth van Ditmarsch

Correct, but customer constraints kept me from doing so.
I will note this in the blog 🙂

BTW, concerning the VMotion: we are running VMs with 16 GB memory which we want to transfer as fast as possible.
Network statistics showed that the VMs don’t generate much network traffic. Whenever we need to shuffle the speeds in the future, this can be done dynamically (which was one of the customer constraints as well).
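
To put the 7 Gb versus 1 Gb discussion in numbers, here is a rough back-of-the-envelope calculation for copying 16 GB of memory. It is an idealised sketch only: it ignores protocol overhead, the repeated copying of memory pages that change during the migration, and whether a single VMotion can actually fill the link. The transfer_seconds function is a made-up name for illustration.

```python
def transfer_seconds(memory_gb, link_gbps):
    """Idealised time to copy memory_gb gigabytes over a link of link_gbps gigabits/s."""
    gigabits = memory_gb * 8  # 1 byte = 8 bits
    return gigabits / link_gbps

for speed in (1, 7):
    print(f"{speed} Gb/s: {transfer_seconds(16, speed):.1f} s")
# 1 Gb/s: 128.0 s
# 7 Gb/s: 18.3 s
```

Under these assumptions the 7 Gb FlexNIC moves the VM in well under half a minute instead of roughly two minutes.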

5 11 2009
Frank Denneman

Spacedeep,

The article written by Kenneth is not a blueprint for every Flex-10 environment.
He just describes this particular design. You are free to use the standard 1Gb line speed for VMotion.

Good article Kenneth!
It’s nice to see an article that shows that office politics and conflicting agendas between server and network tribes can often lead to a suboptimal design.

Being a good designer sometimes means you must swallow your pride and design an environment that aligns with the needs and requirements of the customer, instead of an optimal technical design. Explain the issues, document your objections and deliver the design if your objections are waived by the customer.

5 11 2009
SpaceDeep - Albin Penic

Frank, as Duncan says: don’t get me wrong. The description of Flex-10 is really nice. I just posted my opinion about network design in a virtual environment. As you can see in the previous comment, even Kenneth agreed with me and Duncan that the configuration can be an issue. Frank, we all try to share our knowledge for the good of everyone.

5 11 2009
Duncan

And don’t get me wrong, it’s an excellent article!! I love the diagrams and the way you describe the concept.

6 11 2009
Bill

Thank you. For someone new to Flex-10, blades, and VMware, your diagrams have been very helpful.

10 11 2009
Didas

Hi,

Did you get the issues with the VC modules 5 and 6 resolved? We had something similar with other modules, and it was a problem with the NIC (in this case a normal 1 Gb dual-port from Intel).

Just wonder if there is a bigger underlying issue here.

Marcus Breiden

10 11 2009
Kenneth van Ditmarsch

Hi Marcus,

No, unfortunately not. I did a complete update on the enclosure (taking the HP compatibility matrix into account) and VCM (2.30), but this unfortunately didn’t solve the problem.
I’ve requested time to install Windows on a BL460 G6 and configure the network the same as under ESX; that way I can tell if this is a hardware problem or a VMware problem.

I’m only seeing this behavior on G6 blades; our G1 blades are working correctly with Windows/VMware.

10 11 2009
Marcus Breiden

Hmm… we saw the same issue in Windows in our case; disabling the network card and enabling it again solved the issue.

So let me explain our issue a little and you can compare it:

We implemented a C7000 with BL460c blades and 6 VC modules with 2 CX2 uplinks as modules 1, 2, 5, 6, 7 and 8. I will have to look up the correct model numbers. The blades were using the Intel quad-port NICs.

In VCM Module 5 and 6 didn’t show any Server Profiles, Module 7 and 8 did show the Profiles correctly.

If you disabled one of the Modules (5-8) and did enable it again one of the servers (windows or vmware) would get an error, most of the time the same server with THAT module. The link of the nic would go up, but the connection wouldn’t work.

VCM was showing that the link was not connected but the Link of the Nic was up in the OS.

Only disabling the NIC and enabling it again (in Windows), or unloading the NIC driver in ESX, would solve the issue, or a reboot of course.

We are now using Broadcom Quad Nics (NC325m) and the issue is gone.

It took us quite a while to troubleshoot because of the strange symptoms, until we figured out what it was. Even HP and VMware weren’t able to troubleshoot it at the beginning; only one of the senior support guys from HP who was onsite helped us figure out the issue and solve it.

So is that similar to what you are seeing, or something totally different? You didn’t describe the problem too well 🙂

Marcus Breiden

12 11 2009
Kenneth van Ditmarsch

Hi Marcus,

Let me see if I can get this any clearer:
– HP BL 460 G6 with vSphere
– Interconnect 1 to 6 are equipped with modules (Interconnect 1, 2, 5 and 6 contain a Flex-10 Module)

Whenever I power down IC5 or IC6, the downlinks towards mezzanine slot 2 will obviously disappear (visible from the OS as a failing NIC connection).

So, we have these BL460 G6’s installed with vSphere. vSphere detects (and logs) that the vmnic connection is failing. Whenever we power the IC module (5 or 6) back on, vSphere keeps on logging that the connection is failing.
This behavior does not occur when we, for instance, power down IC1 or 2 (while the modules are the same).

To clarify whether this is an HP problem or a VMware problem, I just installed Windows 2008 on the BL460 G6 and kept the VC profile exactly the same. Powering down IC 5 (or 6) causes a link failure, and powering it on causes the link to be restored (as it should be) within Windows 2008. So for me it’s clear that I need to register a support call with VMware for this issue.

12 11 2009
Marcus Breiden

Hi Kenneth,

Yeah, I totally agree, but this is also noteworthy and important. Thanks for clarifying the issue.

The links in VCM show that they are linked, I would assume. Did you try unloading and loading the drivers in ESX?

12 11 2009
Kenneth van Ditmarsch

Yes, VCM is linked. I’ve only tried to restart the network stack but that didn’t solve my problem.
Going to make a call now.

17 11 2009
Michael J

Hi Kenneth,

Just wanted to say thanks for the valuable info, especially the words of advice. I’m experiencing the same issues you mention in all 3 comments, so it’s great to see there is some light at the end of the tunnel. Thanks also for the detailed information on Flex-10 and the explanation of how it works. I have one question regarding your network environment. I’m assuming that the Ciscos you are referring to are 6500 series? Just wanted to clarify that.

thanks,

Michael.

17 11 2009
Kenneth van Ditmarsch

Hi Michael,

No thanks 🙂 Correct, I’m referring to the 6500 series. Are you also experiencing “the link that doesn’t come back” from the Interconnect modules?

Cheers,
Kenneth van Ditmarsch

17 11 2009
Michael J

Hi Kenneth,

Thanks for that. We aren’t experiencing that exact issue where the link doesn’t come back from the interconnect; however, we have experienced the issue where: “Whenever a Virtual Connect Module fails, the downlinks towards ESX will fail as well (since these are hardwired via c7000 Backplane).”

We found that if you reset the module, it all comes back up and connectivity is restored. However, after a period of about 3 hours with both VC modules powered on, it fails again and eventually all of the blades within the chassis (including the ESX hosts) lose network connectivity. We then have had to power off one of the VC modules, after which all connectivity is restored, which is why I am interested in your posts about portfast settings on the Cisco 6509’s. We have also been advised to enable SmartLink within Virtual Connect and to enable LACP on the Ciscos as per: http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00865618/c00865618.pdf (pg49)

We are currently having this issue in our production environment and have resorted (at this point) to using one Flex-10 VC module until we ascertain the root cause. Any help would be appreciated!

cheers,

Michael.

17 11 2009
Kenneth van Ditmarsch

Rather strange that after 3 hours both Interconnect Modules fail. Doesn’t sound like a Spanning Tree issue to me then 😉
Which FW are you running?
I noticed some remarks on “dropping connections” within FW 2.12 and 2.30 (last week I upgraded to 2.30 and yesterday I saw that they already released 2.31…)

In my design we explicitly don’t use Smart Link, since losing a FC uplink causes the VC modules to fail over at hardware level (so my ESX host can keep both of its uplinks).

17 11 2009
Michael J

Sorry Kenneth, to clarify: it’s not that both VCs fail. It seems that the active VC module has a failure of some description, but the standby VC module doesn’t see a failure and therefore doesn’t take over, so the active VC module seems to be “half failed”. However, we do see that it fails completely over that 3 hour period, which as you said is strange. We are running firmware v2.12 and, like you, I noticed HP released 2.31, which we have yet to upgrade to, so that could be a cause as well. Thanks for clarifying regarding SmartLink 🙂

19 11 2009
Michael J

Hi Kenneth,

An update on my situation. This was escalated to HP and, as it turns out, we resolved it ourselves before HP had even done their diagnosis. We upgraded the f/w to 2.31 and it resolved the issue. However, it wasn’t the firmware itself that fixed the issue; rather, it seems that when we upgraded to 2.12, despite the VC firmware update CLI utility reporting that the update was successful, the image that was copied to the VC module was incomplete / corrupted. So when we upgraded again to 2.31, the image upload was OK. It might be worthwhile doing the MD5 check on the image file when downloading from HP. It’s fixed, but just an awareness for everyone 🙂
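
For anyone who wants to script the MD5 check suggested above, a minimal Python sketch could look like the one below. The function names are made up for illustration, and the expected checksum has to come from wherever HP publishes it for your download. Note that this only verifies the file you downloaded; it does not prove that the update utility copied the image onto the VC module intact.

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def image_is_intact(path, expected_md5):
    """Compare the file's MD5 against the published checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Usage would be something like image_is_intact("firmware-image.bin", "<checksum from the download page>"), where both the file name and the checksum are placeholders.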

17 11 2009
Michael J

Strike that…. just realised it does say 6509. Read the fine print Michael….

27 11 2009
Kenneth van Ditmarsch

As an update on my own troubleshooting; VMware has handed me a new Broadcom NIC driver which I will test in the environment next week.

3 12 2009
Ron

I think people are making too much of potential failures in the Mezz 2 Flex-10. Failures rarely happen in our current-gen HP hardware, and if they do, they happen sooner rather than later due to bad production parts. Besides, if you’re like us you have these hosts sitting in large HA clusters, which is the reason for paying all that money.

4 12 2009
Testing Scenario’s VMware / HP c-Class Infrastructure « VirtualKenneth's Blog

[…] Scenario’s VMware / HP c-Class Infrastructure 4 12 2009 Since my blog about Understanding HP Flex-10 Mappings with VMware ESX/vSphere is quite a big hit (seeing the page views per day) I decided to also write about the testing […]

24 12 2009
Unresponsive HP Virtual Connect Manager – vcutil « VirtualKenneth's Blog

[…] HP’s instructions (credits to the HP Technical Consultant who I got linked to via my blog about Understanding HP Flex-10) I download the Virtual Connect Support Utility, which is described by HP like: “This utility […]

19 01 2010
Richard Boswell

Michael J,

It’s possible your issue wasn’t fully firmware-based. We have had similar issues but reloading FW didn’t fix it. The VC and OA modules are based off of BusyBox using a custom version of Linux with two primary modules built by HP called VCETH and VCM. VCETH is the low-level module that provides L1/L2 services, whereas VCM provides L3 and mgmt functionality. We had “soft-failures” like you mention, where throughput isn’t stopped but slowly withers away but we are unable to manage/connect to the VC modules either by HTTPS or SSH. We have been able to connect via the serial interface, restart the VcM domain that way, and regain mgmt functionality. Kinda strange but we have monitors set up now to alert on it.

We noticed this trend on several different FW revisions (2.10, 2.12, and 2.31).
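
A monitor for the “soft failure” described above could be as simple as checking whether the management ports still accept connections. The sketch below is only one possible approach and not the monitor we actually run: the function names are made up, and it only tests that TCP ports 443 (HTTPS) and 22 (SSH) accept a connection, which can still succeed if the service behind the port is hung.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def management_status(host, ports=(443, 22)):
    """Check the HTTPS and SSH management ports of one VC module."""
    return {port: port_open(host, port) for port in ports}

def needs_alert(status):
    """Alert as soon as any management port is unreachable."""
    return not all(status.values())
```

Calling needs_alert(management_status("vc-module.example.local")) from a scheduled job would give a basic alert; the hostname here is a placeholder.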

4 03 2010
Blades Made Simple · Virtual I/O on IBM BladeCenter (IBM Virtual Fabric Adapter by Emulex)

[…] NICs”. HP’s been doing this for a long time with their FlexNICs (check out VirtualKennth’s blog for a great detail on this technology) so I didn’t see the value in what IBM and Emulex was […]
