At my day job, we recently started rolling out NexentaStor 3 for our VM image storage as a trial. If all goes well, our long-term plan is to eventually migrate all storage from NetApp to NexentaStor. As we started the trial, one missing feature we quickly ran across was the lack of IPMP (IP Multipathing) support. The network configuration interface that Nexenta provides can currently configure aggregated interfaces with the LACP protocol, but it has no mechanism to configure IPMP to aggregate interfaces across multiple switches. We were able to work out an approach to configure IPMP manually, and received Nexenta’s blessing to use it in our environment. (Important note: if you are going to try this on a licensed copy of NexentaStor, please check with your support team to ensure that they are OK with you making these changes.)
Updated 2010-Sep-16 — Added information on how to add static routes to configure ipmp ping targets
Server hardware configuration
First of all, I should detail what we are trying to configure. Our production machines are quite similar to the SuperMicro build I documented earlier, with a few varying specs. Here’s what is in it:
- 2x Intel E5620 CPUs
- 48GB memory (6x 8GB modules)
- 10x 1TB Seagate Constellation ES SAS drives
- 6x Intel 32GB X25-E drives
- 2x Crucial 128GB RealSSD drives
- 2x Intel Gigabit ET quad-port NICs
The machine has an 8TB license, with two of the disks configured as hot spares. The Intel SSDs are configured as three mirrored log devices, and the RealSSDs are configured as cache devices.
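For anyone mapping that layout to ZFS terms, here is a rough sketch of what the log/cache/spare configuration looks like as zpool commands. The pool name 'tank' and the cXtYd0 device names are placeholders, not the actual names on our systems, and in practice this was all set up through the Nexenta tools rather than by hand:
# zpool add tank log mirror c0t10d0 c0t11d0 mirror c0t12d0 c0t13d0 mirror c0t14d0 c0t15d0
# zpool add tank cache c0t16d0 c0t17d0
# zpool add tank spare c0t18d0 c0t19d0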
Caveats
The only major caveat that I’ve hit with this configuration is that the IPMP interfaces are not visible via the Nexenta utilities. You can still see all the underlying interfaces, just not the IPMP ones. It’s mostly cosmetic, but it is distracting and annoying.
Of course, YMMV – this worked for me, but no guarantees that it will work for you! :)
Network configuration
Desired configuration
Here is the network configuration we desire:
- LACP Aggregate #1 – 4 gigabit links to Switch-1
- LACP Aggregate #2 – 4 gigabit links to Switch-2
- IPMP Interface #1 (balancing LACP1 and LACP2) – Native VLAN on NICs; access to our management/backend network
- IPMP Interface #2 (balancing LACP1 and LACP2) – VLAN ID 100; VM storage network
- IPMP Interface #3 (balancing LACP1 and LACP2) – VLAN ID 200; NFS storage network
General configuration steps
As far as I know, you cannot do VLAN tagging on top of an IPMP trunk on Solaris, which means that we need to create the VLAN interfaces on each of the aggregate interfaces, and then create three separate IPMP interfaces – one per VLAN. Here are the basic configuration steps (a rough raw-shell equivalent is sketched after the list):
- Via NMC: Create individual LACP aggregates (aggr1 and aggr2), with igbX interfaces as members.
- Via NMC: Create VLAN interfaces ‘100’ and ‘200’ on top of both ‘aggr1’ and ‘aggr2’. This will create the interfaces ‘aggr100001’, ‘aggr100002’, ‘aggr200001’, and ‘aggr200002’.
- Via NMC: Configure an IP address within each VLAN on each of these six interfaces. This allows IPMP to use ICMP probes in addition to link-state detection to detect failed links.
- Via the console: Configure the three IPMP interfaces, and add the six aggr interfaces to the proper IPMP groups.
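For reference, here is a rough sketch of what those NMC steps correspond to in raw dladm commands (based on the OpenSolaris release that NexentaStor 3 is built on; treat it as illustration only and stick to NMC on a licensed appliance). Note how the legacy VLAN interface names encode the VLAN ID and the aggregate instance, e.g. aggr100001 is VLAN 100 on aggr1:
# dladm create-aggr -L active -l igb2 -l igb3 -l igb4 -l igb5 aggr1
# dladm create-aggr -L active -l igb6 -l igb7 -l igb8 -l igb9 aggr2
# dladm create-vlan -l aggr1 -v 100 aggr100001
# dladm create-vlan -l aggr2 -v 100 aggr100002
# dladm create-vlan -l aggr1 -v 200 aggr200001
# dladm create-vlan -l aggr2 -v 200 aggr200002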
NMC – Create LACP aggregates
This assumes that whatever interface you configured during installation is *not* one of the interfaces you want to be part of the aggregates. If that is not the case, you will need to be on the system console (via IPMI, hopefully!) and destroy that interface first. Here is an example of creating the aggregates (the output below is from NMC, showing how it ends up being configured):
nmc@nexenta:/$ setup network aggregation create
Links to aggregate : igb2,igb3,igb4,igb5
LACP mode : active
LINK   POLICY  ADDRPOLICY  LACPACTIVITY  LACPTIMER  FLAGS
aggr1  L3,L4   auto        active        short      -----
nmc@nexenta:/$ setup network aggregation create
Links to aggregate : igb6,igb7,igb8,igb9
LACP mode : active
LINK   POLICY  ADDRPOLICY  LACPACTIVITY  LACPTIMER  FLAGS
aggr2  L3,L4   auto        active        short      -----
NMC – Create VLAN interfaces on each aggregate interface
nmc@nexenta:/$ setup network interface aggr1 vlan create
VLAN Id : 100
aggr100001: flags=201000842 mtu 9000 index 39
        inet 0.0.0.0 netmask 0
        ether 0:1b:21:6c:3c:de
nmc@nexenta:/$ setup network interface aggr2 vlan create
VLAN Id : 100
aggr100002: flags=201000842 mtu 9000 index 40
        inet 0.0.0.0 netmask 0
        ether 0:1b:21:6c:3d:de
nmc@nexenta:/$ setup network interface aggr1 vlan create
VLAN Id : 200
aggr200001: flags=201000842 mtu 9000 index 41
        inet 0.0.0.0 netmask 0
        ether 0:1b:21:6c:3e:de
nmc@nexenta:/$ setup network interface aggr2 vlan create
VLAN Id : 200
aggr200002: flags=201000842 mtu 9000 index 42
        inet 0.0.0.0 netmask 0
        ether 0:1b:21:6c:3f:de
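If you want to double-check the results from the raw shell (optional, and not something NMC requires), the standard dladm views should list the aggregates, their per-port LACP state, and the new VLAN links:
# dladm show-aggr -L
# dladm show-link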
NMC – Assign IP addresses
This assumes the following IP ranges:
- Native VLAN: 10.100.0.0/24
- VLAN 100: 10.100.100.0/24
- VLAN 200: 10.100.200.0/24
It also assumes that aggregate 1 is assigned .2 within each /24 and aggregate 2 is assigned .3; the shared IPMP interface will be assigned .1.
nmc@nexenta:/$ setup network interface vlan aggr1 static
aggr1 IP address: 10.100.0.2
aggr1 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr1 as 10.100.0.2/255.255.255.0 ... OK.
nmc@nexenta:/$ setup network interface vlan aggr2 static
aggr2 IP address: 10.100.0.3
aggr2 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr2 as 10.100.0.3/255.255.255.0 ... OK.
nmc@nexenta:/$ setup network interface vlan aggr100001 static
aggr100001 IP address: 10.100.100.2
aggr100001 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr100001 as 10.100.100.2/255.255.255.0 ... OK.
nmc@nexenta:/$ setup network interface vlan aggr100002 static
aggr100002 IP address: 10.100.100.3
aggr100002 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr100002 as 10.100.100.3/255.255.255.0 ... OK.
nmc@nexenta:/$ setup network interface vlan aggr200001 static
aggr200001 IP address: 10.100.200.2
aggr200001 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr200001 as 10.100.200.2/255.255.255.0 ... OK.
nmc@nexenta:/$ setup network interface vlan aggr200002 static
aggr200002 IP address: 10.100.200.3
aggr200002 netmask : 255.255.255.0
Name Server #1 : 10.0.0.101
Name Server #2 : 10.0.0.102
Name Server #3 :
Gateway IP address : 172.16.0.254
Enabling aggr200002 as 10.100.200.3/255.255.255.0 ... OK.
Console – Set up IPMP
First, we need to get into expert mode.
nmc@nexenta:/$ options expert_mode=1
nmc@nexenta:/$ !bash
You are about to enter the Unix ("raw") shell and execute low-level Unix command(s).
CAUTION: NexentaStor appliance is not a general purpose operating system: managing
the appliance via Unix shell is NOT recommended.
This management console (NMC) is the command-line interface (CLI) of the appliance,
specifically designed for all command-line interactions. Using Unix shell without
authorization of your support provider may not be supported and MAY VOID your
license agreement.
To display the agreement, please use 'show appliance license agreement'.
Proceed anyway? (type No to return to the management console) Yes
root@nexenta:/volumes#
The next step is to set up the hostname files for each of the IPMP interfaces. I will name the interfaces as follows:
- ipmp0 – IPMP interface for aggr1 and aggr2
- ipmp100000 – IPMP interface for aggr100001 and aggr100002
- ipmp200000 – IPMP interface for aggr200001 and aggr200002
These files also set the IP address that we would like the system to assign to each IPMP interface.
root@nexenta:/etc# cat hostname.ipmp0
ipmp group ipmp0 10.100.0.1 netmask 255.255.255.0 up
root@nexenta:/etc# cat hostname.ipmp100000
ipmp group ipmp100000 10.100.100.1 netmask 255.255.255.0 up
root@nexenta:/etc# cat hostname.ipmp200000
ipmp group ipmp200000 10.100.200.1 netmask 255.255.255.0 up
Next, the groups need to be configured in the hostname.aggr* files. We need to add ‘group <ipmp-interface> -failover’ to each of these files, just before the trailing ‘up’. Here are the files after the changes are made:
root@nexenta:/etc# for i in /etc/hostname.aggr* ; do echo $i ; cat $i ; done
/etc/hostname.aggr1
10.100.0.2 netmask 255.255.255.0 mtu 9000 broadcast + group ipmp0 -failover up
/etc/hostname.aggr2
10.100.0.3 netmask 255.255.255.0 mtu 9000 broadcast + group ipmp0 -failover up
/etc/hostname.aggr100001
10.100.100.2 netmask 255.255.255.0 broadcast + group ipmp100000 -failover up
/etc/hostname.aggr100002
10.100.100.3 netmask 255.255.255.0 broadcast + group ipmp100000 -failover up
/etc/hostname.aggr200001
10.100.200.2 netmask 255.255.255.0 broadcast + group ipmp200000 -failover up
/etc/hostname.aggr200002
10.100.200.3 netmask 255.255.255.0 broadcast + group ipmp200000 -failover up
Now that all of the interface configs are in place, we can apply them. Here’s the easiest way I figured out; if anyone knows a better way, I’d love to hear it!
# svcadm disable svc:/network/physical:default
# for i in aggr1 aggr2 aggr100001 aggr100002 aggr200001 aggr200002 ipmp0 ipmp100000 ipmp200000 ; do ifconfig $i unplumb ; done
# svcadm enable svc:/network/physical:default
At this point, all of your interfaces should be up, and all of the IP addresses should be pingable. Make sure that you can ping both the individual interface IPs and the IPMP IPs. You can use the ‘ipmpstat’ command to see information about your groups, e.g.:
root@nexenta:/volumes# ipmpstat -a
ADDRESS        STATE  GROUP       INBOUND     OUTBOUND
nexenta-vl100  up     ipmp100000  aggr100001  aggr100002 aggr100001
nexenta-vl200  up     ipmp200000  aggr200001  aggr200002 aggr200001
nexenta        up     ipmp0       aggr1       aggr2 aggr1
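Beyond ‘ipmpstat -a’, the other standard ipmpstat views are handy for keeping an eye on things (these are stock Solaris IPMP tools, nothing Nexenta-specific): ‘-g’ shows group state and failure-detection time, ‘-i’ shows per-interface state, ‘-t’ shows the probe targets in use, and ‘-p’ streams live probe traffic:
# ipmpstat -g
# ipmpstat -i
# ipmpstat -t
# ipmpstat -p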
Note that this configuration provides failover and outbound load balancing, but it does not provide inbound load balancing. If you would like inbound load balancing, you need to configure an IP alias on each of the IPMP interfaces, and then vary which IP you use from each host that connects to your Nexenta machine (or use multipathing if it’s iSCSI!)
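As a hypothetical example of the alias approach (10.100.100.4 is a made-up address), you can add a second data address to an IPMP group on the fly with the standard ‘addif’ syntax, and then append the same addif clause to the hostname file so it survives a reboot:
# ifconfig ipmp100000 addif 10.100.100.4 netmask 255.255.255.0 up
root@nexenta:/etc# cat hostname.ipmp100000
ipmp group ipmp100000 10.100.100.1 netmask 255.255.255.0 up addif 10.100.100.4 netmask 255.255.255.0 up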
One last thing: once everything is configured, you will probably want to define your own ping targets. You can view the ones that IPMP picked automatically by running ‘ipmpstat -t’. In our configuration, on one VLAN, our two Nexenta nodes picked each other as targets; when we took machine two down (intentionally), machine one marked that interface down, and when we booted machine two back up, it could not reach machine one’s interface and marked its own interface on that VLAN down. A nice race condition. Oddly, in.mpathd (the daemon that does the probing) does not use a configuration file for ping targets, but instead relies on host routes. What we’ve done is add routes to the individual IP addresses that we would like it to monitor, using the NMC command ‘setup network routes add’ and specifying the IP address to monitor as both the ‘Destination’ and the ‘Gateway’. We picked four or five stable hosts on each VLAN (routers, Xen domain-0s and the like) and added them on both Nexenta nodes; this gives more consistent results, since multiple core machines would have to go down before an interface would be incorrectly disabled.
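If you prefer to do this from the raw shell instead of NMC, the same thing can be done with the standard Solaris route command; ‘-p’ makes the route persistent across reboots, and 10.100.100.10 is just an example target address here:
# route -p add -host 10.100.100.10 10.100.100.10
# ipmpstat -t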
I hope this helps! Please feel free to leave a comment if you run into any trouble getting it working.
Very informative! Can’t wait to read about the performance tests with a few servers in use.
Shouldn’t “…ifconfig $i unplumb…” actually be “…ifconfig $i plumb…” ??? Of course, I am probably wrong, cause it is Friday past 3pm, I might be in my own cloud. ;-)
Things have been pretty good so far! I’ve pounded the snot out of these a few times and haven’t been able to get them to really sweat. ;) The only real issue we’ve had is that I initially thought the LACP config that Nexenta offered actually set up IPMP, so I had it set up with 8 ports on two separate switches with LACP turned off. That essentially just randomly moves the MAC around; when we were only testing with one server it worked OK, but once we added more to the mix, things went *BOOM*, and I felt really, really dumb! :)
Plumb vs unplumb – unplumb is actually correct. I am not actually sure what disabling svc:/network/physical:default is supposed to do, but it doesn’t unplumb the interfaces, and enabling the service won’t bring things up properly unless the interfaces are unplumbed. Hence the unplumb in the middle. Solaris networking still confuses the crap outta me. ;)
Nate,
Which SuperMicro model did you go with, and which controller? I’m having all sorts of issues with a SC847E26-JBOD, LSI 9200-8e SAS2 controller, and Seagate Constellation ES 2TB SAS2 disks. In both Solaris and Linux I get weird SAS errors. SuperMicro claims that the LSI2008 chipset is incompatible with their dual-expander setup, even though I’ve only attached one SAS link, and according to them and their “compatibility matrix” only internal SAS2 RAID controllers are qualified and supported; how that’s supposed to work with a JBOD I have yet to figure out (well, basically, SuperMicro f*cked up and won’t own up to it).
There’s more to this story and I plan to write a blog post about my findings. So far, unfortunately, I can’t recommend SuperMicro to anyone, at least not their SAS2 JBODs.
Great Blog post. We are also working on deploying Nexenta at my current employer. We went down the LACP route and it’s worked well so far. I was actually looking for a solution to a FC Multipathing issue when I came across your site and just wanted to thank you for posting this information!
Any specific reason for using “6x Intel 32gb X25-E drives” for ZIL?
I’m fairly new to all this, but as I understand it, the ZIL should be mirrored, which means he’s likely running 3x 32GB (96GB). That should be relative to the 48GB of RAM and the amount and rate of synchronous writes he expects. In theory, that setup ensures that he can handle a SERIOUS amount of sustained, synchronous writes, and with 10x 1TB SAS drives, that makes sense to me. If anything one might call it a bit overkill, but then again it may be a bit of “futureproofing” in case the storage is grown. Just my 2 cents. Love to hear from the author so I can confirm if I’m learning anything.
I’m running in a virtualized environment and trying to figure out the smartest way to get some form of real trunking in place (so I can break the 1Gbe barrier w/o 1+Gbe equipment). IPMP is something I’d like to look at next but since you have LAG working here, can you tell me if you’ve observed with NFS that you can exceed 1Gbe rates (by even a consistently measurable amount, not looking for 2Gb) with your setup? I ask because you clearly have the I/O to feed it and with this network setup, if I understand how LAG is implemented in Solaris combined with LAG/LACP on a switch, it seems possible.
Any one stream (usually the combination src and dest mac addr) will be limited to the speed of one of the links that make up the channel/aggregate.
Hello
Great post, but I have issues with my interfaces disappearing from the ipmp group after reboot. It all works great until reboot.
Any ideas?
Thanks in advance