Thursday, July 10, 2014

Is Tapping Low Optical Budget Links Making you Pull Your Hair (or the TAPs) Out?

By: Gina Fallon, VSS Product Management

If you have ever had to do split-ratio calculations for passively tapping network links, you know the urge to pull your hair out over the mathematical Olympics involved. Low optical budgets push the challenge even closer to the tipping point where there is no budget left to establish link with the network device and/or probe attached to the passive tap. The most common offenders are 10G and 40G multimode, and Cisco’s 40G Multimode BiDi (40GbaseSR2) budget is so tight that passive tapping is not recommended for it at all.
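To see why the arithmetic gets tight, here is a minimal sketch of the budget check behind a split-ratio decision. All the numbers (a 7.3 dB budget, a 70/30 split, the excess and connector losses) are illustrative assumptions rather than datasheet figures; the point is simply how quickly the monitor leg of an asymmetric split eats a low budget.

```python
import math

def split_loss_db(fraction):
    """Insertion loss (dB) of a passive split leg carrying `fraction` of the light."""
    return -10 * math.log10(fraction)

def margin_db(budget_db, fraction, excess_loss_db=1.0, connector_loss_db=0.75):
    """Rough margin left on one tap leg; every figure here is an assumption."""
    return budget_db - (split_loss_db(fraction) + excess_loss_db + connector_loss_db)

# Hypothetical 10G multimode link with ~7.3 dB of optical budget and a 70/30 tap:
for leg, frac in (("network (70%)", 0.70), ("monitor (30%)", 0.30)):
    print(f"{leg}: split loss {split_loss_db(frac):.2f} dB, margin {margin_db(7.3, frac):.2f} dB")
```

With real transceiver and tap figures plugged in, this is the same arithmetic that shows cases like 40G multimode BiDi leaving essentially no margin at all.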

The solution is to go to active optical tapping. These taps do not employ optical splitters (which have inherent insertion loss); instead, they regenerate the optical signal on both the network and monitor transmit sides. VSS Monitoring offers vBroker Series products with PowerSafe chassis modules, which add layer-1 Fail Open/Close state configurability on power loss (plus a manual force Fail Open option during power-on) and support a full range of optic technologies (10G MM, 40G MM, 40G MM BiDi, 40G SM, etc.).

Check out our full technical write-up here: Tapping Low Optical Budget Links

Wednesday, June 25, 2014

Optimizing Monitoring Tools and Security Systems for 100Gbps & 40Gbps Networks

Most large organizations are either considering or have already begun to adopt higher bandwidth network infrastructure, typically 40G in the Enterprise and 100G in the carrier domain. Whenever a network undergoes a migration of that magnitude, the network monitoring and security architecture has to be revisited to ensure it’s ready to scale with the network.

Here are the top three goals to keep in mind when redesigning the management infrastructure:
  1. Leverage what you have
  2. Maximize ROI
  3. Make it future proof
If there’s already a network packet broker (intelligent TAP) system in place—and in most large networks there will be—it should be used to properly “tune” the network traffic to the existing monitoring tools and security systems. Assuming the NPB system is sufficiently scalable and modular (and again, it should be), adding 100G or 40G capture interfaces/appliances will be fairly straightforward.
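As a rough illustration of what “tuning” means here, the sketch below models a hypothetical filter map that copies only selected slices of a high-speed feed to specific lower-speed tool ports. The field names, match rules, and port labels are invented for the example and are not any vendor’s configuration syntax.

```python
# Illustrative model of a filter map: which slices of the 40G/100G feed reach
# which 1G/10G tool ports. All names and rules below are hypothetical.
FILTER_MAP = [
    {"name": "voip-to-apm", "match": {"vlan": 100, "proto": "udp"}, "tool_port": "10G-1"},
    {"name": "web-to-ids",  "match": {"dst_port": 443},             "tool_port": "10G-2"},
]

def route(packet_meta):
    """Return the tool ports a packet should be copied to (empty list = not forwarded)."""
    return [rule["tool_port"] for rule in FILTER_MAP
            if all(packet_meta.get(k) == v for k, v in rule["match"].items())]

print(route({"vlan": 100, "proto": "udp", "dst_port": 5060}))  # ['10G-1']
```

The real work happens inside the NPB, but the principle is the same: each tool sees only the traffic it can actually use.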

Once the physical capture interfaces have been added, most of the functions needed to accomplish tool optimization are reasonably simple, but could do with some emphasis. Check out this solution guide outlining the essentials of leveraging 1G and 10G toolsets across 40G and 100G networks:


Wednesday, June 18, 2014

Why Using Rx & Tx on Same Transceiver as Span & Monitor Is a Bad Idea

By: Gordon Beith, Director, Product Management

Certain Network Packet Broker (NPB) vendors claim they can double the capacity of their NPB products by using the receive (Rx) and transmit (Tx) sides of each fiber transceiver separately as Span inputs (Rx) and Monitor outputs (Tx) simultaneously. I expect they make this claim because their system cannot achieve the performance and scale equivalent to that of vBroker NPBs from VSS Monitoring.


If it sounds too good to be true…

Their idea has a single questionable benefit, yet it comes with several issues that I believe far outweigh it. Providing this capability is not particularly challenging technically, and any NPB vendor could take this approach if they chose to, but it is dangerous. VSS Monitoring does not support this approach because of the negative impact it has on customer monitoring reliability.


Quite often the customers or partners making the purchasing decision for an NPB solution are not fully aware of the requirements that the monitoring solution architects or operations staff have for the NPB layer. These requirements typically include items meant to guarantee a level of high availability for the monitoring/analytics solution, such as what is expected to happen when tools or tool ports go down, or simply that all monitored traffic must be continuously forwarded to the monitoring center or storage server.

Therefore, it is crucial to ensure that the buyers clearly understand the impact of deciding on a solution based on the simultaneous use of Rx/Tx as Span/Monitor ports.


The Role of Rx with Tx

On fiber connections between equipment, the Rx side of a transceiver is what detects whether a link is up or down. This layer-1 mechanism works such that if the Tx at one end stops transmitting, or the cable is cut, the Rx at the other end senses that no layer-1 signal is present and reports the link as down.
The Rx does not send any signal back toward the Tx, so the Tx cannot detect a loss of link; only the Rx side can. Therefore, if the Rx is not connected to anything, or is connected to a different transceiver than the one the Tx is connected to, it will never be able to detect when the transceiver the Tx feeds goes down, or when the cable in between is cut or disconnected.

In normal operation, when the Rx side detects loss of signal, the Tx side stops transmitting. To override this, the link can be forced to an “always up” state, ensuring that the Tx continues to transmit regardless of what the Rx detects. This works fine for single-arm situations, where the Rx is never connected to anything and therefore never detects a change in link state.


However, if the Rx is connected to another transceiver, then even with the Tx forced up, a loss of link detected on the Rx side will cause the Tx to stop transmitting as well.
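To make this concrete, here is a toy model of the layer-1 logic just described. It is an illustration of the rules above, not vendor firmware.

```python
class Transceiver:
    """Toy model of one fiber transceiver's layer-1 link behavior."""
    def __init__(self, rx_connected=False, force_tx_up=False):
        self.rx_connected = rx_connected        # is anything plugged into our Rx?
        self.rx_signal_present = rx_connected   # light currently seen by our Rx
        self.force_tx_up = force_tx_up          # the "always up" override

    def far_end_fails(self):
        """The device feeding our Rx stops transmitting, or the fiber is cut."""
        if self.rx_connected:
            self.rx_signal_present = False

    @property
    def link_up(self):
        # Only the Rx can detect link state; the Tx has no return path.
        return self.rx_signal_present

    @property
    def tx_enabled(self):
        if not self.rx_connected:
            # Single-arm case: the Rx is dark, so the forced-up override keeps Tx alive.
            return self.force_tx_up
        # Rx is wired to another transceiver: once it sees loss of signal,
        # the hardware pulls the Tx down too, even though the link was forced up.
        return self.rx_signal_present

single_arm = Transceiver(rx_connected=False, force_tx_up=True)
shared     = Transceiver(rx_connected=True,  force_tx_up=True)
shared.far_end_fails()
print(single_arm.tx_enabled, shared.tx_enabled)   # True False
```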



Then it probably isn’t a good idea

There are four main negative impacts of trying to use the Rx and Tx on the same transceiver as independent span and monitor ports simultaneously:
  • Load balancing groups cannot adjust for tool ports that are down
  • Trigger policies become invalid, risking an outage because they cannot provide redundancy when a tool or tool port is down
  • Alerts cannot be sent to indicate that a tool or tool port is down
  • Monitor ports can go down when Span ports go down, which may also be very difficult to diagnose

Load Balancing Inability to Adjust

Load balancing spreads traffic across multiple tool ports of the same type, ensuring that all traffic is monitored and that each flow or session is seen by the same tool. This allows more traffic volume to be forwarded and monitored than a single port or single tool instance can handle. Load balancing also automatically adjusts traffic across a group of ports, so that when a tool port loses link, a number of different actions can be performed, such as:
  • Rebalance all traffic across remaining ports that are up in the load balancing group
  • Rebalance just the traffic intended for the port that has gone down, to the remaining ports that are up in the load balancing group
  • Replace the tool port that has gone down with a back-up tool port, which will now receive the traffic within the load balancing group
Load-balancing groups provide for these actions, thereby minimizing any loss of monitoring visibility when a tool or tool port goes down.

If the Rx is unable to detect the link state, then the load-balancing group will be unable to adjust and all traffic that is being forwarded to the port that has gone down will be dropped/lost.


Figure 1. Forwarding and Load Balancing
Figure 2. Load Balancing Groups Definition
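To make the rebalancing behavior concrete, here is a minimal sketch of session-consistent load balancing across a tool-port group and of what happens when a port’s link state changes. The hash choice and port names are illustrative assumptions.

```python
import zlib

tool_ports = ["tool-1", "tool-2", "tool-3", "tool-4"]   # ports in the load-balancing group
port_is_up = {p: True for p in tool_ports}              # kept current by Rx link detection

def pick_port(flow_key):
    """Hash a flow (e.g. its 5-tuple) onto the ports that are currently up,
    so every packet of a session lands on the same tool instance."""
    live = [p for p in tool_ports if port_is_up[p]]
    if not live:
        return None                                     # nothing left to forward to
    return live[zlib.crc32(flow_key.encode()) % len(live)]

flow = "10.0.0.5:443->10.0.0.9:51512/tcp"
chosen = pick_port(flow)
print(chosen)

port_is_up[chosen] = False      # that tool port loses link
print(pick_port(flow))          # the flow is rebalanced onto a surviving port

# If the Rx never detects link state (the Rx/Tx-split trick), port_is_up is never
# updated, and traffic hashed to the dead port is silently dropped.
```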


Lack of Redundancy

Forwarding rules for traffic toward tool ports, often referred to as Filter Maps, can be conditional, whether they are part of a Monitor load-balance group or apply to individual Monitor ports, allowing for redundant back-up forwarding rules. The most common condition for these rules is link state: if the link of an actively mapped tool port is detected as down, the mapping can automatically change to a back-up, redundant tool port, thereby minimizing any loss of monitoring visibility. Trigger Policies are used to provide this type of action.

If the Rx is unable to detect the link state, then conditional forwarding rules will be unable to adjust and all traffic that is being forwarded to the port that has gone down will be dropped/lost.
Figure 3. Trigger Policy for Link Up/Down
Figure 4. Conditional Mapping Based on Trigger Policy
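As a sketch of the conditional-mapping idea, the snippet below models a primary/backup rule driven by link state. The structure is hypothetical and is not VSS trigger-policy syntax.

```python
# Hypothetical primary/backup mapping driven by link state.
mapping = {"primary": "tool-A", "backup": "tool-B"}
link_state = {"tool-A": True, "tool-B": True}   # kept current by Rx link detection

def active_tool_port():
    if link_state[mapping["primary"]]:
        return mapping["primary"]
    if link_state[mapping["backup"]]:
        return mapping["backup"]
    return None                      # both down: alert, traffic will be lost

link_state["tool-A"] = False         # the primary tool port goes down
print(active_tool_port())            # 'tool-B' -- the redundant mapping takes over

# With Rx and Tx split across different devices, link_state never changes,
# so the condition never fires and traffic keeps going to the dead port.
```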

No Alerting
Alerts, in the form of Syslog messages or SNMP traps, can be sent by an NPB based on change of state of ports, such that if a tool or tool port goes down, or even back up, an alert will be sent. This allows users to see that something has happened or gone wrong in the monitoring solution and corrective action can be taken.

If the Rx is unable to detect the link state, then no alert will be sent and no corrective action will be taken, resulting in all traffic that is being forwarded to the port that has gone down being dropped/lost.
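Purely to illustrate the idea, here is a generic sketch that emits a syslog message on a port state change using Python’s standard-library handler. The collector address and the callback are invented for the example; in practice the NPB itself generates these alerts or SNMP traps.

```python
import logging
from logging.handlers import SysLogHandler

# Generic sketch: raise a syslog alert when a tool port's link state changes.
# The collector address below is an assumption for the example.
log = logging.getLogger("npb-alerts")
log.setLevel(logging.INFO)
log.addHandler(SysLogHandler(address=("localhost", 514)))

def on_link_change(port, is_up):
    if is_up:
        log.info("tool port %s link restored", port)
    else:
        log.warning("tool port %s link DOWN - monitored traffic at risk", port)

on_link_change("tool-A", False)
```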


Monitor Ports Going Down

As described above in The Role of Rx with Tx, forcing the Tx link state to “always up” can compensate for the Rx not detecting link. However, if the Rx is connected to a different transceiver at the other end and that link is lost (i.e., it goes from the up state to the down state), the NPB hardware is very likely to detect this state change and bring the Tx side down as well, either for a limited period or permanently, until the user takes corrective action.

The result will be no forwarding of traffic to that Monitor port, despite the fact that the tool and tool port may actually be up.

Conclusion

Using the receive and transmit sides of a fiber transceiver in an NPB as simultaneous Span and Monitor ports creates blind spots in any monitoring or analytics solution. The perceived savings and capacity increase are superficial and misleading, and the potential result is catastrophic: the NPB loses the ability to automatically detect failures and act to keep traffic flowing to the monitoring and analytics tools, so the tools meant to detect network degradation or security threats can no longer do so reliably.

VSS Monitoring assures failsafe, wire-speed delivery of packets to performance tools and security systems. The simultaneous use of Rx and Tx degrades the performance and reliability of any security or performance assurance solution. Ultimately, the customer needs to consider whether degrading effectiveness and risking outages is worth the perceived initial benefit, and whether that degradation and risk are at odds with the primary purpose of the NPB and monitoring system.

Thursday, April 3, 2014

Packet Captures DO Matter

By Adwait Gupte, Product Manager

The other day, I overheard a discussion between a colleague and a market analyst over the value of packet-level information. The analyst didn’t think full packet capture made sense for NPM/APM tools, because they could perform their functions effectively using only metadata and/or flow statistics.

So, are network recorders and their ilk really on the way out? Is complete packet capture useless?

I argue “no.” And here’s why: APM tools can generally identify issues for a given application (e.g. Lync calls are dropping). These issues might arise from the compute infrastructure (slow processors, insufficient RAM), but they could also lie within the network infrastructure (link overload, badly tuned TCP parameters, etc.). In the latter case, the root cause would be extremely difficult to identify and debug without having a complete, packet-level record.

When investigating a breach or “exfiltration” (such as Target’s), you absolutely need the full packet data, not just flow-level metrics (which show only that some activity occurred, not exactly what took place) or metadata (which shows that “some data” was sent out, not “which data” was sent out). Summarized flow statistics and metadata are inherently a lossy way of “compressing” monitoring data. True, they take up less space and can be processed faster than full packets, but they omit information that could be critical to a discovery process.

While full packet capture is not required to show that the application infrastructure is faultless when performance issues arise, it is certainly required when the problem is caused by the network, or when the exact data that was transmitted is needed for troubleshooting or security purposes. Full packet capture makes sense for both APM and security use cases. However, capturing every packet, everywhere, all the time is ridiculously cost-prohibitive. Network engineers and security analysts need to capture just the data they need and no more.

Aside from the obvious compliance mandates, continuous packet capture prevents data gaps. Implemented efficiently, full packet capture is also feasible in terms of cost and management. One of the key elements of such efficiency is decoupling the data from vertically integrated tools. I covered probe virtualization in a previous post, but some of these points are worth repeating in the context of making full packet capture scalable:
  • Tools that integrate capture, storage, and analysis of packet data are expensive. They also have limited storage and compute capacity. If you run out of either, the only way to expand is to buy a new appliance. An open capture and storage infrastructure makes the scaling of at least those parts of the equation more cost effective.
  • NPM/APM tools already make use of complete packets in the sense that they hook into a network tap/span port and accept these packets. Whether they store them internally or process them on the fly and discard them depends on the tool. The point is, if we are able to separate the collection of the data (packet capture) from the consumption of the data (the NPM/APM analytics, forensics etc.), it makes the data a lot more versatile. We can collect the data once and use it for multiple purposes, anytime, anywhere.
  • The exact tool that will consume this data need not be known at the time of collection, since the data can be collected in an open format (e.g., PCAP), as shown in the sketch after this list. Such a format makes the data future-proof.
  • Virtualized analytics tools are on the horizon (customers are demanding them). These virtualized appliances will need to be fed data from a separate capture/storage infrastructure, although some of these functions can be handled by the Network Packet Brokers (NPBs) that collect the data across the network.
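As a reminder of how simple and tool-agnostic the classic PCAP container is, here is a minimal sketch that writes captured frames to a file any PCAP-aware tool can open later. The frame bytes are fabricated placeholders; only the file layout is real.

```python
import struct, time

# Minimal classic-PCAP (libpcap) writer: a 24-byte global header followed by a
# 16-byte record header plus the raw bytes for each captured frame. Because the
# format is open, any analysis tool can consume the result later.
def write_pcap(path, frames, linktype=1):              # linktype 1 = Ethernet
    with open(path, "wb") as f:
        f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, linktype))
        for ts, frame in frames:
            sec, usec = int(ts), int((ts - int(ts)) * 1_000_000)
            f.write(struct.pack("<IIII", sec, usec, len(frame), len(frame)))
            f.write(frame)

# Fabricated 60-byte frame, just to show the mechanics.
write_pcap("example.pcap", [(time.time(), bytes(60))])
```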
In addition to these straightforward benefits, preserving packet data for use by more than a single tool enables network data to be managed and utilized with “big data” systems. Decoupling packet capture from tools enables security analysts and network engineers to glean insights by unifying the siloed data. Network packet capture tools allow network data (which, hitherto, has been missing from the big data applications) to be brought into the big data fold and help uncover even more insights.

A full, historical record of packets (based on continuous capture as a separate network function) is not only useful, but will remain relevant for the foreseeable future. A system that uses programmability to trigger packet capture based on external events, then forwards packets in real time while simultaneously recording the flow of interest for asynchronous analysis, increases the value of such capture even further. Now, that’s something only VSS Monitoring can do today (a post for another day).

Friday, March 28, 2014

Network Functions Virtualization Meets Network Monitoring and Forensics

By Adwait Gupte, Product Manager

Enterprises and service providers are increasingly flirting with Network Functions Virtualization (NFV) as a means to achieve greater efficiency, scalability and agility in the core and datacenter.

NFV promises a host of benefits in the way networks are created, managed, and evolved. Compute virtualization has, of course, redefined data centers, transforming servers from physical computers into virtual processing nodes that can run on one or many physical machines. This separation of processing hardware from the abstract “ability to process” definition of a server allows a lot of flexibility in how datacenters are run and how workloads are managed, especially in multi-tenant environments.

Network Functions Virtualization (NFV) is a similar concept applied to networking. But haven’t switches and appliances always been distributed network “processing” nodes? NFV proposes replacing integrated, purpose-built software/hardware boxes, such as routers and switches, with commodity processing platforms and software that performs the actual network function. Rather than having a box with its own network OS, processing power, memory, and network ports that together function as a router, NFV proposes general-purpose hardware with processing power, memory, and ports running software that turns it into a router. In some cases, handing a networking job to a general-purpose processor is more costly and less efficient. The advantage of a virtualized router, though, is that the software layer can be changed on the fly to turn the same hardware into a switch, a gateway, or a load balancer. This flexibility enables polymorphism within the network infrastructure and promises a more nimble design that can be dynamically repurposed as the needs of the network change, thus future-proofing the investment made in acquiring the infrastructure.

Today, switching and routing functions can be virtualized, with some tradeoffs. More sophisticated functions for security and network/application monitoring still require hardware acceleration. Tools such as NPM and APM, and security systems such as IPS, which operate on real-time data, have arrived in a virtual form factor for some use cases. Technologically speaking, this seems to be the logical evolution following the virtualization of much of the data center infrastructure. While there remains debate as to whether the tool vendors will embrace or attempt to stymie this evolution, the more critical question is: which elements require optimized processing and hardware acceleration?

From the customer’s viewpoint, virtualization reduces the CAPEX allocated to such tools and systems. As virtualized tools become available, it should become easier for customers to scale their tool deployments to match their growing networks. The hope of scaling out without needing to buy additional costly hardware-based appliances is an obvious attraction: customers can instead increase the compute power of their existing infrastructure and, if necessary, buy more instances of the virtualized probes. In a multi-tenant situation, these probes may even be dynamically shared as the traffic load of individual tenants varies. But what if those tools and probes cannot function without hardware acceleration? What if running them on general-purpose compute proves more expensive than running them on optimized systems?

There’s no reason to adopt virtual tools and systems that can’t get the job done or that increase costs.

Further, while routing and switching are well-understood functions that even nascent players can virtualize, there is a significant operational cost to any such changeover. Advanced monitoring features are far more complicated and sophisticated. In contrast to infrastructure elements, tools and security systems require a greater development investment and more often depend on highly integrated hardware to function efficiently.

I think the driving force behind this transformation will have to come from the customers, especially large ones, who have the economic wherewithal to force the vendors to toe the line towards virtualization. An example of such a shift is AT&T’s Domain 2.0 project. As John Donovan put it, “No army can hold back an economic principle whose time has come.”

As the large customers build pressure on vendors to move towards virtualization, I think we will start to see movement towards NFV within the more advanced products of the networking space. One element of this change is already occurring in forensics, or “historical” (as opposed to real-time) network analysis. Historical analysis functions, such as IDS or network forensics, can be virtualized to a great degree, but today these systems tend to be monolithic devices that combine capture, storage, and analysis. As has been shown repeatedly in the past, there is certainly value in specialization, especially when line-rate performance is required. Capturing network data, storing it efficiently for retrieval, and building smart analytics are diverse functions that have historically been coupled together.

Today, just as we consider decoupling network functions from the underlying hardware, we should also look at the benefits of decoupling network data from the analysis software and hardware appliances. After all, these systems are hardware, software, and data. Ultimately, NFV provides an opportunity for analytics tools and security systems to offload data capture and storage duties to other elements, enabling hardware optimization where required and freeing the data to be used by a variety of systems. A move towards NFV by the analytics vendors would bring all the advantages of scalability and cost-effectiveness that NFV promises in other networking domains, but analytics vendors need to decouple data from processing as much as they need to virtualize functionality.