Network link detection technology for data center migration

Link failure is one of the most common faults encountered during data center migration. Inside the data center it is easy to handle: add backup links to improve reliability. The backup links are usually spread across different network devices and isolated from each other as far as possible, so that when a link on one side fails, services switch to the other side in time. There can be two links or more, and the more links there are, the higher the reliability. The most common approach is link aggregation: when one or several member links fail, traffic switches to the links that are still working.

Outside the data center, and especially on lines leased from a carrier, the environment is not under the data center's control. If the budget allows, multiple links can be leased, so that when a single link fails, traffic can take the others. Ucloud and Alipay both have backup links, and Alipay even has four, so that service continues as long as one link remains. Unfortunately, an incident in which all four links failed still happened. In that situation the only thing that can save the service is a remote data center or a disaster recovery data center: when every external link of the running data center is interrupted, services are migrated to another data center in time so that they are not affected. This is why building a disaster recovery data center matters. Had Ucloud and Alipay had a complete disaster recovery system in place beforehand, their services would not have been interrupted for so long. In normal operation, traffic is backed up in real time between the primary data center and the disaster recovery data center. Once the primary data center fails, applications switch automatically to the disaster recovery data center, and because the switchover is very short it has little impact on the business.
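
The failover order described above (use any surviving link first, and fall back to the disaster recovery data center only when every external link is down) can be sketched in a few lines of Python. This is a minimal illustration, not a real controller: the `Link` class, the `choose_path` function, and the link names are hypothetical, and the link states are assumed to come from some monitoring source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    name: str  # hypothetical label, e.g. a leased-line identifier
    up: bool   # current state, assumed to be reported by monitoring


def choose_path(links: List[Link]) -> str:
    """Return where traffic should go for the given link states."""
    healthy = [link.name for link in links if link.up]
    if healthy:
        # At least one external link survives: stay in the primary
        # data center and use the remaining links.
        return "primary via " + ", ".join(healthy)
    # Every external link is down: only the disaster recovery site helps.
    return "fail over to disaster recovery data center"
```

With four links of which three are down, `choose_path` still keeps traffic in the primary data center. Only the case where all four are down triggers the move to the disaster recovery data center.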

Backup links and a backup data center alone are still not enough. The most important thing is to have a means of detecting link failures and of automatically switching services based on the detection results.

First, the data center has a network monitoring system. When a link DOWN event occurs, the network management center can detect it and, depending on the location and number of the DOWN links, switch links or services either manually or automatically. In the manual approach, an operator checks where the link went DOWN and switches the affected services accordingly. In the automatic approach, link DOWN events are bound in advance to predefined actions: different DOWN locations have different contingency plans, and the system executes the matching plan automatically.

Second, interconnect links often pass through optical transmission equipment (mainly links leaving the data center or running between data centers). Even if the link at one end goes DOWN, the other end does not perceive it, so a detection protocol has to be deployed. Common choices are LACP for aggregated links, DLDP, and OAM. With LACP in slow mode, a probe packet is sent every 30 seconds and the timeout is 90 seconds, so switching is relatively slow. LACP can also be configured for fast detection, sending a probe packet every 1 second at the fastest with a 3-second timeout, so that link switching completes within a few seconds. When the links are not in an aggregation backup relationship, DLDP can be used instead. DLDP detects single-fiber (unidirectional) link failures. When it is in use, the port is shut down as soon as the protocol times out, so the cloud management platform sees the port go DOWN and can take repair action. OAM is also a link detection protocol. It works at the link layer, so its overhead is low and its detection is faster, and it is feature-rich: it can raise alarms, set the port DOWN, and work in linkage with other protocols.

Third, there should be a disaster recovery data center. If a link goes DOWN inside the data center, the scope of affected services is not very wide. But if a port that interconnects the data center with the outside goes DOWN, in serious cases the entire data center cannot run, and the disaster recovery data center has to be enabled: application services are switched over to it and it takes over the business. The primary and disaster recovery data centers should back up services in real time and share a common management platform, so that when the primary data center fails, services can switch smoothly to the disaster recovery data center. A commonly used method is route switching: routes are adjusted to steer service traffic into the disaster recovery data center. This process is relatively complex to implement. It requires understanding the service model across the data centers, working out which services need to migrate, and then adjusting routes to switch those services over.

Fourth, adjusting routes is slow and sometimes error-prone, which is where VXLAN comes in. VXLAN builds one large Layer 2 network spanning multiple data centers, so virtual machines in different data centers can migrate freely to other data centers (migration here means Layer 2 migration). When a data center fails, all virtual machines running its services can migrate to the disaster recovery data center. The whole process is imperceptible at the service level, the switchover is fast and simple, and most of the time the migration is performed automatically by the system without human involvement.

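
The LACP timer figures quoted above (30-second probes with a 90-second timeout in slow mode, 1-second probes with a 3-second timeout in fast mode) can be turned into a rough estimate of how long a silent failure goes unnoticed. The Python sketch below is a simplified model, not an implementation of LACP: `ProbeMonitor` and `detection_window` are hypothetical names, and the model assumes a link is declared DOWN once no probe has arrived for a full timeout period.

```python
from dataclasses import dataclass
from typing import Tuple

# Timer values quoted in the text: (probe interval, timeout) in seconds.
LACP_SLOW = (30.0, 90.0)
LACP_FAST = (1.0, 3.0)


@dataclass
class ProbeMonitor:
    """Declares a link DOWN when no probe arrived within `timeout` seconds."""
    interval: float
    timeout: float
    last_seen: float = 0.0

    def on_probe(self, now: float) -> None:
        # Record the arrival time of the latest probe packet.
        self.last_seen = now

    def is_down(self, now: float) -> bool:
        return now - self.last_seen >= self.timeout


def detection_window(interval: float, timeout: float) -> Tuple[float, float]:
    """Best and worst delay between the failure and its detection.

    Best case: the link fails just before the next probe was due.
    Worst case: the link fails right after a probe was received.
    """
    return (timeout - interval, timeout)
```

Under this model, `detection_window(*LACP_SLOW)` gives a window of 60 to 90 seconds, while `detection_window(*LACP_FAST)` gives 2 to 3 seconds, which matches the statement that fast mode completes the switch within a few seconds.
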
Data centers thus have many link detection and switching methods for dealing with the impact of sudden link failures on services. Of course, more is not always better. Core network devices often have hundreds or even thousands of ports. If detection runs on that many ports at the same time, the device struggles to send and receive the large volume of detection packets on every port, which burdens its CPU. Whether to deploy link detection, which ports to deploy it on, and which detection protocols and methods to use all have to be analyzed case by case. Each data center should deploy detection according to its own business needs, achieving good detection while adding as little load to the equipment as possible.
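
The CPU concern above can be made concrete with simple arithmetic: if every monitored port sends one probe per interval, the probe rate grows linearly with the port count. The Python sketch below is only a back-of-the-envelope estimate. The function names and the example port counts are hypothetical, and real devices also have to process the probes they receive, which this sketch ignores.

```python
def probes_per_second(ports: int, interval_s: float) -> float:
    """Probe packets sent per second, assuming one probe per port per interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ports / interval_s


def mixed_load(critical_ports: int, other_ports: int,
               fast_s: float = 1.0, slow_s: float = 30.0) -> float:
    """Probe rate when only critical ports use the fast interval."""
    return (probes_per_second(critical_ports, fast_s)
            + probes_per_second(other_ports, slow_s))
```

For a device with 1000 ports, `probes_per_second(1000, 1.0)` gives 1000 probes per second, while `probes_per_second(1000, 30.0)` gives about 33. Running fast detection on only a small set of critical ports, as in `mixed_load(48, 952)`, keeps the rate below 80 probes per second, which illustrates why choosing where to deploy detection matters.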