NXE: Network eXperiment Engine: software to automate networking experiments in real testbeds

Authors

This software is currently developed by Romaric Guillier, a PhD student advised by Pascale Vicat-Blanc Primet. The NXE software has been transferred to the INRIA spin-off LYaTiss. For any information, please contact: contact@lyatiss.com

Networking Experiment definition

A networking experiment is described by a scenario skeleton, defined as a succession of dates at which events occur. Each event corresponds to the starting point of an action (e.g. the start of a new bulk data transfer or of a new web session), combined with the parameters relevant to this action (e.g. distribution law of file sizes, inter-wait times). The action takes place between a set of end-hosts whose size depends on the kind of application being modelled (e.g. 2 for data transfers, many for parallel applications).
The end-hosts are organised in an abstract networking topology that roughly defines sites (e.g. an aggregation of end-hosts, as in a cluster), aggregation points between them (e.g. switches or routers) and networking links. The term "abstract" refers to the fact that the nodes must be instantiated over this topology at run-time: in real testbeds, resource allocation mechanisms might be used to accommodate multiple users.
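The scenario skeleton described above can be pictured as a list of dated events sorted by time. The following is a minimal sketch, not NXE's own data model: the class name `Event`, its fields, and the sample action and host names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    """One entry of a scenario skeleton: an action starting at a given date."""
    date: float                                   # seconds since scenario start
    action: str = field(compare=False)            # hypothetical action name
    hosts: tuple = field(compare=False)           # end-hosts taking part
    params: dict = field(compare=False, default_factory=dict)


def build_timeline(events):
    """Return the events ordered by date (only `date` is used for ordering)."""
    return sorted(events)


# Hypothetical events: two hosts for transfers, many for a parallel application.
scenario = build_timeline([
    Event(30.0, "web_session", ("client2", "server1"), {"inter_wait_s": 0.5}),
    Event(0.0, "bulk_transfer", ("client1", "server1"), {"file_size_mb": 1024}),
    Event(10.0, "parallel_app", ("node1", "node2", "node3", "node4")),
])

for ev in scenario:
    print(f"t={ev.date:>5.1f}s  {ev.action:<14} hosts={len(ev.hosts)}")
```

Keeping the dates relative to the scenario start, rather than absolute, lets the same skeleton be replayed at any time.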
The following figure shows the experiment workflow, and the various operations that are done at each stage of the execution of a protocol evaluation scenario. This workflow is a description of an evaluation process.

It is composed of a number of tasks connected in the form of a directed graph. These tasks have been broken up into elementary operations to make explicit what is done at each stage. The tasks were designed so that there is as little interaction as possible between successive tasks. The description of each stage of the workflow is as follows:

Description

Reservation: at this stage, the available resource allocator services are contacted to get the resources needed by the experiment, e.g. computing nodes or network links.
Deployment: this configuration phase can be either a reboot of the nodes onto an adequate kernel image, or just the setting of the OS internal variables (e.g. TCP buffer size) to the appropriate values.
Configuration: at this stage, the available hardware (e.g. hardware latency emulators, routers) is contacted to alter the topology, or to activate the gathering of statistical information (e.g. aggregate throughput) according to the needs of the experiment.
Scenario execution: here the actual execution of the scenario is started. The scenario can be run multiple times in a row to ensure that the results are consistent.
Log handling: the logs generated by the nodes and the global logging facility are gathered at a single point for analysis.
Log analysis: the logs are parsed, and metrics are computed from them to generate graphs that can be easily interpreted by the user.
Archiving and cleaning: resources are reset and released.
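The stages listed above can be sketched as a small dependency graph that is walked in order. This is an illustration only: the stage identifiers, the `run_workflow` helper and the strictly linear ordering are assumptions derived from the order of the list, not taken from the NXE code. It uses the standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages that must finish before it starts.
# The linear chain is an assumption based on the order of the description.
WORKFLOW = {
    "reservation": set(),
    "deployment": {"reservation"},
    "configuration": {"deployment"},
    "scenario_execution": {"configuration"},
    "log_handling": {"scenario_execution"},
    "log_analysis": {"log_handling"},
    "archiving_and_cleaning": {"log_analysis"},
}


def run_workflow(graph, handlers, repetitions=1):
    """Run every stage in dependency order and return the execution trace."""
    trace = []
    for stage in TopologicalSorter(graph).static_order():
        # Only the scenario itself is repeated, to check result consistency.
        runs = repetitions if stage == "scenario_execution" else 1
        for _ in range(runs):
            handlers.get(stage, lambda: None)()   # no-op when no handler is given
            trace.append(stage)
    return trace


trace = run_workflow(WORKFLOW, handlers={}, repetitions=3)
print(trace)
```

Because each stage only depends on the completion of its predecessor, a handler can be swapped (for instance a different reservation script) without touching the others.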

Implementation

The implementation is written in Python and uses the Python Expat XML library, the paramiko SSH2 library and the Gnuplot Python bindings. The additional programs iperf and D-ITG are used to simulate the workloads. Bash scripts wrap the calls to these programs on the end-hosts that are used in the scenarios.
The scenarios are described through XML files that provide a simple, hierarchical description of the topology, the general configuration and the interactions between the end-hosts. A document describing the XML format is available here.
The XML input files are divided into three parts:

Topology: a simple abstract topology description, providing an easy way to describe the resources and how to exploit them. External scripts are required to indicate how the reservation and deployment of the nodes will be performed. This makes it easy to adapt to local node management policies.
Configuration: a collection of keys that affect the global behaviour of the application, such as the login information or forcing the synchronisation of scripts prior to the run of an experiment.
Scenario: each node (or set of nodes) involved in the scenario is given a role (server, client, etc.) and a list of execution steps to be run during the experiment, e.g. a set of dates according to a centralised relative timer and a set of scripts to be executed on the target node(s) at the appropriate time.
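To give an idea of how such a three-part description could be read with the Expat parser from the Python standard library, here is a minimal sketch. The element and attribute names (`experiment`, `site`, `key`, `node`, `step`, and so on) and the script names are invented for illustration and do not reproduce the real NXE XML format, which is specified in the separate format document.

```python
import xml.parsers.expat

# Hypothetical input: this layout is an assumption, not the real NXE format.
SAMPLE = """<experiment>
  <topology>
    <site name="siteA" nodes="2"/>
    <site name="siteB" nodes="2"/>
  </topology>
  <configuration>
    <key name="sync_scripts" value="true"/>
  </configuration>
  <scenario>
    <node role="server" site="siteA">
      <step date="0" script="start_server.sh"/>
    </node>
    <node role="client" site="siteB">
      <step date="5" script="start_client.sh"/>
      <step date="65" script="stop_client.sh"/>
    </node>
  </scenario>
</experiment>"""


def parse_experiment(text):
    """Collect topology, configuration and scenario into plain structures."""
    result = {"sites": {}, "config": {}, "steps": []}
    current_node = {}          # attributes of the <node> being parsed

    def start(name, attrs):
        if name == "site":
            result["sites"][attrs["name"]] = int(attrs["nodes"])
        elif name == "key":
            result["config"][attrs["name"]] = attrs["value"]
        elif name == "node":
            current_node.update(attrs)
        elif name == "step":
            result["steps"].append(
                (float(attrs["date"]), current_node["role"], attrs["script"]))

    def end(name):
        if name == "node":
            current_node.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(text, True)   # True marks the final chunk of input
    result["steps"].sort()     # order the steps by relative date
    return result


experiment = parse_experiment(SAMPLE)
print(experiment["steps"])
```

Expat is event-driven, so the sketch keeps the attributes of the enclosing `node` element in a small state dictionary while its `step` children are read.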

NXE is an application that scripts the execution of a schedule, generated from the input scenario description and based on a centralised relative timer. It assumes that the granularity of the scenario execution steps is coarse, of the order of a second. The timers used are much more finely grained (of the order of a few milliseconds), so the tool could be made more precise, but this does not currently seem relevant to the general use case. For scalability purposes, it launches a separate thread for every node involved in the scenario and issues the commands at the appropriate time via SSH remote command execution. Only one SSH connection is opened per node, and it is kept open for the whole life of the experiment. The commands are executed via different SSH channels over this single SSH connection.
NXE has been developed on the Grid'5000 testbed.
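The one-thread-per-node scheduling described above can be sketched as follows. This is not the NXE source: the function names, the schedule layout and the host names are assumptions, and remote execution is replaced by a local stand-in so that the example runs without a testbed. With paramiko, the stand-in could be replaced by a call to `SSHClient.exec_command` on a client connected once per node, since each such call opens a new channel over the existing connection.

```python
import threading
import time


def node_worker(host, steps, t0, run_command, log):
    """Issue each (date, command) pair for one host at t0 + date."""
    for date, command in sorted(steps):
        delay = t0 + date - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        run_command(host, command)
        log.append((host, date, command))


def run_schedule(schedule, run_command):
    """Start one thread per node and wait for all of them to finish."""
    log = []
    t0 = time.monotonic()      # common origin of the relative timer
    threads = [
        threading.Thread(target=node_worker,
                         args=(host, steps, t0, run_command, log))
        for host, steps in schedule.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


def fake_ssh(host, command):
    """Stand-in for remote execution; prints instead of contacting the host."""
    print(f"[{host}] {command}")


# Hypothetical schedule: dates are in seconds relative to the start.
schedule = {
    "server1": [(0.0, "iperf -s")],
    "client1": [(0.2, "iperf -c server1 -t 10")],
}
log = run_schedule(schedule, fake_ssh)
```

All threads share the same origin `t0`, which is what makes the timer centralised: each node only computes how long it still has to wait before its next step.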

Publications

A User-Oriented Test Suite for Transport Protocols Comparison in DataGrid Context
@InProceedings{GuillierTestsuite09,
author = {Romaric Guillier and Pascale Vicat-Blanc Primet},
title = {{A User-Oriented Test Suite for Transport Protocols Comparison in DataGrid Context}},
booktitle = {{ICOIN 2009}},
month = {January},
year = {2009},
}
Methodologies and Tools for Exploring Transport Protocols in the Context of High-Speed Networks
@InProceedings{GuillierMethodologies08,
author = {Romaric Guillier and Pascale Vicat-Blanc Primet},
title = {Methodologies and Tools for Exploring Transport Protocols in the Context of High-Speed Networks},
booktitle = {IEEE TCSC Doctoral Symposium},
month = {May},
year = {2008},
}
Towards a User-Oriented Benchmark for Transport Protocols Comparison in very High Speed Networks
@TechReport{RRGuillierBenchmark07,
author = {Guillier, Romaric and Hablot, Ludovic and Primet, Pascale},
title = {Towards a User-Oriented Benchmark for Transport Protocols Comparison in very High Speed Networks},
year = {2007},
month = {07},
pages = {39},
institution = {INRIA},
number = {6244},
type = {Research Report},
url = {http://hal.inria.fr/inria-00161254/fr/ },
note = {Also available as LIP Research Report RR2007-35},
}
High Speed Transport Protocol Test Suite
@Unpublished{PGuillierTestsuite07,
author = {Romaric Guillier and Pascale Vicat-Blanc Primet},
title = {High Speed Transport Protocol Test Suite},
note = {poster, SuperComputing 2007},
month = {November},
year = {2007},
}

Acknowledgment

This work has been funded by the French Ministry of Education and Research, INRIA, and CNRS, via the ACI GRID Grid'5000 project, the ANR HIPCAL grant, the ANR IGTMD grant, the IST EC-GIN project, the INRIA GridNet-FJ grant, and the NEGST CNRS-JSP project.

Download

Version 1.0 is available here under the CeCILL licence.

Documentation

A document describing the XML format is available here

Links

NXE is used by HSTTS