APT Advanced Processor Technologies Research Group

DFScala


DFScala is a library for using the dataflow model of parallelism in the Scala programming language.

Primarily written by Salman Khan and Daniel Goodman, DFScala provides a platform for research into dataflow programming as part of the work on the Teraflux project carried out at The University of Manchester. Over the course of its development, it has also become a stable and relatively efficient platform on which to build whole dataflow applications. It is being used by researchers associated with Teraflux, including Behram Khan, Chris Seaton, Yegor Guskov, Mikel Luján and Ian Watson.

The corresponding researcher is Daniel Goodman, goodmand@cs.man.ac.uk.

Documentation

The use of DFScala is documented in our paper [1], in the readme files in the source code, and in the benchmark suite, where you can see example code. You can also browse the API documentation.

Tooling

DFScala includes an interface for plugging in loggers to monitor and interact with a dataflow program at runtime. We have implemented loggers that output XML for further analysis or Graphviz diagrams for visualisation. There is also a graphical logger available, which can help you to solve problems such as missing arcs in your dataflow graph or runtime load imbalance.

Downloads

DFScala depends on Java 1.6 (it doesn't work with Java 1.8) and Scala 2.9.1 at runtime, and SBT 0.11 for compiling.

Version 0.2

Version 0.1

Demonstration

If you are interested in seeing what DFScala applications look like and want to get one up and running quickly, you probably want the benchmark suite. The following instructions walk you through to the point where you can see some results. We assume a Unix-style system.

You will need Java 1.6, Scala 2.9.1 and SBT 0.11 installed. Most systems' package managers will be able to provide these. Download dfscala-benchmarks-0.2-src.tar.gz.
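
Before compiling, it can help to confirm that the three tools are reachable from your shell. The following is a minimal sketch, not part of DFScala: missing_tools is a hypothetical helper name, and it only checks that the commands are on your PATH, not that their versions match those listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of DFScala): print the name of each
# listed tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check the three prerequisites named above. Presence only; versions
# still need to be checked by hand.
missing=$(missing_tools java scala sbt)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "java, scala and sbt are all on PATH"
fi
```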

Untar and compile the benchmarks.

tar -zxf dfscala-benchmarks-0.2-src.tar.gz
cd dfscala-benchmarks
sbt compile

Run the KMeans benchmark.

./kmeans -t 4 -f input/kmeans/random-n16384-d24-c16.txt

You will see the program run, with output displayed at the terminal. When you've seen the program running, read the source code at src/main/scala/eu/teraflux/uniman/dataflow/benchmark/kmeans/KMeans.scala.

Using the Scope Tool

The scope tool allows you to see the dataflow graph being built up and individual nodes being run. You will need a modern browser such as Safari, Chrome or Firefox.

Download dfscala-scope_2.9.1-0.3-bundle.jar and put it into the dfscala-benchmarks folder. Then register the scope tool as the desired logger.

export CLASSPATH="$CLASSPATH:dfscala-scope_2.9.1-0.3-bundle.jar"
export JAVA_OPTS="$JAVA_OPTS -Deu.teraflux.uniman.dataflow.logger=eu.teraflux.uniman.dataflow.scope.Logger"
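
If you would rather not change CLASSPATH and JAVA_OPTS for the whole shell session, the same two settings can be applied to a single command. The sketch below assumes the jar sits in the current directory, as described above. run_with_scope and the SCOPE_JAR override are hypothetical names introduced here for illustration; they are not part of DFScala.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of DFScala): run one command with the
# scope logger registered, leaving the calling shell's environment alone.
run_with_scope() {
    # SCOPE_JAR is an illustrative override; the default is the file
    # name used in the instructions above.
    jar=${SCOPE_JAR:-dfscala-scope_2.9.1-0.3-bundle.jar}
    if [ ! -f "$jar" ]; then
        echo "scope jar not found: $jar" >&2
        return 1
    fi
    (
        # Same values as the two export lines above, but confined to
        # this subshell.
        CLASSPATH="${CLASSPATH:-}:$jar"
        JAVA_OPTS="${JAVA_OPTS:-} -Deu.teraflux.uniman.dataflow.logger=eu.teraflux.uniman.dataflow.scope.Logger"
        export CLASSPATH JAVA_OPTS
        "$@"
    )
}
```

With this defined, "run_with_scope ./kmeans -t 4 -f input/kmeans/random-n16384-d24-c16.txt" would take the place of the export lines and the run command. It fails early with a message if the jar has not been downloaded.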

Run the KMeans benchmark again.

./kmeans -t 4 -f input/kmeans/random-n16384-d24-c16.txt

You will be prompted to open your browser to see the scope tool. Press 'Connect'. The program will run. Click 'Graph' at the top-right to see the graph build.

Acknowledgements

The Teraflux project is funded by the European Commission Seventh Framework Programme. Chris Seaton is an EPSRC-funded student. Mikel Luján is a Royal Society University Research Fellow.

References

  1. D. Goodman, S. Khan, C. Seaton, Y. Guskov, B. Khan, M. Luján, and I. Watson. DFScala: High level dataflow support for Scala. In Proceedings of the Second International Workshop on Data-Flow Models For Extreme Scale Computing (DFM), 2012.