DFScala
DFScala is a library for using the dataflow model of parallelism in the Scala programming language.
Primarily written by Salman Khan and Daniel Goodman, DFScala provides a platform for research into dataflow programming as part of the work on the Teraflux project carried out at The University of Manchester. Over the course of its development it has also become a stable and relatively efficient platform on which to build whole dataflow applications and is being used by researchers associated with Teraflux including Behram Khan, Chris Seaton, Yegor Guskov, Mikel Luján and Ian Watson.
The corresponding researcher is Daniel Goodman, goodmand@cs.man.ac.uk.
Documentation
The use of DFScala is documented in our paper [1], in the readme files in the source code, and in the benchmark suite, where you can see example code. You can also browse the API documentation.
Tooling
DFScala includes an interface for plugging in loggers to monitor and interact with a dataflow program at runtime. We have implemented loggers that output XML for further analysis or Graphviz diagrams for visualisation. There is also a graphical logger available, which can help you to solve problems such as missing arcs in your dataflow graph or runtime load imbalance.
Downloads
DFScala depends on Java 1.6 (it doesn't work with Java 1.8) and Scala 2.9.1 at runtime, and SBT 0.11 for compiling.
Version 0.2
- dfscala_2.9.1-0.2.jar — binary jar for Scala 2.9.1
- dfscala-0.2-src.tar.gz — source code
- dfscala-benchmarks-0.2-src.tar.gz — benchmark suite source code
Version 0.1
- dfscala_2.9.1-0.1.jar — binary jar for Scala 2.9.1
- dfscala-scope_2.9.1-0.3-bundle.jar — binary jar of the scope tool for Scala 2.9.1
- dfscala-0.1-src.tar.gz — source code
- dfscala-scope-0.3-src.tar.gz — source code of the scope tool
- dfscala-benchmarks-0.1-src.tar.gz — benchmark suite source code
Demonstration
If you are interested in seeing what DFScala applications look like and quickly getting one up and running, you probably want the benchmark suite. The following instructions take you from downloading the suite to seeing some results. We assume a unix-style system.
You will need Java 1.6, Scala 2.9.1 and SBT 0.11 installed. Most systems' package managers will be able to provide these. Download dfscala-benchmarks-0.2-src.tar.gz.
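Before compiling, it can help to confirm that the three tools are actually on your PATH. The following is a minimal sketch, not part of DFScala: the helper name missing_tools is hypothetical, and the sketch only reports what is installed rather than enforcing the versions listed above.

```shell
# Print the name of each given tool that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of DFScala.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
  return 0
}

missing="$(missing_tools java scala sbt)"
if [ -n "$missing" ]; then
  # Word splitting is intentional here: it joins the names onto one line.
  echo "Not found on PATH:" $missing
else
  # Print the installed versions so they can be compared by eye
  # with the versions this page asks for.
  java -version 2>&1 | head -n 1
  scala -version 2>&1 | head -n 1
fi
```

If anything is reported as missing, install it with your package manager before moving on to the compile step.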
Untar and compile the benchmarks.
tar -zxf dfscala-benchmarks-0.2-src.tar.gz
cd dfscala-benchmarks
sbt compile
Run the KMeans benchmark.
./kmeans -t 4 -f input/kmeans/random-n16384-d24-c16.txt
You will see the program run, with its output displayed at the terminal. When you've seen the program running, read the source code at src/main/scala/eu/teraflux/uniman/dataflow/benchmark/kmeans/KMeans.scala.
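Once a single run works, you may want to repeat it with different values of the -t option. The sketch below assumes that -t controls the degree of parallelism, which this page does not state, and the helper name sweep_t is hypothetical. It takes the launcher as its first argument so that it can be tried as a dry run first.

```shell
# Run a benchmark launcher once per value passed to -t, labelling each run.
# The first argument is the launcher; the remaining arguments are the -t values.
# (sweep_t is a hypothetical helper, not part of the benchmark suite.)
sweep_t() {
  launcher="$1"
  shift
  for t in "$@"; do
    echo "== -t $t =="
    "$launcher" -t "$t" -f input/kmeans/random-n16384-d24-c16.txt
  done
}

# Dry run: use echo as the launcher to print the arguments
# instead of running the benchmark.
sweep_t echo 1 2 4
```

From inside the dfscala-benchmarks folder, a real sweep would be sweep_t ./kmeans 1 2 4.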
Using the Scope Tool
The scope tool allows you to see the dataflow graph being built up and individual nodes being run. You will need a modern browser such as Safari, Chrome or Firefox.
Download dfscala-scope_2.9.1-0.3-bundle.jar and put it into the dfscala-benchmarks folder. Then register the scope tool as the desired logger.
export CLASSPATH="$CLASSPATH:dfscala-scope_2.9.1-0.3-bundle.jar"
export JAVA_OPTS="$JAVA_OPTS -Deu.teraflux.uniman.dataflow.logger=eu.teraflux.uniman.dataflow.scope.Logger"
Run the KMeans benchmark again.
./kmeans -t 4 -f input/kmeans/random-n16384-d24-c16.txt
You will be prompted to open your browser to see the scope tool. Press 'Connect'. The program will run. Click 'Graph' at the top-right to see the graph build.
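The export lines above change CLASSPATH and JAVA_OPTS for the rest of your shell session, so every later run will also use the scope tool. If you would rather enable it for one command at a time, a small wrapper can apply the same two settings inside a subshell. This is a sketch under the assumption, implied by the export lines, that the benchmark launchers read both variables from the environment; the name with_scope_logger is hypothetical.

```shell
# Run one command with the scope logger enabled, leaving the calling shell's
# CLASSPATH and JAVA_OPTS untouched. The values are the ones used in the
# export lines above. (with_scope_logger is a hypothetical helper.)
with_scope_logger() {
  (
    CLASSPATH="$CLASSPATH:dfscala-scope_2.9.1-0.3-bundle.jar"
    JAVA_OPTS="$JAVA_OPTS -Deu.teraflux.uniman.dataflow.logger=eu.teraflux.uniman.dataflow.scope.Logger"
    export CLASSPATH JAVA_OPTS
    "$@"
  )
}

# Show the settings that a wrapped command would see.
with_scope_logger sh -c 'echo "CLASSPATH=$CLASSPATH"; echo "JAVA_OPTS=$JAVA_OPTS"'
```

A scoped run would then be with_scope_logger ./kmeans -t 4 -f input/kmeans/random-n16384-d24-c16.txt, and a plain ./kmeans run afterwards would use the default logger again.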
Acknowledgements
The Teraflux project is funded by the European Commission Seventh Framework Programme. Chris Seaton is an EPSRC funded student. Mikel Luján is a Royal Society University Research Fellow.