Distributed Keras

This project started as a small research project to evaluate asynchronous optimizers. Personally, I would not use it in production environments without making drastic changes to its core. Nevertheless, if it fits your use case, feel free to continue using the package.

Introduction

Distributed Keras is a distributed deep learning framework built on top of Apache Spark and Keras, with the goal of significantly reducing training time by using distributed machine learning algorithms. We designed the framework so that a developer can implement a new distributed optimizer with ease, which lets them focus on research and model development.

Our distributed deep learning approach mainly follows the data-parallel paradigm proposed in Large Scale Distributed Deep Networks by Jeffrey Dean et al. [3]. In this approach, the model is replicated over several "trainers". These trainers can be distributed over different computing nodes, but several trainers may also share the resources of a single machine. Furthermore, the dataset is partitioned so that every model replica is trained on a different partition of the complete dataset. This is shown in the Figure below.
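
The data-parallel scheme described above can be illustrated with a small single-process simulation. This is a sketch under simplifying assumptions, not DK's implementation: a toy linear model y = w*x + b stands in for a Keras model, the "trainers" run one after another instead of as Spark executors, and all names (`partition`, `ParameterServer`, `local_sgd`, `train`) are hypothetical. Each trainer pulls the central parameters, runs one SGD pass over its own partition, and pushes the resulting parameter change to a central parameter server, in the spirit of Downpour SGD from Dean et al. Because every trainer pulls at the start of a round, later pushes are computed from stale parameters, which mimics asynchronous updates.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]


def partition(dataset: Sequence, num_trainers: int) -> List[list]:
    """Split the dataset into num_trainers disjoint, roughly equal partitions."""
    return [list(dataset[i::num_trainers]) for i in range(num_trainers)]


class ParameterServer:
    """Holds the central copy of the (hypothetical) model parameters [w, b]."""

    def __init__(self) -> None:
        self.params = [0.0, 0.0]

    def pull(self) -> List[float]:
        # A trainer receives a copy of the current central parameters.
        return list(self.params)

    def push(self, delta: List[float]) -> None:
        # The server applies a trainer's parameter change to the central model.
        self.params = [p + d for p, d in zip(self.params, delta)]


def local_sgd(params: List[float], shard: List[Sample], lr: float) -> List[float]:
    """One SGD pass over a shard for y = w*x + b (squared error); returns the delta."""
    w, b = params
    for x, y in shard:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return [w - params[0], b - params[1]]


def train(num_trainers: int = 4, rounds: int = 300, lr: float = 0.005) -> List[float]:
    # Toy noise-free dataset drawn from y = 2x + 1 on [0, 1).
    dataset = [(i / 200, 2.0 * (i / 200) + 1.0) for i in range(200)]
    shards = partition(dataset, num_trainers)
    server = ParameterServer()
    for _ in range(rounds):
        # All trainers pull the same parameters at the start of the round ...
        replicas = [server.pull() for _ in range(num_trainers)]
        # ... then push their deltas one after another, so every push after
        # the first one is based on parameters that are already stale.
        for params, shard in zip(replicas, shards):
            server.push(local_sgd(params, shard, lr))
    return server.pull()


if __name__ == "__main__":
    w, b = train()
    print(f"w = {w:.3f}, b = {b:.3f}")  # should approach w = 2, b = 1
```

The small learning rate is deliberate: the server applies every trainer's change, so the effective step grows with the number of trainers, and with stale parameters a larger rate can overshoot instead of converging.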

Installation

We also focused on a straightforward installation process. Depending on your personal preferences, there are two ways to install DK. In both cases, we assume that Apache Spark is already installed and running.

Pip

Use this method if you only need the framework itself. We plan to publish DK on PyPI for convenience; however, before this happens, we first need to document the code better so that everyone can understand it.

          pip install git+https://github.com/JoeriHermans/dist-keras.git

Git

Using this approach, you will have access to all the examples.

          git clone https://github.com/JoeriHermans/dist-keras

However, to install any missing dependencies and to compile the DK modules, you still need to run pip:

          cd dist-keras
          pip install -e .

This command tells pip to install DK in editable (development) mode, using the setup.py file in the root directory of the DK Git repository.
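
Whichever installation method you choose, a quick way to confirm that the Python side of the setup is in place is to check that the required packages can be located. The following is a minimal sketch; the import name `distkeras` for DK is an assumption, and finding `pyspark` only shows that the Spark Python bindings are installed, not that a Spark instance is actually running.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "distkeras" is assumed to be DK's import name; "pyspark" and "keras"
    # are the Python packages for Apache Spark and Keras.
    for name, found in check_modules(["distkeras", "pyspark", "keras"]).items():
        print(f"{name}: {'ok' if found else 'missing'}")
```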

References

[1] Zhang, S., Choromanska, A. E., & LeCun, Y. (2015). Deep learning with elastic averaging SGD. In Advances in Neural Information Processing Systems (pp. 685-693).

[2] Moritz, P., Nishihara, R., Stoica, I., & Jordan, M. I. (2015). SparkNet: Training deep networks in Spark. arXiv preprint arXiv:1511.06051.

[3] Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., ... & Ng, A. Y. (2012). Large scale distributed deep networks. In Advances in Neural Information Processing Systems (pp. 1223-1231).