Talk at DockerCon EU 2015

Our talk “Swarming Spark Applications”, submitted to DockerCon EU 2015, has been accepted! Here is the abstract:

We built Zoe, an open-source, user-facing service that ties together Spark, a data-intensive framework for big-data computation, and Swarm, the Docker clustering system. It targets data scientists who need to run their data-analysis applications without having to worry about system details. Zoe can execute long-running Spark jobs, but also Scala or IPython interactive notebooks and streaming applications, covering the full Spark development cycle. When a computation is finished, its resources are automatically freed and made available for other uses, since all processes run in Docker containers.

In this talk we will present why Zoe, our Container Analytics as a Service, was born, its architecture, and the problems it tries to solve. Zoe would not exist without Swarm and Docker, and we will also discuss some of the stumbling blocks we encountered and the solutions we found, in particular how to transparently connect Docker hosts over a physical network. Zoe was born as a research prototype, but it is now stable and is currently used to run real jobs for users at our research institution. The presentation will also cover application scheduling on top of Swarm and optimized container placement.
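To give a flavour of what “optimized container placement” means, here is a minimal, self-contained sketch of one possible policy: greedy best-fit by free memory, which packs containers onto the fullest host that can still fit them to reduce fragmentation. All names and numbers are illustrative assumptions; Zoe's actual scheduler may use a different strategy.

```python
# Hedged sketch of a best-fit container-placement policy (illustrative only,
# not Zoe's actual algorithm).

def place(containers, hosts):
    """Assign each (name, mem_mb) container to the host with the least
    free memory that can still fit it, largest containers first."""
    placement = {}
    free = dict(hosts)  # host name -> free memory in MB
    for name, mem in sorted(containers, key=lambda c: -c[1]):
        candidates = [h for h, f in free.items() if f >= mem]
        if not candidates:
            raise RuntimeError("no host can fit %s (%d MB)" % (name, mem))
        best = min(candidates, key=lambda h: free[h])  # best fit = tightest
        free[best] -= mem
        placement[name] = best
    return placement

# Hypothetical cluster and Spark application:
hosts = {"host-1": 8192, "host-2": 4096}
jobs = [("spark-master", 1024), ("spark-worker-1", 4096), ("notebook", 2048)]
print(place(jobs, hosts))
```

Best-fit tends to leave large contiguous chunks of memory free on lightly loaded hosts, so later large containers are less likely to be rejected than under a naive spread policy.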

You can find more information about Zoe here: http://www.zoe-analytics.eu

New paper accepted at the Third International Workshop on In-memory Data Management and Analytics (IMDM), held in conjunction with VLDB 2015

Cost-based Memory Partitioning and Management in Memcached, by D. Carra, P. Michiardi and M. Steiner

ABSTRACT:

In this work we present a cost-based memory partitioning and management mechanism for Memcached, an in-memory key-value store used as a Web cache, that is able to dynamically adapt to user requests and manage the memory according to both object sizes and costs. We then present a comparative analysis of the vanilla memory management scheme of Memcached and our approach, using real traces from a major content delivery network operator. Our results indicate that our scheme achieves near-optimal performance, striking a good balance between the performance perceived by end-users and the pressure imposed on back-end servers.
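The cost-based idea can be illustrated with a toy sketch (an assumption-laden simplification, not the paper's actual mechanism): give each cached object a size and a miss cost, and under memory pressure evict the objects with the lowest cost density (cost per byte), so that objects that are expensive for the back-end to regenerate stay cached longer.

```python
# Toy cost-aware cache (hypothetical illustration, not the paper's algorithm):
# under memory pressure, evict the object with the lowest miss-cost per byte.

class CostAwareCache:
    def __init__(self, capacity):
        self.capacity = capacity   # total memory budget in bytes
        self.used = 0
        self.items = {}            # key -> (size, miss_cost)

    def put(self, key, size, cost):
        # Evict lowest cost-density objects until the new one fits.
        while self.used + size > self.capacity and self.items:
            victim = min(self.items,
                         key=lambda k: self.items[k][1] / self.items[k][0])
            self.used -= self.items.pop(victim)[0]
        if self.used + size <= self.capacity:
            self.items[key] = (size, cost)
            self.used += size

cache = CostAwareCache(capacity=100)
cache.put("cheap-big", size=60, cost=1)   # cost density ~0.017 per byte
cache.put("costly", size=30, cost=90)     # cost density 3.0 per byte
cache.put("new", size=40, cost=10)        # forces eviction of "cheap-big"
print(sorted(cache.items))                # ['costly', 'new']
```

A pure LRU cache would instead have evicted "costly" here simply because it was inserted earlier, even though re-fetching it from the back-end is far more expensive; that is the imbalance a cost-based scheme addresses.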

Third International Workshop on In-memory Data Management and Analytics: http://imdm.ws/2015/