Distributed Systems Conference

16 February 2019 | Pune

Speakers

Dr. Milind Bhandarkar

Dr. Milind Bhandarkar was the founding member of the team at Yahoo that took Apache Hadoop from a 20-node prototype to a datacenter-scale production system. Parallel programming languages and paradigms have been his area of focus for over 20 years. He has worked at C-DAC, the National Center for Supercomputing Applications (NCSA), the Center for Simulation of Advanced Rockets, Siebel Systems, Pathscale, Yahoo, LinkedIn, and Greenplum. Prior to founding Ampool, Milind was the Chief Scientist at Pivotal. Milind holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign.

Talk: Architecting Modern Data Platforms for Hybrid Clouds

Dr. Neil J. Gunther

Neil Gunther is a computer information systems researcher best known internationally for developing the open-source performance modeling software Pretty Damn Quick and developing the Guerrilla approach to computer capacity planning and performance analysis. He has also been cited for his contributions to the theory of large transients in computer systems and packet networks, and his universal law of computational scalability.

Talk: Applying The Universal Scalability Law to Distributed Systems

Dr. Sriram Srinivasan

Dr. Srinivasan is a systems and programming languages geek with 30+ years of experience, from embedded devices to large-scale distributed systems and frameworks. He was one of the principal designers and implementors of the WebLogic application server. He has a PhD from the University of Cambridge and teaches Distributed Systems at IIT Bombay.

Talk: Knowledge Logic: Applications to Distributed Systems and Life in General

Somya Maithani

Somya Maithani has been a backend developer (SDE II) at Helpshift for the last three and a half years. She loves solving logical problems, especially the challenging ones. In her spare time, she mostly reads books and watches series. She loves to bake and has just started dabbling in it.

Talk: Combating Entropy in a Highly Distributed System

Tapasweni Pathak

As a Senior Software Engineer, Manager at Reliance Jio Financial Innovation, Tapasweni focuses on owning financial products built on highly distributed systems and making them seamlessly available. She likes keeping herself involved with open source projects and communities.

Talk: Highly Scalable Distributed Tracing and Monitoring System

Schedule

09:00 AM

Registration

09:30 AM

Architecting Modern Data Platforms for Hybrid Clouds

In the decade since public clouds emerged, a hard boundary has remained between public clouds and on-premises infrastructure and services. With Azure Stack, GKE, VMware on AWS, and recent announcements about AWS Outposts, it is clear that this boundary is blurring. Recent industry developments, such as the merger of Hadoop rivals Cloudera and Hortonworks and IBM's acquisition of Red Hat, indicate that an exciting hybrid cloud future awaits us. Public clouds entering on-premises means the same logically centralized control planes (and associated managed services) will be available both on public clouds and on-premises, making hybrid data planes possible. In addition, deployment targets for software systems, which until recently were limited to bare metal and heavyweight virtual machines, have proliferated. They now form a hierarchy: bare-metal physical machines, virtual machines, microVMs, containers, isolates, and functions. This fundamentally changes how distributed data platforms and data-intensive applications will be developed. This talk outlines the architectural building blocks that can be used in specific design patterns for developing modern distributed data platforms. We will draw on our experience developing a prominent distributed data analytics platform, Apache Hadoop, and outline how such a platform could be built today with modern building blocks. We intend to cover most aspects of the platform control planes, such as high availability, disaster recovery, security, resource scheduling, orchestration, resource isolation, allocation, management, monitoring, scaling, metrics, metering, and logging. I will illustrate this architectural paradigm shift with some of the design choices we have made at Ampool, a modern data analytics platform.

Dr. Milind Bhandarkar

10:30 AM

Tea Break

11:00 AM

Applying The Universal Scalability Law to Distributed Systems

When I originally developed the Universal Scalability Law (USL), it was in the context of tightly coupled Unix multiprocessors, which led to an inherent dependency between the serial contention term and the data consistency term in the USL, i.e., no contention, no coherency penalty. Later, I realized that the USL could have broader applicability to large-scale clusters if this dependency was removed. In this talk I will show examples of how the USL can be applied as a statistical regression model to a variety of large-scale distributed systems, such as Hadoop, Zookeeper, Sirius, AWS cloud, and Avalanche DLT, in order to quantify their scalability numerically in terms of concurrency, contention, and coherency.

Livestream

Dr. Neil J. Gunther

12:00 PM

Lunch

01:00 PM

Combating Entropy in a Highly Distributed System

In this talk, we’ll go over a case study of Helpshift, a massively distributed architecture that sees 160,000 requests per second, and cover problems that arise due to entropy. Topics will include what data inconsistencies look like in the real world, Helpshift’s distributed architecture, and our initial attempts and why they did not work. We’ll also talk about isolating the root cause of the data inaccuracies, the solution, and mitigating the unsolvable side effects of distributed architectures.

Somya Maithani

02:15 PM

Highly Scalable Distributed Tracing and Monitoring System

In large-scale, customer-focused distributed systems built on asynchronous programming, tracing and monitoring become hard. As a system scales to higher request counts, with multiple projects involved in a single pipeline, keeping a proper logging, tracing, monitoring, and metrics setup reliably available for the product becomes difficult. In this talk I will compare distributed monitoring, observability, and performance packages, and cover best practices for building a generic, scalable distributed monitoring, observability, and performance package that is independent of your project's code style.

Tapasweni Pathak

03:00 PM

Tea Break

04:00 PM

Knowledge Logic: Applications to Distributed Systems and Life in General

Knowledge Logic is a system of logic that provides a framework for what it means to "know" something, or how to deduce "I know that you know that I know". This has far-reaching implications for everything from distributed systems to politics to navigating traffic. This talk is a gentle introduction to the topic, and will leave you with the know-how to topple dictators.

Dr. Sriram Srinivasan