1st Feb 2020, Novotel, Viman Nagar, Pune
Rajeev N Bharshetty
Rajeev is a passionate polyglot programmer and a theoretical Computer Science nerd. He is currently interested in and focused on Distributed Systems, Security and Data. He is working on building reliability at scale across 300+ microservices at GO-JEK.
Unmesh Joshi is Principal Consultant, Head of Technology at ThoughtWorks, India. He’s a self-proclaimed enthusiast of Distributed Systems and their implementations.
Manisha Salve is currently working with Data Direct Networks, India as a Senior Developer. She has worked extensively in the storage and high-performance computing domains for more than a decade. She is a hardcore programmer and loves to work on complex systems.
Piyush has been working on Infrastructure Engineering and Distributed Systems for almost a decade, from the days when things would make a sound when they broke. He considers himself fortunate to have learned these skills from some of the best engineers while scaling fairly large, complex database systems like Cassandra, building an IaaS platform, and building his own microservice communication bus.
His most recent stint was heading Site Reliability Engineering at Trustingsocial.com, which credit-scored nearly half a billion users across 3 countries, 5 datacenters, and 3 clouds.
Udayan did his MS in Computer Science from Stanford University. After working in parallel programming methodologies for a few years, Udayan started Oneirix Labs. Oneirix develops new technologies (hardware and algorithms) for companies across the world.
Jaideep is the Co-founder of One2n consulting, helping businesses grow while they are scaling. His experience ranges from setting up bare-metal machines in datacenters to running Distributed Systems on top of them. He has worked on running systems reliably in production. He holds a Master's in High Performance Distributed Computing from Vrije University, Amsterdam.
Registration & Breakfast
It Won't Make a Noise When it Breaks
Systems fail, but the real failures are the ones from which we learn nothing. This talk is a tale of a few such failures that went right under our noses and what we did to prevent them. The failures covered range from heterogeneous systems, unordered events and missing correlations to plain human error.
Patterns of Implementing Consensus
Consensus is an important concept in distributed systems. Various algorithms, such as Paxos, ZAB and RAFT, are used in mainstream products. While the algorithms look different on the face of it, some common implementation patterns can be observed in all mainstream implementations. The talk will show working implementations of ZAB and RAFT to explain these patterns. It will mostly showcase working code, backed by slides only where needed. The code is available on GitHub at https://github.com/unmeshjoshi/distributedarchitectures, which includes stripped-down versions of ZAB, Kafka and Cassandra. I will focus mostly on the consensus part of the code.
Distributed Transactions which Scale
In a traditional monolithic application backed by a database, you are guaranteed ACID properties through localised database transactions. In a complex microservice architecture, however, numerous services take part in solving a particular business problem. These services are loosely coupled and backed by their own databases. Maintaining application consistency across these services is a big challenge. In this talk, I propose an alternate solution, implemented at Gojek, for maintaining application consistency across services in a way that scales. We discuss the Saga pattern, its fundamentals and how it is applied at Gojek. We go through real-world cases of its successful application and implementation in production, with code samples in Go.
FMEAs of the Distributed Systems before going live on Production
Every time a new system is to be deployed to Production, we need to ask these questions:
- Was there enough validation of how the system behaves in degraded mode?
- What trade-offs did we settle for in choosing CAP?
This talk covers how to approach creating an FMEA (Failure Mode and Effect Analysis) report and how to understand the known knowns and known unknowns, so that you can make an informed call before deploying to Production. Every organization has its own runtime environment, and identifying how the system behaves under those conditions helps with debugging and with avoiding operational challenges.
Multi-core systems have been around for quite some time now, but a lot of applications are not written to take full advantage of the parallelism they provide to improve performance. This talk will cover the advantages multi-core systems provide, point out the challenges in multi-core programming, and walk through real-life examples of multi-core programming.
Algorithmic Fault Tolerance: A Road Traffic Management Case Study
Fault tolerance is not just a systems concern: there is an algorithmic aspect to it as well. This talk will explain some aspects of algorithmic fault tolerance using distributed AI traffic signals as a case study. Oneirix's AI traffic signal technology optimizes road traffic in real time using a fog-edge-cloud system architecture. A large part of this architecture sits outdoors, weathering sun, rain and hail, and hence fault tolerance is of utmost importance. The experience of developing this system may be applied to other autonomous distributed systems as well.
Viman Nagar, Pune, Maharashtra