Distributed Computing

Bachelor 2019/2020

Type: Elective course (Software Engineering)

Area of studies: Software Engineering

Delivered by: School of Software Engineering

Where: Faculty of Computer Science

When: 3rd year, modules 3 and 4

Mode of studies: offline

Instructors: Petr Panfilov

Language: English

ECTS credits: 5

Contact hours: 64

Full Syllabus

Abstract

Distributed computing has become a central concept in how computers are used, from web applications to e-commerce to content distribution. It helps programmers aggregate the resources of many networked computers to construct highly available and scalable services. This course teaches the abstractions, design, and implementation techniques that enable the building of fast, scalable, fault-tolerant distributed computing systems, including client-server computing, the web, cloud computing, peer-to-peer systems, and distributed storage systems. Topics include remote procedure call, preventing and finding errors in distributed programs, maintaining consistency of distributed state, fault tolerance, and high availability. Multithreading, network programming, and several case studies of distributed computing systems are also considered.

Learning Objectives

To introduce students to the fundamental problems, concepts, and approaches in the design and analysis of distributed computing systems.
To familiarize students with the stages of the distributed system design cycle, including system architecture, data and processes arrangements, naming, communication and coordination issues, existing distributed computing paradigms, techniques, and tools, and evaluating the effectiveness of distributed application systems for specific data, task, and user types.

Expected Learning Outcomes

understand the evolution of distributed computing from its early beginnings as multi-processor and multi-computer systems, to computer networks, to the emerging cloud, edge (fog, dew, mist) and heterogeneous computing environments
know the design goals of distributed computing systems
understand the distinction between distributed computing systems, distributed information systems and pervasive systems
know various types of distributed systems
understand the existing distributed computing paradigms and systematic issues
understand the distinction and relation between the logical organization of the collection of software components and the actual physical realization of the distributed system
understand some commonly applied architectural styles for organizing distributed computing systems
know the role of the middleware layer in separating applications from underlying platforms
understand the practical issues and choices that can be made to instantiate and place software components on real machines
understand the difference between centralized and decentralized architectures
explain and discuss basic principles and typical examples of real-world distributed systems such as the NFS file-sharing system and the web
understand the concept of processes and how the different types of processes play a crucial role in distributed systems
understand threads and their role in obtaining performance in multicore and multiprocessor environments and in structuring clients and servers
know basic principles of virtualization for making applications run concurrently and independently of the underlying hardware and platforms
understand client-server organizations in distributed systems
understand typical organizations of both clients and servers
know the design issues for servers including those used in object-based distributed systems
understand process migration, or more specifically code migration, and its role in achieving scalability of a distributed system
understand the ways that processes on different machines in a distributed system can exchange information
understand protocols or rules that communicating processes must adhere to
know the widely used models of communication: Remote Procedure Call (RPC), and Message-Oriented Middleware (MOM)
know basic principles of the RPC model and problems with achieving distribution transparency
understand the peculiarities of the high-level message-queuing model of process communication
know what application-level routing means for message-oriented communication
know how to set up multicast facilities for data dissemination in distributed systems
understand traditional deterministic means of multicasting as well as probabilistic approaches
understand the usage of names in resource sharing, identifying entities, referring to locations, and other uses in distributed systems
understand the difference between implementing a naming system in distributed and in nondistributed systems
know what a flat-naming system is, and what mechanisms are needed to trace the location of entities in a distributed system
know naming approaches ranging from chains of forwarding links, to distributed hash tables, to hierarchical location services
understand general principles and scalability issues of structured name systems
know the use of Domain Name System (DNS)
know how attributes assigned to an entity are used to resolve a description of that entity in a distributed system
understand the importance of cooperation and synchronization of actions between processes
understand the difference between process synchronization and data synchronization
understand the goal of process coordination, coordination problems and solutions in distributed systems
know basic principles of process synchronization based on actual time
understand coordination of a group of processes by means of election algorithms
know election algorithms for coordinating mutual exclusion to a shared resource
discuss the use of publish-subscribe systems for coordination in distributed event matching
understand the importance of data replication in distributed systems
know consistency models for shared data and their implementation
discuss and explain the difference between data-centric and client-centric consistency models
know basic principles and key issues of actual implementation of consistency models
understand the issue of managing replica servers
know the alternatives for implementing strong consistency for replicas
understand how caching protocols can be used as a special case of consistency protocols
explain caching and replication in Web-based systems
understand the notion of partial failure of a distributed system and the issue of recovery from partial failures
understand process resilience through process groups
know the Paxos algorithm for reaching consensus among the group members
understand the relation between fault tolerance and reliable communication
know basic principles of recovery from a failure in distributed systems
understand various mechanisms that are generally incorporated in distributed systems to support security
know about the security policy that is to be enforced and the design issues for mechanisms that help enforce such policies
know how to ensure secure communication between users or processes, possibly residing on different machines
know how to ensure secure access control through authorization mechanisms
know the basics of security management, including mechanisms to distribute cryptographic keys, add and remove users from a system, prove ownership to access specified resources, etc.

Course Contents

Introduction: Design goals
Distributed systems consist of autonomous computers that work together to give the appearance of a single coherent system. Design goals for distributed systems include sharing resources and ensuring openness. In addition, designers aim to hide many of the intricacies related to the distribution of processes, data, and control.
Introduction: Types of systems
Different types of distributed systems exist, which can be classified as being oriented towards supporting computations, information processing, and pervasiveness. Distributed computing systems are typically deployed for high-performance applications, often originating from parallel computing. Cloud computing goes beyond high-performance computing and also supports distributed systems found in traditional office environments. An emerging class of distributed systems is represented by pervasive computing environments, including mobile-computing systems as well as sensor-rich environments.
Architectures: Architectural styles. Middleware
We can make a distinction between software architecture and system architecture. An architectural style reflects the basic principle that is followed in organizing the interaction between the software components comprising a distributed system. Important styles include layering, object-based styles, resource-based styles, and styles in which handling events is prominent.
Architectures: System architecture. Example
There are many different organizations of distributed systems. Client-server architectures are often highly centralized. In peer-to-peer systems, the processes are organized into an overlay network, which is a logical network that can be either structured, using deterministic schemes for routing messages between processes, or unstructured. In hybrid architectures, elements from centralized and decentralized organizations are combined, as is the case in BitTorrent-based systems.
Processes: Threads. Virtualization
Processes play a fundamental role in distributed systems as they form a basis for communication between different machines. Threads in distributed systems are particularly useful for continuing to use the CPU when a blocking I/O operation is performed. In general, threads are preferred over processes when performance is at stake. Virtualization has long been an important field of computer science. Popular virtualization schemes allow users to run a suite of applications on top of their favourite operating system and to configure complete virtual distributed systems in the cloud.
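As a minimal sketch of the point about blocking I/O, the Python snippet below runs several simulated blocking calls in separate threads so that their waiting times overlap. The function names and the use of time.sleep as a stand-in for real I/O are assumptions made for illustration; they are not taken from the course material.

```python
import threading
import time


def blocking_io(name, delay, results):
    """Simulate a blocking I/O call (for example a network read) with sleep."""
    time.sleep(delay)  # this thread blocks here; the other threads keep running
    results[name] = f"{name} finished"


def run_in_threads(jobs):
    """Run each (name, delay) job in its own thread.

    Returns the collected results and the elapsed wall-clock time in seconds.
    """
    results = {}
    threads = [
        threading.Thread(target=blocking_io, args=(name, delay, results))
        for name, delay in jobs
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [("a", 0.2), ("b", 0.2), ("c", 0.2), ("d", 0.2)]
    results, elapsed = run_in_threads(jobs)
    print(results, f"{elapsed:.2f}s")
```

Run one after another, the four calls would wait for about 0.8 seconds in total; with one thread per call the elapsed time should stay close to the longest single wait.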
Processes: Clients. Servers
Organizing a distributed application in terms of clients and servers has proven to be useful. Client processes generally implement user interfaces, which may range from very simple displays to advanced interfaces. Client software is furthermore aimed at achieving distribution transparency by hiding details concerning the communication with servers. Servers are often more intricate than clients. They can either be iterative or concurrent, implement one or more services, and can be stateless or stateful.
Communication: Foundations. RPC
Communication between processes is essential for any distributed system. In traditional network applications, communication is often based on the low-level message-passing primitives offered by the transport layer. One of the most widely used abstractions is the Remote Procedure Call (RPC), which offers synchronous communication facilities: a client is blocked until the server has sent a reply.
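The blocking behaviour of an RPC can be sketched in a few lines of Python. This is a toy under stated assumptions: the "network" is a pair of in-process queues, and ClientStub, server_loop, and PROCEDURES are hypothetical names chosen for the example. A real RPC system would also marshal parameters and send them over a transport connection.

```python
import queue
import threading


def add(a, b):
    return a + b


# Procedures the (hypothetical) server exposes, looked up by name.
PROCEDURES = {"add": add}


def server_loop(requests):
    """Serve requests until a None sentinel arrives."""
    while True:
        message = requests.get()
        if message is None:
            break
        name, args, reply_box = message
        reply_box.put(PROCEDURES[name](*args))


class ClientStub:
    """Client-side stub: makes a remote call look like a local one."""

    def __init__(self, requests):
        self._requests = requests

    def call(self, name, *args):
        reply_box = queue.Queue(maxsize=1)
        self._requests.put((name, args, reply_box))
        return reply_box.get()  # the client blocks here until the server replies


def demo():
    requests = queue.Queue()
    server = threading.Thread(target=server_loop, args=(requests,))
    server.start()
    result = ClientStub(requests).call("add", 2, 3)
    requests.put(None)  # ask the server thread to stop
    server.join()
    return result


if __name__ == "__main__":
    print(demo())  # prints 5
```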
Communication: Message-oriented & Multicast communication
Message-oriented middleware models generally offer persistent asynchronous communication, and are used where RPCs are not appropriate. An important class of communication protocols in distributed systems is multicasting.
Naming: Names, IDs. Flat naming
Names are used to refer to entities. There are three types of names: addresses, identifiers, and human-friendly names. Given these types, we make a distinction between flat naming, structured naming, and attribute-based naming. Systems for flat naming essentially need to resolve an identifier to the address of its associated entity. This can be done in different ways.
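One of those ways is a distributed hash table, in which each identifier is handled by the first node whose ID is greater than or equal to it on a ring. The Python sketch below performs that successor lookup centrally on a sorted list, which is a simplification: in a real DHT each node knows only part of the ring. The node IDs, the stored record, and the address are made-up values for illustration.

```python
import bisect


def successor(node_ids, key):
    """Return the first node ID >= key on the ring, wrapping to the smallest ID.

    node_ids must be sorted in ascending order.
    """
    index = bisect.bisect_left(node_ids, key)
    return node_ids[index % len(node_ids)]


def resolve(identifier, node_ids, stored):
    """Find the node responsible for identifier and the address it records."""
    node = successor(node_ids, identifier)
    return node, stored.get(node, {}).get(identifier)


if __name__ == "__main__":
    nodes = [1, 4, 9, 11, 14, 18, 20, 21, 28]  # sorted node IDs on a small ring
    stored = {28: {26: "10.0.0.7:5000"}}       # node 28 holds the record for ID 26
    print(resolve(26, nodes, stored))          # prints (28, '10.0.0.7:5000')
```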
Naming: Structured naming. Attribute-based naming
Structured names are easily organized in a name space that can be represented by a naming graph in which a node represents a named entity and the label on an edge represents the name of the entity. Naming graphs are convenient for organizing human-friendly names in a structured way. More problematic are attribute-based naming schemes, in which entities are described by a collection of (attribute, value) pairs.
Coordination: Clock synchronization
There are various ways to synchronize clocks in a distributed system. All methods are based on exchanging clock values, while taking into account the time it takes to send and receive messages.
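A minimal sketch of such an exchange, in the style used by NTP: the client records when it sent the request and received the reply, the server records when it received the request and sent the reply, and the clock offset is estimated from the four timestamps. The function name and sample numbers are assumptions for illustration, and the estimate is only exact when the delay is the same in both directions.

```python
def estimate_offset_and_delay(t1, t2, t3, t4):
    """Estimate the server-minus-client clock offset and the round-trip delay.

    t1: request sent      (client clock)
    t2: request received  (server clock)
    t3: reply sent        (server clock)
    t4: reply received    (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)  # time on the wire, excluding server processing
    return offset, delay


if __name__ == "__main__":
    # Server clock runs 55 units ahead; each one-way trip takes 5 units.
    print(estimate_offset_and_delay(100, 160, 165, 115))  # prints (55.0, 10)
```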
Coordination: Mutual exclusion. Election algorithms
An important class of synchronization algorithms is that of distributed mutual exclusion. These algorithms ensure that in a distributed collection of processes, at most one process at a time has access to a shared resource. Synchronization between processes often requires that one process acts as a coordinator. To decide which process takes that role, an election algorithm is applied.
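As an illustration of an election, here is a simplified single-machine simulation of the bully algorithm, in which the live process with the highest ID ends up as coordinator. The function name and the set-based model of "alive" processes are assumptions made for the sketch. In the actual algorithm every higher process that answers starts its own election concurrently; the simulation follows just one such chain.

```python
def bully_election(initiator, alive):
    """Simulate a bully election started by initiator.

    alive is the set of process IDs that are currently running.
    Returns the elected coordinator and the chain of processes that
    took over the election along the way.
    """
    if initiator not in alive:
        raise ValueError("the initiator must be a running process")
    current = initiator
    trace = [current]
    while True:
        # ELECTION goes to every process with a higher ID; only live ones answer.
        responders = sorted(p for p in alive if p > current)
        if not responders:
            # Nobody higher answered: current announces itself as coordinator.
            return current, trace
        # A responder takes over; the simulation follows the lowest one.
        current = responders[0]
        trace.append(current)


if __name__ == "__main__":
    # Processes 0..7 exist, but the old coordinator 7 has crashed.
    print(bully_election(4, {0, 1, 2, 3, 4, 5, 6}))  # prints (6, [4, 5, 6])
```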
Consistency and replication: Data-centric & Client-centric models
Replicating data is used for improving the reliability of a distributed system and for improving performance. Replication introduces a consistency problem: whenever a replica is updated, that replica becomes different from the others. To keep replicas consistent, we need to propagate updates in such a way that temporary inconsistencies are not noticed. There are different consistency models. Consistent ordering of operations has long formed the basis for many consistency models. As opposed to data-centric models, researchers in the field of distributed databases for mobile users have defined a number of client-centric consistency models.
Consistency and replication: Replica management. Consistency protocols
Consistency protocols describe specific implementations of consistency models. With respect to sequential consistency and its variants, a distinction can be made between primary-based protocols and replicated-write protocols. We pay separate attention to caching and replication in the Web and, relatedly, to content delivery networks.
Fault tolerance
Fault tolerance is defined as the characteristic by which a distributed computing system can mask the occurrence of, and recovery from, failures. Several types of failures exist. Redundancy is the key technique needed to achieve fault tolerance. When applied to processes, the notion of process groups becomes important. The real problem is that members of a process group need to reach consensus in the presence of various failures. Paxos is by now a well-established and highly robust consensus algorithm.
Security
A distributed system should provide the mechanisms that allow a variety of different security policies to be enforced. Three important issues can be distinguished: secure channels between processes, access control or authorization, and management. Special attention is also required for the handling of secure names.

Assessment Elements

InClass Activity
Homeworks
Referate (Individual Study)
Study material is based on the analysis of one or two recent papers on the topic
Home Assignment (Group Project)
Final Examination
Written exam in MS Teams. No proctoring. Technical requirements: webcam, microphone, headphones or speakers

Interim Assessment

Interim assessment (4 module)
0.2 * Final Examination + 0.3 * Home Assignment (Group Project) + 0.2 * Homeworks + 0.1 * InClass Activity + 0.2 * Referate (Individual Study)
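The weighting above can be checked with a short Python sketch. The weights are copied from the formula; the sample scores and the 10-point scale are assumptions made for illustration only.

```python
# Weights copied from the interim assessment formula.
WEIGHTS = {
    "Final Examination": 0.2,
    "Home Assignment (Group Project)": 0.3,
    "Homeworks": 0.2,
    "InClass Activity": 0.1,
    "Referate (Individual Study)": 0.2,
}


def interim_grade(scores):
    """Return the weighted sum of the scores for all assessment elements."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores on an assumed 10-point scale.
    sample = {
        "Final Examination": 8,
        "Home Assignment (Group Project)": 9,
        "Homeworks": 7,
        "InClass Activity": 10,
        "Referate (Individual Study)": 6,
    }
    print(round(interim_grade(sample), 2))  # prints 7.9
```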

Bibliography

Recommended Core Bibliography

Distributed Systems. (2017). Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsnar&AN=edsnar.oai.ris.utwente.nl.publications.db6a761f.b353.419e.b65a.81e3740bbe53
Tanenbaum, A. S., & Steen, M. van. (2014). Distributed Systems: Principles and Paradigms (2nd ed., Pearson New International Edition). Harlow, Essex: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1418515

Recommended Additional Bibliography

Steen, M., & Tanenbaum, A. (2016). A brief introduction to distributed systems. Computing, 98(10), 967–1009. https://doi.org/10.1007/s00607-016-0508-7
