• A
  • A
  • A
  • ABC
  • ABC
  • ABC
  • А
  • А
  • А
  • А
  • А
Regular version of the site
Bachelor 2020/2021

Databases

Category 'Best Course for Career Development'
Type: Compulsory course (Software Engineering)
Area of studies: Software Engineering
When: 3 year, 1, 2 module
Mode of studies: offline
Instructors: Alexander Breyman
Language: English
ECTS credits: 6
Contact hours: 60

Course Syllabus

Abstract

The course attendants should develop skills and understanding in: the design methodology for databases and verifying their structural correctness; implementing databases and applications software in the relational model as well as in map/reduce paradigm; using querying languages, primarily SQL, and other database supporting software; applying the theory behind various database models and query languages; implementing security and integrity policies relating to databases; working in group settings to design and implement database projects.
Learning Objectives

Learning Objectives

  • Can design conceptual, logical and physical relational data models.
  • Can compose queries to relational databases using relational algebra, tuple relational calculus and SQL.
  • Apply methods of database design, including entity-relationship approach and normalization-based approach.
  • Knows and is able to apply database application design and development methods usable for object-oriented program systems, including object-relational mappers, its advantages and disadvantages.
  • May reason on database systems performance based on knowledge models and methods of internal organization of relational databases including file storage, indexing, query processing and transaction management issues.
Expected Learning Outcomes

Expected Learning Outcomes

  • Have an idea of Databases as discipline
  • Know the history of data management approaches
  • Understand the basic concepts of database systems.
  • Know 3-level X3/SPARC database architecture.
  • Understand notion of data independence
  • Know applicability of inverted index for data access
  • Know applicability of early data models: hierarchical and network.
  • Know basic relational model concepts
  • Know distinct features and applicability of object-oriented data model
  • Know distinct features and applicability of object-relational data model
  • Able to build conceptual data models
  • Know method of entity-relationship modeling
  • Know how to use UML class diagrams to represent conceptual data models
  • Understand relational data model concepts and principles
  • Know relational algebra operators under set semantics
  • Able to express queries using relational algebra
  • Understand relational integrity constraints
  • Understand functional dependencies and their relationship to keys
  • Understand 1NF, 2NF, BCNF and 3NF
  • Able to find keys given a set of functional dependencies
  • Able to identify tables that are not normalized
  • Able to decompose tables into BCNF and/or 3NF compliant tables.
  • Understand the difference between different types of query languages
  • Understand set semantics and bag semantics
  • Understand relational algebra operators including extended operators under both set and bag semantics
  • Able to write single table SQL queries.
  • Able to compose SQL queries using set (and bag) operators.
  • Able to write SQL queries with correlated subqueries.
  • Able to write SQL aggregation queries involving GROUP BY and HAVING clauses.
  • Able to translate query from English to SQL
  • Able to write SQL statements to change (insert/update/delete) data in tables
  • Understand views creation and usage in SQL
  • Understand triggers creation and usage in SQL
  • Understand difference between SQL and procedural SQL-based languages
  • Understand what indexes are for and how to use them.
  • Understand client-server computing in the context of DBMSs
  • Understand how a host programming language environment interacts with the SQL environment
  • Able to write Java programs that interacts with a DBMS server via the JDBC API.
  • Understand storage media and their basic properties
  • Understand how data is stored using storage media in a DBMS
  • Understand how different indexing techniques work
  • Understand why and how data needs to be indexed
  • Can analyze the performance of different storage and indexing strategies
  • Understand how queries are processed, optimized and evaluated in a DBMS.
  • Understand what is a query execution plan
  • Understand how query execution plans are transformed/re-written.
  • Understand how the cost of a query execution plan is estimated.
  • Understand transactions and their properties (ACID)
  • Understand the anomalies that occur without ACID
  • Understand the locking protocols used to ensure Isolation
  • Understand the logging techniques used to ensure Atomicity and Durability
  • Understand Recovery techniques used to recover from crashes
  • Know advantages and applicability of distributed database systems
  • Know software components and functions of distributed DBMS.
  • Understand distributed query processing.
Course Contents

Course Contents

  • Introduction into data management
    History of data management approaches. Basic database system concepts. Database environment. Database users. Database development process. Database planning.
  • Data Modeling
    Three level database architecture. Data model. Data independence. Inverted files. Early data models: hierarchical and network. Basic relational model concepts. Object-oriented model. Object-relational model. Semi-structured model. Semantic data models.
  • Database Design: E/R and UML Approaches
    Entity-relationship model. Entities and attributes. Entity types. Keys. E/R diagram. Relationships. Attributes and roles. Relationship type, degree and cardinality. Relationship participation constraints. Strong and weak entities. Entity type hierarchies. Specialization and generalization. Total and partial unions. Unified modeling language class diagram. Association and aggregation in UML. Generalization hierarchies in UML. Multiplicity indicators in UML.
  • Relational Model
    History of relational model. Advantages of relational model. Basic relational data structures. Mathematical and database relations. Relation schema. Relational database. Integrity constraints. Relation keys. Primary key constraint. Foreign key constraint.
  • Relational Database Design
    Objectives of normalization. Limitations of E/R design. Redundancy. Anomalies: insertion, deletion, update. Decomposition. Informal guidelines for relation design. Functional dependencies. Axioms of functional dependencies. Closure. Minimal cover of a set of dependencies. Desirable properties of decompositions: attributes preservation, dependency preservation, lossless join. First normal form. Full functional dependencies. Second normal form. Transitive dependency. Third normal form Boyce/Codd normal form (BCNF). Multivalued dependencies. Fourth normal form. Fifth normal form. Domain/Key normal form (DKNF). BCNF decomposition algorithm and its properties. Normalization drawbacks. Denormalization.
  • Relational Query Languages
    Relational algebra. Relational algebra operations. Selection. Projection. Set operations. Renaming. Join, equijoin, antijoin, theta-join. Natural join. Division. Tuple relational calculus: atoms, formulas, queries. Domain relational calculus.
  • SQL: Structured Query Language
    SQL language history. SQL standards. Data definition and data manipulation (sub)languages. SQL data types. Table declaration. Primary keys, unique constraints, default values, nullable attributes. Check constraints. Foreign key constraints. Handling foreign key violations. Indexes. Database schema modifications. SQL query sublanguage: SELECT. Single-table queries. Filtering conditions. Logical operations IN, ALL, EXISTS. Join queries. Join types: cross, natural, inner, outer, self. Duplicates elimination. Set operations. Nested queries. Correlated nested queries. Aggregate functions. Grouping and group filtering. Query result sorting. INSERT. UPDATE . DELETE. Views: creation, use and updating. Triggers: creation, activation, execution. Multiple triggers. View materialization. SQL procedural extensions and dialects: T-SQL, PL/SQL, PgSQL. Stored procedures.
  • Application Design and Development
    Database access from programming languages. ODBC architecture. ODBC Drivers. ODBC connection strings. JDBC architecture. Connecting to DBMS. Preparing and executing queries. Using result sets and cursors. Handling exceptions. Transactions in JDBC. Object-relational mapping. Design patterns for data persistence. Active Record pattern. Data Mapper pattern. Hybernate.
  • Storage and File Structure
    Memory hierarchy. Disk storage: physical disk structure, pages and blocks I/O time: seek latency, transfer time. Disk cache. RAID. SSD. SAN and NAS. Datafiles: blocks and extents. Block structure. Fixed and variable record formats. Large objects (LOBs).
  • Indexing and Hashing
    Indexing concepts. B-tree index. B-tree insertions and deletions. Hash index. Hash functions. Extendible hashing. Bitmap index. Join index. GiST.
  • Query processing
    Query processing overview. Relational algebra translation. Query tree. Relational algebra equivalences. Heuristics for optimization. Cost-based optimization. Cost factors and estimation. External merge sort. Duplicate elimination. Implementing set operations. Sort-based and hash-based projection. Computing selection without indexes. Computing selection with clustered index. Computing selection with b-tree index. Computing selection with hash index. Computing joins: nested loops. Computing joins: block nested loops. Computing joins: sort-merge join. Computing joins: hash-join.
  • Transaction management
    Single-user and multi-user systems. Basic concepts of transactions. ACID properties. Isolation. Serial and interleaved execution. Schedules: serial, serializable. Methods to ensure serializability. Concurrency control. Optimistic and pessimistic concurrency. Two-phase locking. Locking and deadlocks. Implementing isolation levels with locks. Snapshot isolation. Timestamping. Atomicity and Durability. Write-ahead log. Redo and undo records. Recovery from crash. Checkpoints. Distributed transactions. Two-phase commit protocol. Replication.
  • Distributed and parallel databases
    Distributed database systems: advantages and drawbacks. Distributed database architectures. Software components and functions of distributed DBMS. Data placement. Transaction management for distributed DBMS. Locking protocols. Timestamping protocols. Commit protocols. Distributed recovery from failures. Distributed query processing. Data integration. Parallel database systems
Assessment Elements

Assessment Elements

  • non-blocking midterm test
  • non-blocking essay
  • non-blocking quiz
  • non-blocking group project
    Students will work in groups of up to 5 students towards designing and development of a relational database. Topics might be suggested by lecturer as well as proposed by students.
  • Partially blocks (final) grade/grade calculation written exam
  • non-blocking Work on seminars
Interim Assessment

Interim Assessment

  • Interim assessment (2 module)
    0.14 * essay + 0.28 * group project + 0.14 * midterm test + 0.14 * quiz + 0.3 * written exam
Bibliography

Bibliography

Recommended Core Bibliography

  • Foster, E. C., & Godbole, S. (2016). Database Systems : A Pragmatic Approach (Vol. Second edition). [United States]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1174505
  • Petrov, A., & O’Reilly for Higher Education (Firm). (2019). Database Internals : A Deep Dive Into How Distributed Data Systems Work (Vol. First edition). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2250514

Recommended Additional Bibliography

  • Celko, J. (2014). Joe Celko’s SQL for Smarties : Advanced SQL Programming (Vol. Fifth edition). Amsterdam: Morgan Kaufmann. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=924899
  • Connolly, T. M., & Begg, C. (2015). Database Systems: A Practical Approach to Design, Implementation, and Management, Global Edition (Vol. Sixth edition, Global edition). Harlow: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1419568
  • Databases reference sheets: Date with Java, SQL, XML, JAXB, JDBC, JPA, JPA Inheritance / Rodrigo García Carmona. (2016). Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.991AD36F
  • Date, C. J. (2015). SQL and Relational Theory : How to Write Accurate SQL Code (Vol. Third edition). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1099367
  • Date, C. J. (2016). The New Relational Database Dictionary : Terms, Concepts, and Examples (Vol. New and expanded edition). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1136522
  • Haq, Q. M. R. U. (2016). Data Mapping for Data Warehouse Design. Amsterdam: Morgan Kaufmann. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1115852
  • Jukic, N., Vrbsky, S., & Nestorov, S. (2017). Database Systems : Introduction to Databases and Data Warehouses. Burlington, Virginia: Prospect Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1562389
  • Mohanty, H., Bhuyan, P., & Chenthati, D. (2015). Big Data : A Primer. New Delhi: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1020194
  • Ottinger, J. B., Linwood, J., & Minter, D. (2016). Beginning Hibernate : For Hibernate 5 (Vol. Fourth edition). [United States]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1360349
  • Rad, R. (2016). Mastering Hibernate. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1239416