Pre-requisites: ESC101, CS210
Introduction: Database applications, purpose, accessing and modifying databases, need for transactions, architecture - users and administrators, data mining, information retrieval. iRelational Databases: relational model, database schema, keys, relational query languages, algebra, tuple and domain calculus example queries, (optional: equivalence of relational calculus and relational algebra).
SQL: Data definition, basic SQL query structure, set operations, nested subqueries, aggregation, null values, database modification, join expressions, views.
Database Design: E-R model, E-R diagram, reduction to relational schema, E-R design issues, database integrity, specifying integrity constraints in SQL: unique columns, foreign key, triggers.
Relational Database Design: features of good design, Functional Dependency theory, decomposition using functional dependency and normal forms, algorithms for decomposition, normal forms, (optional: multi-valued dependency and 4th normal form).
Storage and File structure: Overview of secondary storage, RAID and flash storage. Storing tables: row-wise, column database, database buffer. Indexing: concepts, clustered and non-clustered indices, B+-tree indices, multiple key access, hashed files, linear hash files, bitmap indices, Index definition in SQL, ++R-trees.
Query Processing: Overview, measures of query cost, selection, sorting, join processing algorithms-nested loops, merge-sort, hash join, aggregation.
Query Optimization: purpose, transformation of relational expressions, estimating cost and statistics of expression, choosing evaluation plans, linear and bushy plans, dynamic programming algorithms.
Transactions: Concept and purpose, ACID properties and their necessity, transactions in SQL. Problems with full isolation and levels of isolation.
Concurrency Control: lock-based protocols, 2-phase locking, deadlock handling, multiple granularity, timestamp based protocols, index locking, (optional: validation protocols, multi-version protocols, snap shot isolation, predicate locking, concurrency control for index structures).
Recovery: Failures and their classification, recovery and atomicity, recovery algorithms, Undo-Redo with write ahead logging, no Undo no Redo and other combinations, buffer management, (optional: ARIES recovery).
Optional/Advanced topics below covered at the discretion of instructor
Parallel Databases: Avenues for parallelism: I/O parallelism, interquery, inter-query and intra operation parallelism, databases for multi-core machines.
Distributed Databases: Distributed data storage, distributed transactions, commit protocols, concurrency control in distributed databases, heterogeneous and cloud-based databases.
Data Mining: Decision Support Systems, data warehousing, mining, classification, association rules, clustering
Information Retrieval: relevance ranking using terms and hyperlinks,page rank, indexing of documents, measuring retrieval effectiveness.
XML and semi-structured data: necessity, XML document schema, querying: XPath and XQuery languages, applications.