Database systems
Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Athens, Greece
dds@aueb.gr
Advantages
- Data independence
- Efficient data access
- Data integrity
- Data security
- Data administration capabilities
- Resilient concurrent access
- Crash recovery
- Reduced application development time
Problems and Limitations
- Data Management approach poses problems in data resource management
- Developing a large database and installing a DBMS can be difficult and expensive
- More hardware capability is required (storage and memory requirements are higher)
- Longer processing times may result from high-volume transaction processing applications
- The security and integrity of an organization's databases are major concerns of an organization's data resource management effort
Levels of Abstraction
- Physical schema
- Defines how data is stored
- Conceptual schema or logical schema
- Defines data in terms of a data model
- External schema or view level
- Defines a number of simplified domain-specific views
DBMS Levels of Abstraction
Data Independence
The three different levels of abstraction allow us to:
- Change the file organization without affecting the conceptual schema
- Change the conceptual schema without affecting an application (by adjusting the external schema)
Database Design
- Requirements analysis
- Conceptual database design: develop high level description
- Logical database design: map description into the specific DBMS
- Schema refinement: identify and solve potential problems
- Physical database design: optimize for specific workloads
- Security design
Entities and Attributes
- Entity
- Object in the real world
- Attributes
- Describe each entity
- Entity set
- Group of similar entities (share same attributes)
- Attribute domain
- Values that can be used for the attribute
- Key
- Minimal set of attributes that uniquely identify an entity
- Primary key
- Designated among a number of candidate keys
The books entity set
The Entity Relationship Model
- Relationship
- Association between two entities
- Relationship set
- Group of similar relationships
- Descriptive attributes
- can identify a relationsip
A relationship can be identified as:
- one-to-many
- many-to-many
- one-to-one
The Published relationship
The Relational Model
Relation Schema
- Name of the relation
- Name of each field (or column or attribute)
- Domain of each field
Relation Instance
- Set of tuples (or records)
(Table of rows with the same number of fields)
ISBN | Title | Price |
0-387-02620-7 | Beyond Fear | 29.00 |
0-201-79940-5 | Code Reading | 49.95 |
0-07-246535-2 | Database Management Systems | 40.00 |
0-19-502402-8 | The Timeless Way of Building | 60.00 |
Redundancy
Storing redundant information in a database results in:
- Waste of space
- Data that may not be correctly updated
- Data that can not be inserted unless other data is also inserted
- Data that is incidentally lost when other data is removed
Example
ISBN | Title | Price | Discount | Reduced Price |
0-387-02620-7 | Beyond Fear | 30.00 | 10 | 27.00 |
0-201-79940-5 | Code Reading | 50.00 | 20 | 40.00 |
0-07-246535-2 | Database Management Systems | 40.00 | 20 | 38.00 |
0-19-502402-8 | The Timeless Way of Building | 60.00 | 10 | 54.00 |
Dependencies are resolved by decomposing the relations.
The Structured Query Language
Language used for
- Creating
- Manipulating
- Querying
relational DBMSs.
Used:
- Directly by database administrators and expert users
- As a standard way for applications to communicate with the DBMS
- As a way to backup, restore, and transfer data in a portable form
SQL Table Definition and Data Manipulation
Create the Table
CREATE TABLE Books (
isbn CHAR(13),
title CHAR(100),
price real)
Add a Record
INSERT
INTO Books (isbn, title, price)
VALUES ("0-201-79940-5", "Code Reading", 49.95)
Modify Records
UPDATE Books B
SET B.price = B.price * 1.1
Delete a Record
DELETE
FROM Books B
WHERE ISBN="0-201-79940-5"
SQL Queries
Display the Details for a Record
SELECT *
FROM Books
WHERE ISBN="0-201-79940-5"
Display Records Based on a Condition
SELECT title, price
FROM Books
WHERE price > 50 AND price > 10
Perform an Aggregate Calculation
SELECT AVG(price)
FROM Books
Join Two Tables
SELECT title, authorname
FROM Books, Authors
WHERE Books.authorid = Authors.authorid
Query By Example
SELECT BOOKS.title, BOOKS.price, BOOKS.price, BOOKS.isbn
FROM BOOKS
WHERE (((BOOKS.price)>100)) OR (((BOOKS.price)<10))
ORDER BY BOOKS.title;
Additional SQL Query Constructs
SQL provides many sophisticated capabilities:
- Selection of distinct records
SELECT DISTINCT title, price
FROM Books
WHERE price < 50 AND price > 10
- Aggregate calculations (COUNT, SUM, AVG, MIN, MAX)
SELECT MIN(price), MAX(price)
FROM Books
- Operations on groups
SELECT discount, AVG(price)
FROM Books
WHERE price > 10
GROUP BY discount
HAVING MAX(price) - MIN(price) > 10
- String pattern matching
SELECT title, price
FROM Books
WHERE title LIKE 'TIMELESS%'
- Ordering
SELECT title, price
FROM Books
ORDER BY price
- Set union, intersection and difference (UNION, INTERSECT, EXCEPT)
SELECT title, price
FROM Books
UNION
SELECT title, price
FROM magazines
- Nested queries (IN, EXISTS, operator ANY, operator ALL)
SELECT title, price
FROM Books
WHERE price > (SELECT AVG(price) FROM Books)
SELECT title, price
FROM Books
WHERE price < ALL
(SELECT price
FROM Books
WHERE title LIKE 'Harry Potter%')
Views
By defining appropriate views on a schema we
- Achieve logical data independance
- Can better organise security
CREATE VIEW BrowseBooks(isbn, title)
AS SELECT B.isbn, B.title
FROM Books B
WHERE B.price < 100
Indexes
An index is an auxilliary data structure used
to speed up operations that are not efficiently supported by
a table's record organisation.
CREATE INDEX IndPrice ON Books
WITH STRUCTURE = BTREE
KEY = (price)
To design the index structure take into account:
- The query workload
- The update (insert, delete, update) workload
- Operations that will benefit from the index
- Attributes that are often used
- The type of the index structure (hash versus tree)
- Index maintenance cost
Security Issues
Our goal in designing a secure database is to achieve:
- Confidentiality
- Integrity
- Availability
Discretionary access control provides us the capability
to give (and revoke) rights to specific users or groups.
Examples
GRANT SELECT
ON BrowseBooks
TO WebUsers
REVOKE INSERT, DELETE
ON Books
From Alice
GRANT INSERT, DELETE
ON Books
TO InventoryGroup
GRANT UPDATE(price)
ON Books
TO MarketingGroup
GRANT UPDATE(title, isbn)
ON Books
TO MaintenanceGroup
Transaction Management
- A transaction is a single unit of execution program execution (e.g. book a seat)
- Transactions provide concurrency control
- Transactions support crash recovery
Four important properties (ACID):
- Atomic
- Database ensures that all actions are carried out, or none
- Consistency
- Users ensure that transactions leave the data in a consistent state
- Isolation
- Users to not have to worry about concurrently executing transactions
- Durability
- Completed transactions persist after a crash even if the database has not been updated
Crash Recovery
- Crash recovery mechanisms guard against
- System crashes
- Media failures
- A log or journal records all changes before they modify the database
- The log is assumed to survive system crashes and media failures
- After a crash the recovery manager follows the ARIES stretegy
- Analysis
- of changes that have not been written and active transactions
- Redo
- actions that were not written to the database
- Undo
- transactions that were not completed
- The log is also updated during the recovery to guard against repeated crashes
- A checkpoint is periodically recorded to the log to reduce the recovery overhead
- A checkpointing strategy is also used to take backup copies of live databases
The Database Administrator
Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise's information resources and needs.
The database administrator's duties include:
- Schema definition
- Storage structure and access method definition
- Schema and physical organization modification
- Granting user authority to access the database
- Specifying integrity constraints
- Acting as liaison with users
- Monitoring performance and responding to changes in requirements
User Profiles
Users are differentiated by the way they expect to interact with the system
- Application programmers
- interact with system through DML calls
- Sophisticated users
- form requests in a database query language
- Specialized users
- write specialized database applications that do not fit into the traditional data processing framework
- Naive users
- invoke one of the permanent application programs that have been written previously
RAID Storage
Redundant arrays of independent disks (RAID) are used to overcome
two bottlenecks associated with disk storage:
By using additional redundant disks we can increase both.
The following are some typically identified RAID levels:
- 0: Nonredundant
- Data is distributed across different disks to increase performance
- 1: Mirrored
- A second disk set keeps a copy of the data (50% overhead)
- 0+1 (or 10): Striping and mirroring
- Data is distributed across a second disk set to increase performance
- 2: Error correcting codes
- Additional check disks (4:3, 10:4, 25:5) are used to provide redundancy
with a smaller overhead (e.g. 57%, 71%, 83%)
- 3: Bit-interleaved parity
- A single additional check disk is used to recover the data by distributing bits across all disks
- 4: block-interleaved parity
- The check disk contains the parity on a block level: higher read throughput
- 5: block-interleaved distributed parity
- Block parity is distributed across all disks: higher read and write throughput
- 6: higher redundancy
- Like 5 with an additional check disk guarding against a second failure
XML Concepts
- Semantic and structural text markup language
- Does not (directly) define presentation
- Application of SGML derived from HTML
- Extensible
- A separate document type definition is used to specify the structure of the data
- Delivers cross-platform data portability
- Domain-specific applications allow data interchange
XML Syntax
An XML document is structured on the following building blocks:
- Elements
- Delimited by starting and ending tags
Example:
<title>The Lord of the Rings</title>
Elements can also be empty; a special shortcut makes them smaller:
Example:
<dvd></dvd>
is the same as
<dvd />
- Attributes
- Name=valua pairs in the starting tag, that can be used for metadata
Example:
<title lang="english">The Lord of the Rings</title>
- Entity references
- Are used to refer to special (reserved) symbols
Symbol | Entity reference |
< | < |
> | > |
& | & |
" | " |
' | ' |
- Comments
- Delimited by the <!-- and -> sequence
Example:
<!-- this is a comment ->
Example
<!-- List of shipped books -->
<booklist>
<book>
<isbn>0-387-02620-7</isbn>
<title>Beyond Fear</title>
<author>
<givenname>Bruce</givenname>
<familyname>Schneier</familyname>
</author>
<price>30.00</price>
</book>
<book>
<isbn>0-201-79940-5</isbn>
<title>Code Reading</title>
<author>
<givenname>Diomidis</givenname>
<familyname>Spinellis</familyname>
</author>
<price>50.00</price>
</book>
<book>
<isbn>0-07-246535-2</isbn>
<title>Database Management Systems</title>
<author>
<givenname>Raghu</givenname>
<familyname>Ramakrishnan</familyname>
</author>
<author>
<givenname>Johannes</givenname>
<familyname>Gehrke</familyname>
</author>
<price>40.00</price>
</book>
<book>
<isbn>0-19-502402-8</isbn>
<title>The Timeless Way of Building</title>
<author>
<givenname>Cristopher</givenname>
<familyname>Alexander</familyname>
</author>
<price>60.00</price>
</book>
<book>
<isbn>3-8218-0479-3</isbn>
<title lang="german">Lexikon der Populaeren Irrtuemer</title>
<author>
<givenname>Walter</givenname>
<familyname>Kraemer</familyname>
</author>
<author>
<givenname>Goetz</givenname>
<familyname>Trenkler</familyname>
</author>
<price>40.00</price>
</book>
</booklist>
Document Type Definitions
- A document type definition is used to describe the format of valid XML documents
- It can be used to verify XML documents
- It can be used to parse XML documents decomposing them into a tree
- Defines allowed elements
- For each element defines content and supported attributes
- Specified element content can be
- Parsed character data
- Indicated by #PCDATA
- Child element
- Indicated (name)
- Sequence of elements
- Indicated by (child1, child2, ...)
- Zero or one child elements
- Indicated by name?
- Zero or more child elements
- Indicated by name*
- One or more child elements
- Indicated by name+
- A choice among different child elements
- Indicated by (name1 | name2 | ...)
Example:
<!ELEMENT booklist (book)*>
<!ELEMENT book (isbn,title,author+,price,edition?)*>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title lang (german|english) "german">
<!ELEMENT author (givenname,familyname)>
<!ELEMENT givenname (#PCDATA)*>
<!ELEMENT familyname (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT edition (#PCDATA)>
Bibliography
- D. D. Chamberlin,
M. M. Astrahan, K. P. Eswaran, P. P. Griffiths, R. A. Lorie, J. W. Mehl,
P. Reisner, and B. W. Wade.
SEQUEL 2: A unified approach to data definition manipulation and control.
IBM Journal of Research and Development, 20(6):560-575, November
1976.
- Peter Pin-Shan Chen.
The entity-relationship model --- toward a unified view of data.
ACM Transactions on Database Systems, 1(1):9-36, March 1976.
- Elliotte Rusty Harold
and W. Scott Means.
XML
in a Nutshell.
O'Reilly and Associates, Sebastopol, CA, 2001.
- International Organization
for Standardization, Geneva, Switzerland.
Information technology --- Database languages --- SQL, 1992.
ISO/IEC 9075:1992.
- Henry F. Korth
and Abraham Silberschatz.
Database System Concepts.
McGraw-Hill, second edition, 1991.
- Raghu
Ramakrishnan and Johannes Gehrke.
Database Management Systems.
McGraw-Hill, second edition, 2000.