The purpose of the chapter is to present basic information about relational algebra as a theoretical language of operations.
Relational algebra is a theoretical language of operations that makes it possible to create a new relation from one or two existing relations without changing the source data [1, 4, 6, 7, 9, 10, 12].
Both the operands and the result are relations, so the output of one operation can serve as the input of another. This makes it possible to build nested relational algebra expressions in the same way as arithmetic expressions are built. This is called the closure property: relations are closed under relational algebra just as numbers are closed under arithmetic operations.
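The closure property can be illustrated with a small Python sketch. It assumes, purely for illustration, that a relation is stored as a list of dictionaries; the helpers `select` and `project` and the sample rows are hypothetical. Because each helper takes a relation and returns a relation, the calls can be nested like arithmetic expressions.

```python
def select(relation, predicate):
    """Return the tuples of `relation` that satisfy `predicate`."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Keep only `attributes` and eliminate duplicate tuples."""
    rows = {tuple(row[a] for a in attributes) for row in relation}
    return [dict(zip(attributes, values)) for values in sorted(rows)]


# Hypothetical relation with attributes a and b.
relation = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}, {"a": 3, "b": "y"}]

# The output of select is a relation, so it can be the input of project.
nested = project(select(relation, lambda row: row["a"] > 1), ["b"])
print(nested)
```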
Relational algebra is a relation-at-a-time language: all tuples, possibly taken from different relations, are processed by a single command without organizing loops. There are several syntax variants for relational algebra commands. Below we use the commonly accepted symbolic notation for these commands and present them informally. The interested reader can find more detailed information on this subject in Ullman, 1988.
There are several ways to choose the set of operations included in relational algebra. Codd originally proposed eight operators, and more were added later. The five basic operations of relational algebra (fig. 5.1) are selection, projection, Cartesian product, union and set difference. They cover most of the data retrieval operations that may interest us. On the basis of the five basic operations, additional operations can be derived (fig. 5.1): join, intersection and division.
Fig. 5.1. Schematic representation of the functions of relational algebra operators
The selection and projection operations are unary, since they work with a single relation. The other operations work with pairs of relations and are therefore called binary operations. In the definitions given below, R and S are two relations defined on the attributes A = (a1, a2, …, an) and B = (b1, b2, …, bn) respectively. To illustrate the results of the operations, we use the relations of the «LIBRARY» DB (appendix B).
|σpredicate(R)||The selection operation works with a single relation R and defines a result relation that contains only those tuples (rows) of R that satisfy the given condition (predicate).|
Compose a list of librarians whose clock number exceeds 80.
σClockNumber > 80 (Librarians)
The source relation is Librarians, and the predicate is the expression ClockNumber > 80. The selection operation defines a new relation that contains only those tuples of relation Librarians where the value of the ClockNumber attribute exceeds 80 (table 5.1). More complicated predicates can be built by means of the logical operators AND, OR and NOT.
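The selection above can be sketched in Python. This is a minimal illustration, assuming a relation is stored as a list of dictionaries; the helper name `select` and the sample rows are hypothetical and are not the data of appendix B.

```python
# Hypothetical toy data, not the actual Librarians relation from appendix B.
librarians = [
    {"ClockNumber": 75, "FamilyName": "Ivanov"},
    {"ClockNumber": 81, "FamilyName": "Petrenko"},
    {"ClockNumber": 92, "FamilyName": "Sydorenko"},
]


def select(relation, predicate):
    """Return the tuples of `relation` for which `predicate` is true."""
    return [row for row in relation if predicate(row)]


# sigma_{ClockNumber > 80}(Librarians)
result = select(librarians, lambda row: row["ClockNumber"] > 80)

# A compound predicate built with the logical operators `and` and `not`.
combined = select(
    librarians,
    lambda row: row["ClockNumber"] > 80 and not row["FamilyName"].startswith("P"),
)
```

The source list is left unchanged; `select` builds a new relation, which mirrors the statement that relational algebra does not modify its source data.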
|Пattr 1, …, attr n (R)||The projection operation works with a single relation R and defines a new relation that contains a vertical subset of R, created by extracting the values of the indicated attributes and eliminating duplicate rows from the result.|
Compose a list of librarians that contains the clock number (ClockNumber), family name (FamilyName), name (Name), patronymic (Patronymic) and home telephone (HomePhone).
ПClockNumber, FamilyName, Name, Patronymic, HomePhone (Librarians)
In this example, the projection operation defines a new relation that contains only the attributes ClockNumber, FamilyName, Name, Patronymic and HomePhone of relation Librarians, placed in the indicated order (table 5.2).
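A projection can be sketched in Python as follows. The helper `project` and the rows are hypothetical; the sketch assumes a relation is a list of dictionaries and shows both the attribute order and the elimination of duplicate rows.

```python
def project(relation, attributes):
    """Keep only `attributes`, in the given order, and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:  # duplicate rows appear only once
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


# Hypothetical toy data, not the Librarians relation from appendix B.
librarians = [
    {"ClockNumber": 75, "FamilyName": "Ivanov", "Name": "Oleg"},
    {"ClockNumber": 81, "FamilyName": "Ivanov", "Name": "Oleg"},
    {"ClockNumber": 92, "FamilyName": "Petrenko", "Name": "Iryna"},
]

names = project(librarians, ["FamilyName", "Name"])
```

Two of the three source rows agree on FamilyName and Name, so the projection holds two rows instead of three.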
|R × S||The Cartesian product operation defines a new relation that is the result of concatenating (that is, chaining) each tuple of relation R with each tuple of relation S.|
The selection and projection operators extract data from one relation only. However, a situation can arise in which a combination of data from several relations is needed. The Cartesian product operator multiplies two relations. The result is a new relation that contains all possible pairs of tuples from both relations. Thus, if one relation has I tuples and N attributes and the other has J tuples and M attributes, their Cartesian product contains (I × J) tuples and (N + M) attributes. The source relations can have attributes with the same names. In that case the attribute names are prefixed with the names of the relations, which guarantees the uniqueness of attribute names in the result relation.
Compose a list of all readers who have ever borrowed books from the library, using the following attributes of relation Readers: Code, FamilyName, Name, and of relation BookGiveOutRecord: Code, ReaderCode, InventoryCode.
(ПCode, FamilyName, Name (Readers)) × (ПCode, ReaderCode, InventoryCode(BookGiveOutRecord))
The result of this operation is a relation that contains 120 tuples and 6 attributes. There is no point in showing it in full. Consider the first 12 tuples (table 5.3). The first 10 represent all possible combinations of the first tuple of relation Readers with the ten tuples of relation BookGiveOutRecord. The last two rows are given so that their values can be compared with the first two rows. They are the first two combinations of the second tuple of relation Readers with the first two tuples of relation BookGiveOutRecord. The next one would be the combination of the second tuple of Readers with the third tuple of BookGiveOutRecord, and so on up to the last tuple.
In this form (120 tuples) the relation contains more information than necessary. For example, the first tuple of this relation contains different values of the attributes Readers.Code and ReaderCode. To get the required list (table 5.4), we should select the tuples that satisfy the equality Readers.Code = ReaderCode. The complete operation looks as follows:
σReaders.Code = ReaderCode((ПCode, FamilyName, Name(Readers)) × (ПCode, ReaderCode, InventoryCode(BookGiveOutRecord)))
As we will see later, the combination of Cartesian product and selection can be reduced to a single operation: join.
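The Cartesian product followed by a selection can be sketched in Python. The helper `cartesian`, its prefixing rule and the rows are assumptions made for illustration; the data is a toy sample, not the 120-tuple result discussed above.

```python
from itertools import product


def cartesian(r, r_name, s, s_name):
    """Pair every tuple of r with every tuple of s.

    Attribute names that occur in both relations get the relation
    name as a prefix, so names stay unique in the result.
    """
    shared = set(r[0]) & set(s[0]) if r and s else set()

    def rename(row, name):
        return {(f"{name}.{k}" if k in shared else k): v for k, v in row.items()}

    return [{**rename(a, r_name), **rename(b, s_name)} for a, b in product(r, s)]


# Hypothetical toy data: 2 readers, 3 give-out records.
readers = [{"Code": 1, "FamilyName": "Koval"}, {"Code": 2, "FamilyName": "Melnyk"}]
records = [
    {"Code": 10, "ReaderCode": 1},
    {"Code": 11, "ReaderCode": 1},
    {"Code": 12, "ReaderCode": 2},
]

pairs = cartesian(readers, "Readers", records, "BookGiveOutRecord")

# Selection applied to the product: keep only matching reader codes.
matched = [row for row in pairs if row["Readers.Code"] == row["ReaderCode"]]
```

With I = 2, J = 3 and N = M = 2, the product has 2 × 3 = 6 tuples and 2 + 2 = 4 attributes; the selection keeps the 3 tuples in which the codes agree.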
|R ∪ S||The union of relations R and S, which contain I and J tuples respectively, is obtained by concatenating them into one relation with at most (I + J) tuples, duplicate tuples being eliminated. Relations R and S must be union-compatible.|
The union of relations is possible only if their schemes match, that is, if they have the same number of attributes and the domains (data types) of the corresponding attributes match. Such relations are called union-compatible. Note that in some cases the projection operation can be used to obtain two union-compatible relations.
Compose a list of the family names of all people mentioned in the DB.
ПFamilyName(Readers) ∪ ПFamilyName(Librarians)
To create union-compatible relations, the projection operation is applied first to extract the FamilyName columns from relations Readers and Librarians, eliminating duplicates where necessary. The union operation is then used to combine the resulting intermediate relations (table 5.5).
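A union of two projections can be sketched in Python with sets of tuples, which eliminate duplicates automatically. The helper `union`, its simplified compatibility check (only the number of attributes is compared, not the domains) and the family names are hypothetical.

```python
def union(r, s):
    """Union of two relations given as sets of tuples.

    Simplified compatibility check: only the number of attributes is
    compared; domains (data types) are not verified in this sketch.
    """
    if r and s and len(next(iter(r))) != len(next(iter(s))):
        raise ValueError("relations are not union-compatible")
    return r | s


# Hypothetical projections on FamilyName, stored as sets of 1-tuples.
reader_names = {("Koval",), ("Melnyk",), ("Ivanov",)}
librarian_names = {("Ivanov",), ("Petrenko",)}

all_names = union(reader_names, librarian_names)
```

Here I = 3 and J = 2, but the result holds 4 tuples rather than 5, because the name that occurs in both relations is stored once.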
|R - S||The difference of relations R and S consists of the tuples that are present in relation R but absent from relation S. Relations R and S must be union-compatible.|
Determine the individual codes of readers who have never borrowed books from the library.
ПCode(Readers) – ПReaderCode(BookGiveOutRecord)
In this case, as in the previous one, it is necessary to create union-compatible relations from Readers and BookGiveOutRecord by projecting them on the attributes Code and ReaderCode respectively. Then the difference operation is applied (table 5.6).
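The difference of two projections can be sketched in Python as follows. The rows are hypothetical; each projection is stored as a set of 1-tuples so that the two operands have the same number of attributes.

```python
# Hypothetical toy data, not the relations from appendix B.
readers = [{"Code": 1}, {"Code": 2}, {"Code": 3}]
records = [{"ReaderCode": 1}, {"ReaderCode": 1}, {"ReaderCode": 3}]

# Projections on a single attribute, stored as sets of 1-tuples.
reader_codes = {(row["Code"],) for row in readers}
borrower_codes = {(row["ReaderCode"],) for row in records}

# Tuples present in the first relation but absent from the second.
never_borrowed = reader_codes - borrower_codes
```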
Usually users are interested only in the part of all combinations of Cartesian product tuples that meets certain requirements. That is why, instead of the Cartesian product, one of the most important operations of relational algebra is usually applied: the join operation. It creates a new relation on the basis of two source relations. The join operation is derived from the Cartesian product operation: it is equivalent to selecting, from the Cartesian product of the two relations, the tuples that satisfy the condition given in the join predicate. From the standpoint of efficient implementation in a relational DBMS, this operation is one of the most difficult and is often the main cause of the performance problems characteristic of relational systems.
Let’s study the following kinds of join:
|R►◄FS||The Θ-join operation defines a relation that contains the tuples of the Cartesian product of relations R and S that satisfy predicate F. Predicate F has the form R.ai Θ S.bi, where Θ can be one of the comparison operators (<, <=, >, >=, =, ≠).|
The theta join can be expressed in terms of the basic operations of selection and Cartesian product:
R►◄FS = σF(R × S).
As for the Cartesian product, the degree of a theta join is the sum of the degrees of the operand relations R and S. If predicate F contains only the equality operator (=), the join is called an equi-join.
Compose the list of all readers that have ever taken books in the library (table 5.4).
In the example that illustrates the Cartesian product, this list was obtained by applying the Cartesian product and selection operations. However, the same result can be achieved by means of the equi-join operation.
(ПCode, FamilyName, Name(Readers))►◄Readers.Code = ReaderCode(ПCode, ReaderCode, InventoryCode(BookGiveOutRecord))
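A theta join can be sketched in Python as a selection over a Cartesian product. The helper `theta_join` and the rows are hypothetical, and the sketch assumes that the two relations have no attribute names in common.

```python
import operator
from itertools import product


def theta_join(r, s, left_attr, op, right_attr):
    """Selection over R x S with a predicate of the form R.left_attr op S.right_attr.

    Assumes that the attribute names of r and s do not clash.
    """
    return [
        {**a, **b}
        for a, b in product(r, s)
        if op(a[left_attr], b[right_attr])
    ]


# Hypothetical toy data.
readers = [{"Code": 1, "FamilyName": "Koval"}, {"Code": 2, "FamilyName": "Melnyk"}]
records = [
    {"ReaderCode": 1, "InventoryCode": 501},
    {"ReaderCode": 1, "InventoryCode": 502},
]

# Equi-join: the predicate uses only the equality operator.
equi = theta_join(readers, records, "Code", operator.eq, "ReaderCode")

# A theta join with another comparison operator (<).
less = theta_join(readers, records, "Code", operator.lt, "InventoryCode")
```

Each result tuple has 2 + 2 = 4 attributes, which matches the rule that the degree of a theta join is the sum of the degrees of its operands.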
|R►◄S||The natural join is an equi-join of relations R and S over all their common attributes x. One occurrence of each common attribute is removed from the result of the natural join.|
The degree of a natural join is the sum of the degrees of the operand relations R and S minus the number of attributes x. In the theta join example, an equi-join was used to compose the list. Its result contained two attributes, Readers.Code and ReaderCode, that hold the codes of library readers. If they had the same name, for example ReaderCode, the natural join operation could be applied to remove one of them.
Compose the list of all readers that have ever taken books in the library (table 5.7).
(ПReaderCode, FamilyName, Name(Readers))►◄(ПCode, ReaderCode, InventoryCode(BookGiveOutRecord))
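A natural join can be sketched in Python as follows. The helper `natural_join` and the rows are hypothetical; as in the expression above, the sketch assumes that the reader code attribute is named ReaderCode in both relations.

```python
def natural_join(r, s):
    """Equi-join over all attributes common to r and s.

    Each common attribute appears only once in the result.
    """
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[c] == b[c] for c in common)
    ]


# Hypothetical toy data with the shared attribute ReaderCode.
readers = [
    {"ReaderCode": 1, "FamilyName": "Koval"},
    {"ReaderCode": 2, "FamilyName": "Melnyk"},
]
records = [
    {"Code": 10, "ReaderCode": 1, "InventoryCode": 501},
    {"Code": 11, "ReaderCode": 1, "InventoryCode": 502},
]

joined = natural_join(readers, records)
```

The operands have degrees 2 and 3 and share one attribute, so each result tuple has 2 + 3 - 1 = 4 attributes.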
Often, when two relations are joined, a tuple of one relation has no matching tuple in the other relation. In other words, the join columns contain values that do not match. It may be required that a row of one relation appears in the join result even if there is no matching value in the other. This can be achieved with an outer join.
NULL is used to denote the missing values in the result of the join. The advantage of an outer join is that it preserves source information, that is, the tuples that are lost when other types of join are used.
|R⊃◄S||The left outer join is a join in which the tuples of relation R that have no matching values in the columns shared with relation S are also included in the result relation.|
List the code, family name, name and patronymic of all readers, together with the information from BookGiveOutRecord for those who have taken books from the library (table 5.8).
(ПCode, FamilyName, Name, Patronymic(Readers))⊃◄(ПOutLibrarianCode, InventoryCode, IssueDate, ReturnDate, FactReturnDate, InLibrarianCode(BookGiveOutRecord))
Strictly speaking, the example illustrates a left (natural) outer join, since all tuples of the left relation are contained in the result relation. There is also a right outer join, so named because all tuples of the right relation are contained in the result relation. In addition, there is a full outer join, whose result contains all tuples from both relations, with NULL used to denote the missing values in unmatched tuples.
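A left outer join can be sketched in Python as follows. The helper `left_outer_join` and the rows are hypothetical; the sketch joins on a single shared attribute and uses Python's `None` to stand in for NULL.

```python
def left_outer_join(r, s, attr):
    """Natural left outer join of r and s on one shared attribute `attr`.

    Tuples of r without a partner in s are kept and padded with None,
    which stands in for NULL here.
    """
    s_attrs = [a for a in (s[0] if s else {}) if a != attr]
    result = []
    for a in r:
        matches = [b for b in s if b[attr] == a[attr]]
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            result.append({**a, **{name: None for name in s_attrs}})
    return result


# Hypothetical toy data: reader 2 has no give-out records.
readers = [
    {"ReaderCode": 1, "FamilyName": "Koval"},
    {"ReaderCode": 2, "FamilyName": "Melnyk"},
]
records = [{"ReaderCode": 1, "InventoryCode": 501}]

joined = left_outer_join(readers, records, "ReaderCode")
```

An ordinary join would return only the first reader; the left outer join keeps the second reader as well, with an empty InventoryCode.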
|R►FS||The semi-join operation defines a relation that contains those tuples of relation R that take part in the join of relations R and S.|
The advantage of a semi-join is that it reduces the number of tuples that must be processed to compute a join. This is particularly useful for computing joins in distributed systems. The semi-join operation can be defined by means of the projection and join operators:
R►FS = ПA(R►◄FS)
where A is the set of all attributes of relation R. Strictly speaking, this is a semi-theta-join; there are also semi-equi-joins and semi-natural joins.
Compose a report that includes complete information about all readers with name «Dmitry» who have ever taken books in the library (table 5.9).
Readers►Readers.Code = BookGiveOutRecord.ReaderCode AND Readers.Name = ‘Dmitry’BookGiveOutRecord
|4||Surenko||Dmitry||Pavlovich||543||6||NMU, geophysicst dep.||Senior professor||NULL|
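A semi-join can be sketched in Python as follows. The helper `semi_join` and the rows are hypothetical and much smaller than the Readers relation; the predicate mirrors the one used in the query above.

```python
def semi_join(r, s, predicate):
    """Tuples of r that have at least one partner in s satisfying predicate(a, b).

    Only attributes of r are returned, and each tuple of r appears at
    most once, however many partners it has in s.
    """
    return [a for a in r if any(predicate(a, b) for b in s)]


# Hypothetical toy data.
readers = [
    {"Code": 1, "Name": "Dmitry"},
    {"Code": 2, "Name": "Dmitry"},
    {"Code": 3, "Name": "Olena"},
]
records = [{"ReaderCode": 1}, {"ReaderCode": 1}, {"ReaderCode": 3}]

result = semi_join(
    readers,
    records,
    lambda a, b: a["Code"] == b["ReaderCode"] and a["Name"] == "Dmitry",
)
```

Reader 1 has two give-out records but is returned once, which corresponds to projecting the join on the attributes of R and eliminating duplicates.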
|R ∩ S||The intersection operation defines a relation that contains the tuples present both in relation R and in relation S. Relations R and S must be union-compatible.|
Intersection can be expressed using the set difference operator: R ∩ S = R – (R – S).
Compose the list of librarians who are readers at the same time. Indicate their passport code, family name, name and patronymic.
(ПPasportCode, FamilyName, Name, Patronymic(Readers)) ∩ (ПPasportCode, FamilyName, Name, Patronymic(Librarians))
To satisfy the union-compatibility condition for the relations being intersected, the corresponding projection operations were applied.
The data stored in relations «READERS» and «LIBRARIANS» (appendix B) show that the result of their intersection is the empty set, which means that none of the librarians is a reader.
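The identity R ∩ S = R – (R – S) can be checked with a short Python sketch. The helpers and the rows are hypothetical; unlike the data of appendix B, the toy relations are chosen so that the intersection is not empty.

```python
def difference(r, s):
    """Tuples of r that are absent from s (relations as sets of tuples)."""
    return {t for t in r if t not in s}


def intersection(r, s):
    """Intersection expressed through set difference: R - (R - S)."""
    return difference(r, difference(r, s))


# Hypothetical projections on (PasportCode, FamilyName).
readers = {("AB123", "Koval"), ("CD456", "Melnyk")}
librarians = {("CD456", "Melnyk"), ("EF789", "Petrenko")}

both = intersection(readers, librarians)
```

The result agrees with Python's built-in set intersection `readers & librarians`.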
The division operator is useful for a special kind of query that occurs rather often. Suppose that relation R is defined on the set of attributes A and relation S on the set of attributes B, where B ⊆ A (that is, B is a subset of A). Let C = A - B, that is, C is the set of attributes of relation R that are not attributes of relation S. The division operator can then be defined as follows:
|R ÷ S||The result of the division operator is the set of tuples of relation R, defined on the set of attributes C, that match the combination of all tuples of relation S.|
This operator can be expressed using the basic operators:
T1 = ПC(R)
T2 = ПC((S × T1) – R)
T = T1 – T2
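The three steps T1, T2 and T can be traced in a Python sketch. It assumes, for simplicity, that C and B each consist of a single attribute, so that tuples of R are pairs (c, b) and tuples of S are 1-tuples (b,); the helper `divide` and the data are hypothetical.

```python
from itertools import product


def divide(r, s):
    """R divided by S for R over attributes (C, B) and S over (B,).

    Relations are sets of tuples; C and B are single attributes here.
    """
    t1 = {(c,) for c, _ in r}                               # T1 = projection of R on C
    s_times_t1 = {(c, b) for (b,), (c,) in product(s, t1)}  # S x T1, ordered as (C, B)
    t2 = {(c,) for c, _ in s_times_t1 - r}                  # T2 = projection of (S x T1) - R on C
    return t1 - t2                                          # T = T1 - T2


# Hypothetical data: (reader code, book) pairs and a set of required books.
r = {(1, "A"), (1, "B"), (2, "A"), (3, "A"), (3, "B"), (3, "C")}
s = {("A",), ("B",)}

quotient = divide(r, s)
```

Reader 2 is paired with book A only, so the pair (2, B) is in S × T1 but not in R; it therefore lands in T2 and reader 2 is excluded from the result.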
Create a list of names, surnames, patronymic names and passport codes of readers born after December 31, 1960.
R = ПCode, PassportCode, FamilyName, Name, Patronymic(Readers) (table 5.10);
S = ПCode(σBirthday > 31.12.1960(PasportData)) (table 5.11);
R ÷ S = (ПCode, PassportCode(Readers)) ÷ (ПCode(σBirthday > 31.12.1960(PasportData))) (table 5.12).
To solve the task, it is first necessary to obtain relation R. To do this, a projection of relation Readers is performed (table 5.10). Then relation S has to be determined. For this, a projection of relation PasportData is defined, in which the selection operator is used to find all passport codes with a date of birth after December 31, 1960 (table 5.11). Now the result of dividing relation R by relation S can be obtained (table 5.12).
The syntax of the SQL statements used for data processing in relational DBMSs from different vendors can differ, but the operations of relational algebra are the mathematical foundation that unites them. When necessary, this fact can help to convert a DB from the relational DBMS an organization currently uses into the format of another vendor's DBMS.
© Yaroslav Kuvaiev, 2005—2021.