The objective of this chapter is to define the features and study the main directions of database operation optimization that meet users' requirements and provide comfortable work.
Optimal operation of a DB means creating such conditions for data access and manipulation that provide maximal operating speed with minimal consumption of resources. DB designers can rarely influence such an important resource as server performance. Therefore, setting aside the observation that more powerful hardware is always better, we can examine the areas that designers and DB administrators are able to influence:
1) optimization of the DB structure;
2) optimization of queries;
3) optimization of the client application.
Optimization of the DB structure is carried out in the following directions:
1) normalization of relations;
2) data access methods;
3) tuning of internal DBMS mechanisms.
Optimization of queries includes:
1) building a DB structure that is adequate to the queries;
2) optimization of the index structure of DB tables;
3) optimization of query texts.
Optimization of client applications is examined only from the standpoint of efficient DB access.
Normalization of relations affects the speed of data sampling operations [4 - 7, 10, 12], and often it worsens that speed. In theory, normalization leads to the clearest representation of the entities of the universe of discourse, and by eliminating data redundancy it can significantly reduce the required storage. From the practical point of view, however, with large volumes of data normalization can significantly lower the speed of operation: when a single, even very large, DB table is accessed, less time is spent than when several smaller tables are accessed with join operations.
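As a hedged illustration (the table and column names are hypothetical), compare a query against one denormalized table with the equivalent query over normalized tables:
/* denormalized variant: a single read path */
SELECT ORDER_ID, CUST_NAME FROM ORDERS_FLAT;
/* normalized variant: the same result requires a join */
SELECT O.ORDER_ID, C.CUST_NAME
FROM ORDERS O JOIN CUSTOMERS C ON C.CUST_ID = O.CUST_ID;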
A high level of normalization also causes growth in the number of DB relations. As a result, the designer and users no longer perceive the DB structure as a whole. Here we deal with the «human factor»: it can introduce serious errors into the DB structure at the stage of logical design, which can lead to the most serious negative consequences.
That is why, when designing a DB, it is necessary to consider both the negative and the positive particularities of normalization. Usually one sacrifices extra disk space to store not completely normalized tables, trying to provide maximal operation speed, which becomes ever more topical as the number of users sharing the system grows.
Although in theory the data structure must not depend on access methods, in practice this is not true. For example, there are situations where data that take part in rather complicated computing processes must be accessed and corrected by several users. In such situations it is sometimes impossible to implement the task without an additional field that holds the current status of the data collection: its value determines whether the data involved in the computations may be read, corrected, or viewed when they are called by a group of users from different terminals. Thus, when designing the logical structure of a DB, it is impossible to abstract away from the methods used to process the data on the server and in the client application.
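A minimal sketch of such a status field (the table, the column, and the status convention are all hypothetical):
ALTER TABLE CALC_DATA ADD STATUS SMALLINT DEFAULT 0;
/* the convention 0 = free, 1 = locked for computation is an assumption */
UPDATE CALC_DATA SET STATUS = 1
WHERE BATCH_ID = 42 AND STATUS = 0;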
It is necessary to underline that there is a relation between the texts of DB queries, the structure of the tables, and their indexes, which confirms the connection between data structure and access methods.
For rapid access, a DB table must physically occupy a continuous block of pages. It is known that, when allocating new pages, the InterBase DBMS makes no attempt to allocate adjacent pages for storing the same table [14, 15]. That is why the data on one page can contain information from several tables, i.e. the tables become fragmented. Storing multiple generations of records also causes significant «contamination» of the DB and lowers operation speed: with each modification of a record a new version of it is in fact generated, and versions changed or added by transactions that were rolled back are not deleted from the DB. Besides, when records are deleted, the remaining versions are not relocated to remove the «holes» that appear on the DB pages.
To resolve the mentioned problems, a backup copy is periodically generated and the DB is restored from it. In this case the «garbage», that is, the versions of records that will never be used again, is collected; the «holes» on the DB pages that appeared after deletion of records are eliminated; and each table is placed in a continuous block of pages.
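A minimal sketch of this cycle with InterBase's gbak utility (file names and credentials are placeholders; check the exact options against the DBMS documentation):
gbak -b -user SYSDBA -password masterkey mydb.gdb mydb.gbk
gbak -c -user SYSDBA -password masterkey mydb.gbk mydb_new.gdb
The first command backs up the database; the second restores it into a new, defragmented file.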
The size of the DB page itself is very important. Reading and writing in the InterBase DBMS are carried out by pages, so a table record should, if possible, fit within one page. If the page size is too small to store one record and the record is spread over several pages, several physical read operations have to be carried out. On the other hand, the page size must not be too large, since in that case useless records will be read as well.
The size of the input-output buffer can also influence operation speed. For a DB that is accessed mostly for reading, it is recommended to increase the size of the input-output buffer; for a DB that is accessed mostly for writing, it is recommended to decrease it.
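As a hedged illustration (file names and values are placeholders), the page size is fixed when the database is created, and the default number of cache buffers can be changed later with the gfix utility:
CREATE DATABASE 'mydb.gdb' PAGE_SIZE 4096;
gfix -buffers 2000 mydb.gdb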
The presence of an index can significantly improve the speed of some queries. However, since indexes must be updated by the system at each modification of the underlying table, they create an additional load on the system. Usually indexes are created to satisfy particular search conditions after the table has already been in use and has grown in size. They can be created for DB tables only, not for views.
An index is a data structure that permits reducing the access time to separate rows of a table regardless of their physical location.
It is similar to the subject index at the end of a book: a structure connected to a file and dedicated to information search by the same mechanism as a book index. An index permits avoiding sequential or stepped file scanning during the search for the necessary data; with an index, the scope of a search can be narrowed to one or several file records. Like a book index, a DB index is ordered, and each of its elements contains the name of a search object together with one or several record identifiers that point to its location.
In InterBase the concepts of keys and indexes are separated at the logical level [14, 15]. Primary (PRIMARY KEY) and foreign (FOREIGN KEY) keys are created to provide referential integrity of relationally joined tables. In addition, a primary key enforces the uniqueness of its values, which follows from its basic designation: to identify a record in a DB table. A unique key (UNIQUE) can be applied for the same purpose; it is also used by the SELECT operator to hasten data access. «Ordinary» indexes, unlike keys, are dedicated to data access optimization only. From the physical point of view, the objects that were logically divided into keys and indexes at table creation are all implemented as indexes. But while «ordinary» indexes can be assigned a name, the physical indexes behind key definitions are built and named automatically by the system.
Most SQL dialects, including InterBase's, support the following operator for generating indexes on DB tables:
CREATE [UNIQUE] [ASC[ENDING] | DESC[ENDING]] INDEX Name
ON Table_name (attribute[, attribute ...]);
It is necessary to specify the index name and the table name, and to list the names of the attributes used to build the index. Recall that the format of the CREATE INDEX operator permits eliminating duplication of key attribute values by means of the UNIQUE clause; in that case the uniqueness of the candidate key values will be supported by the system automatically. By default, the index field values are sorted in ascending order, which can be indicated explicitly with the ASC[ENDING] clause. If it is necessary to provide the reverse, descending order of sorting, the DESC[ENDING] clause is used.
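For example (the index, table, and column names are hypothetical):
CREATE INDEX IDX_EMP_NAME ON EMPLOYEE (LAST_NAME, FIRST_NAME);
CREATE UNIQUE DESC INDEX IDX_EMP_CODE ON EMPLOYEE (EMP_CODE);
The first index hastens searches by name; the second one additionally enforces the uniqueness of EMP_CODE values and sorts them in descending order.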
Indexes can be generated at any moment, even for a table already filled with data. However, if the UNIQUE clause is applied in the operator, problems can arise from duplicated data in different rows. Thus it is worth generating unique indexes directly at table creation, before the information is entered; as a result, the system will control the uniqueness of the corresponding attribute values from the very beginning.
To delete an index, we can use the operator
DROP INDEX Name;
However, this operator cannot delete an index generated within the CREATE TABLE operator (PRIMARY KEY, FOREIGN KEY, UNIQUE); for this purpose the ALTER TABLE operator must be used. It is also impossible to delete an index that is in use and may take part in the execution of other users' queries to the DB. In addition, particular DB access privileges are necessary to delete an index.
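A minimal sketch of removing a key-backed index through its constraint (the constraint name is hypothetical; system-generated names can be looked up in the DBMS system tables):
ALTER TABLE EMPLOYEE DROP CONSTRAINT PK_EMPLOYEE;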
The efficiency of query execution significantly depends on the index structure [4, 6]. When executing a query, the DBMS first looks through the list of indexes defined for the tables involved. Then one of two query execution schemes is chosen: apply the existing indexes or scan the tables sequentially. The DBMS optimizer seeks to carry out the query with maximal operation speed and minimal overhead expenses. Before the first call the DBMS always optimizes query execution based on the current state of the DB. Repeated queries, in which only attribute values change, are not re-optimized; only the binding of formal and actual parameters is carried out, after which the query is executed.
It is impossible to know beforehand which scheme the DBMS optimizer will use to execute a query, because the DB state can change. However, there are general rules that must be taken into consideration when designing queries and index structure. Indexes should be generated for a column or group of columns when, very often:
1) DB search is carried out on it: the attribute or group of attributes frequently appears in the WHERE clause of the SELECT operator;
2) table joins are carried out on it;
3) sorting of the relations returned as query results is carried out on it: the attribute or group of attributes frequently appears in the ORDER BY clause of the SELECT operator.
It is not recommended to generate indexes on columns or groups of columns that:
1) are rarely used for search, joins, or sorting of query results;
2) change their values very often, which causes frequent index updates and significantly lowers operation speed;
3) contain a small number of distinct values.
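As an illustration of the first recommendation (the table, column, and index names are hypothetical), a column that frequently appears in search conditions is a natural candidate for an index:
/* a frequently executed query */
SELECT * FROM ORDERS WHERE CUST_ID = :ParamCust;
/* a supporting index */
CREATE INDEX IDX_ORDERS_CUST ON ORDERS (CUST_ID);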
When query execution involves sorting on columns, it is important to remember the following: generation of a unique index can hasten query execution, and if not all columns of this index are used in the search condition or the ORDER BY clause, only a continuous leading sequence of its columns is applied for query optimization.
Partial application of a composite index permits reducing DBMS resource losses [2, 4, 12]. To achieve this, the composite index must be designed so that the columns that participate in the search process most often are placed at the beginning of the list; then just a part of the index can be used for the search. For example, suppose the following queries are executed very often:
SELECT *
FROM SOMETABLE
WHERE A = :ParamA AND B = :ParamB;
SELECT *
FROM SOMETABLE
WHERE A = :ParamA AND B = :ParamB AND C = :ParamC;
SELECT *
FROM SOMETABLE
WHERE A = :ParamA AND B = :ParamB AND C = :ParamC AND D = :ParamD;
Then, generating a composite index on columns A, B, C, D, we can affirm that this index will be applied for optimization of all three queries. In the first case the index subset of values A, B will be used, in the second case the values A, B, C, and in the third case A, B, C, D (all index values).
The order of the columns in the search condition is not important if the conditions are joined by AND. For example, the index built in the previous case can be applied to carry out both of the following queries:
SELECT *
FROM SOMETABLE
WHERE A = 100 AND B = 200;
SELECT *
FROM SOMETABLE
WHERE B = 200 AND A = 100;
For the ORDER BY clause, the order of the columns in the composite index is important. For example, to optimize execution of the following two queries, two different indexes are necessary:
SELECT *
FROM SOMETABLE
ORDER BY A, B, C;
SELECT *
FROM SOMETABLE
ORDER BY A, C, B;
It is necessary to remember that in the ORDER BY clause only a continuous leading sequence of index columns can be used. For example, the previous index cannot be used to execute the following queries:
SELECT *
FROM SOMETABLE
ORDER BY A, C;
SELECT *
FROM SOMETABLE
ORDER BY B, D;
But it will be rather useful for the following query:
SELECT *
FROM SOMETABLE
ORDER BY A, B;
When several columns connected by the OR operation are frequently applied in the search conditions of the WHERE clause, then instead of one index on columns A, B, C, several indexes on each column separately should be generated. Otherwise sequential scanning of the whole table will be carried out. This happens because indexed-sequential access with an index on attributes A, B, C can be carried out for column A only; the values of columns B and C are scattered across such an index. It is important to remember that when the OR operator is applied, each component of the search condition causes a separate scan of the tables that take part in the query. For example, when the following SELECT operator is executed
SELECT *
FROM SOMETABLE
WHERE A = 100 OR B = 200 OR C = 300
three separate table scans will be carried out to search for the values that meet the conditions A = 100, B = 200, and C = 300.
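A minimal sketch of the recommended separate indexes (the index names are hypothetical):
CREATE INDEX IDX_A ON SOMETABLE (A);
CREATE INDEX IDX_B ON SOMETABLE (B);
CREATE INDEX IDX_C ON SOMETABLE (C);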
A separate search flow is also generated by each element of an IN list. For example, WHERE A IN (100, 200, 300) is interpreted as WHERE A = 100 OR A = 200 OR A = 300. However, this does not happen when a range is specified with BETWEEN: WHERE A BETWEEN 100 AND 300. Therefore, where possible, IN should be replaced with BETWEEN.
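For instance, the rewrite looks as follows (note that the two conditions are equivalent only when the intermediate values of the range are acceptable to the query):
/* causes three separate search flows */
SELECT * FROM SOMETABLE WHERE A IN (100, 200, 300);
/* a single range search */
SELECT * FROM SOMETABLE WHERE A BETWEEN 100 AND 300;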
The total number of indexes can be reduced by rational application of composite indexes and also by providing referential integrity by means of triggers. When there is a large number of indexes, the speed of insertion, modification, and deletion of records in DB tables drops. As noted above, two kinds of indexes are applied in a DB: indexes dedicated to hastening data access, and indexes that provide referential integrity. If indexes of the second kind do not participate in data access optimization, it is better to remove them and to provide referential integrity by means of triggers, as sketched below.
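A minimal sketch of such a trigger (the table, column, and exception names are hypothetical; the SET TERM statements follow InterBase's isql conventions):
CREATE EXCEPTION E_NO_CUSTOMER 'No such customer';
SET TERM !! ;
CREATE TRIGGER TBI_ORDERS FOR ORDERS
BEFORE INSERT AS
BEGIN
  /* reject the new record if the parent key is absent */
  IF (NOT EXISTS (SELECT * FROM CUSTOMERS WHERE CUST_ID = NEW.CUST_ID)) THEN
    EXCEPTION E_NO_CUSTOMER;
END !!
SET TERM ; !!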
An index becomes unbalanced when its depth exceeds the critical value (2). Index depth is a parameter that determines the maximal number of operations necessary to find a required value in a DB table using this specific index. Table indexes can become unbalanced after repeated changes. When an index is unbalanced, its value during query execution drops, because more time is spent on data sampling. That is why from time to time it is necessary:
1) to rebuild the index;
2) to recalculate the index selectivity («usefulness» factor);
3) to delete the index and generate it anew.
Rebuilding an index consists of regenerating and balancing it after deactivation, carried out by the operator
ALTER INDEX Name INACTIVE
and subsequent activation, carried out by the operator
ALTER INDEX Name ACTIVE.
Deactivating an index is also useful when a large number of records is added to a table: in the normal mode of insertion with an active index, the index is modified as each table record is added, which can cause unbalance. A sketch of this pattern is given after the following list. It is necessary to remember that:
- it is impossible to rebuild an index that is being used at the present moment by other DB users;
- it is impossible to rebuild an index that was generated within the CREATE TABLE operator (PRIMARY KEY, FOREIGN KEY, UNIQUE); for this, the ALTER TABLE operator is applied;
- to carry out the ALTER INDEX operator, corresponding DB access privileges are necessary.
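A minimal sketch of the bulk-load pattern (the index and table names are hypothetical):
ALTER INDEX IDX_ORDERS_CUST INACTIVE;
/* bulk insertion of records */
INSERT INTO ORDERS (ORDER_ID, CUST_ID) VALUES (1001, 42);
/* ... many more inserts ... */
ALTER INDEX IDX_ORDERS_CUST ACTIVE;
The index is rebuilt, and thereby balanced, at the moment of activation.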
The index «usefulness» factor (selectivity) is based on information about repeated index values. It is applied by the DBMS at table access to work out an optimal plan of query execution. The efficiency of applying an index to search for information in a DB table directly depends on the index structure: whether the index was built on unique values and, if not, how much those values differ.
The «usefulness» factor is calculated at index creation as the number of distinct values of the index fields within the index divided by the average number of records. The degree of difference of the column values used to build the index changes as the table is modified. That is why the calculated «usefulness» factor may not reflect the actual state of the index, and it is recommended to recalculate it: from time to time when insignificant modifications are made, and always when serious changes are made. Recalculation is carried out by the operator
SET STATISTICS INDEX Name.
Indexes with the maximal «usefulness» factor are selected to participate in query execution, since such indexes provide a higher search speed. Unique indexes have the highest «usefulness» factor.
The average number of records is a factor calculated at every query optimization as the number of DB pages occupied by the table divided by the maximal number of records per page. Reducing the number of pages occupied by the DB and eliminating «holes» lower the average number of records and, as a result, increase the «usefulness» factor. This is one more advantage of the periodic DB optimization carried out by generating a new backup copy and restoring from it.
The WISQL utility provides the possibility to display the query execution plan. It is necessary to choose the menu item Session | Basic Settings and to switch on the Display Query Plan mode. Then, whenever a query is carried out, its execution plan will be displayed at the same time. The execution plan is the list of indexes used by the DBMS for data sampling.
To force query execution according to a particular plan, it is necessary to indicate the following clause in the SELECT operator:
PLAN <query execution plan>, where
<query execution plan> = [JOIN | [SORT] MERGE] ({<plan element> | <query execution plan>}[, {<plan element> | <query execution plan>} ...]),
<plan element> = {table | alias} NATURAL | INDEX (<index>[, <index> ...]) | ORDER <index>.
The syntax of the <query execution plan> clause provides the possibility to join several tables (the JOIN clause). If there are no indexes that could be applied for sorting the table records, forced sorting is carried out (the [SORT] MERGE clause). The table in which the data search is carried out is indicated in the <plan element> clause; if a table participates in the query several times, an alias can be used to shorten the text of the plan. The NATURAL clause indicates that sequential access is applied for the search of records; it is the only search mode when there are no appropriate indexes. It is possible to indicate one or several indexes that must be used to find the records that meet the query condition (the INDEX clause). Finally, the ORDER clause permits determining the index required for table sorting. Examples of applying the PLAN clause are examined in the corresponding DBMS documentation; a small sketch is given below.
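A minimal sketch (the index name IDX_A is hypothetical):
SELECT *
FROM SOMETABLE
WHERE A = 100
PLAN (SOMETABLE INDEX (IDX_A));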
Minimizing the number of connections to the DB permits saving system resources [3, 5, 13, 18]. Consumption of system resources can influence the efficiency of data access, so it is recommended to minimize the number of connections and, in the ideal situation, to carry out only one connection for each client application.
Methods of connecting to a DB depend on the development environment in which the application was written; that is why you have to get acquainted with the appropriate technical documentation. For example, the TDatabase component is used to establish a connection with a remote DB in client applications written in Delphi or C++ Builder. It is used to:
1) create a permanent connection to the DB;
2) create a local DB alias;
3) change the connection parameters determined for a DB alias;
4) manage transactions.
If the TDatabase component is not used, the connection to the DB will be determined by each component of the «data set» type: TTable (a table), TQuery (an SQL query), TStoredProc (a stored procedure). Moreover, it is impossible to change the predetermined connection parameters of «data set» components that connect directly to a remote DB.
Network traffic is lowered by delivering to the client application only the minimal amount of information from the DB that is sufficient for the current manipulations with data. For example, choosing between the two data set components TTable and TQuery for data manipulation in a client application, it is better to choose the second one. Firstly, the TTable component reads all records of a remote table when accessing its data, while TQuery reads only those records that are necessary for the current visualization operations, for example, to fill a TDBGrid. Secondly, when accessing tables with a large amount of data, application of TTable can cause significant delays.
The TTable and TQuery components have different natures: TTable is oriented toward the navigational data access method that is more typical of operations with local DBMSs, while TQuery is oriented toward operations with sets of records, which is typical of remote DB access in the «client-server» architecture. TTable permits access to one DB table; TQuery permits access to the results of a query executed over several DB tables at once. Correspondingly, confirmation of data modification in TTable is carried out for each record, which significantly increases network traffic, while data modification with the TQuery component can be carried out for a whole set of records at once by the INSERT, UPDATE, and DELETE operators.
The TStoredProc component is used only for operations with executable procedures and is not applied for operations with selection procedures that can return data sets; to operate selection procedures, the TQuery component is applied.
Transferring the load of computational processes to the server, firstly, can speed up the operation of the client application and, secondly, minimizes the possibility of errors. Here are some recommendations that provide the possibility to implement it.
The optimality of a DBMS is always estimated by the end user in terms of the speed of data access and manipulation. Optimization starts at the stage of logical modeling of the universe of discourse (normalization of relations), continues at the stage of providing data access (optimization of queries), and finishes with the implementation of the interaction between the client part of the application and the DBMS.