Write up on Tech Geek History: CODASYL

Literature Review

Definition

CODASYL, which stands for Conference on Data Systems Languages, was a standards organization that played a significant role in the development of early computer languages and database management systems. It is best known for its work on COBOL and the CODASYL network data model, which preceded the relational model used by most modern databases.

A data base may be defined as a collection of interrelated data stored together with as little redundancy as possible to serve one or more applications in an optimal fashion; the data are stored so that they are independent of the programs which use the data; a common and controlled approach is used in adding new data and in modifying and retrieving existing data within the data base.

Two types of languages are mentioned in connection with DBMS. The first is the Data Description Language (DDL), which describes the types of data entities that may exist, along with their allowable attributes.

There may be two DDLs, or two levels of DDL, for describing a data base. The first level of description is the system’s view of the data base as it is actually organized, and the second, a user’s view of the data base. These levels are called the schema and subschema respectively. In relational model terminology, the DDL may be called the relational algebra.
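The relationship between the two levels can be sketched in a few lines of Python. This is only an illustration, assuming that a subschema is a named subset of the record types and data items declared in the schema; the record names, item names and the `make_subschema` helper are hypothetical and do not come from the CODASYL reports.

```python
# Hypothetical schema: every record type with all of its data items.
schema = {
    "EMPLOYEE": ["emp_no", "name", "salary", "dept_no"],
    "DEPARTMENT": ["dept_no", "dept_name", "budget"],
}

def make_subschema(schema, view):
    """Build a user view that exposes only the requested record types and items."""
    subschema = {}
    for record_type, items in view.items():
        if record_type not in schema:
            raise ValueError(f"record type not in schema: {record_type}")
        unknown = set(items) - set(schema[record_type])
        if unknown:
            raise ValueError(f"items not in schema: {sorted(unknown)}")
        subschema[record_type] = list(items)
    return subschema

# A payroll program sees employees only, and not their department number.
payroll_view = make_subschema(schema, {"EMPLOYEE": ["emp_no", "name", "salary"]})
```

A subschema built this way can never mention anything the schema does not declare, which is the main point of having two levels.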

The second language is the Data Manipulation Language (DML), which is concerned with the storage, retrieval and modification of specific occurrences of the entity types described by DDL statements. In relational model terminology, this language corresponds to the relational calculus. The entities handled by DDL and DML may be records, sets or anything that may need manipulation. The attributes may be such things as data items, set membership, set ownership or location within the data base.

The data base model is the meta-structure which is imposed on the organization of the data base. The model prescribes the types of entities which are allowed. It defines the data attributes and structural attributes that an entity may have.

The definition of a DDL and DML is the implementation of the meta-structures of a data base model. Currently the two most widely discussed models are the network model and the relational model.

COBOL data manipulation language (DML) is a programming language extension that provides a way for a COBOL application program to access a database. A COBOL database application program contains DML statements that tell the Database Control System (DBCS) what to do with specified data; the DBCS provides all database processing control at run time. The four classes of DML statements are data definition, control, retrieval, and update. An explanation of each class follows, together with important definitions of members of that class:

Data definition—These entries define the specific part of the database to be accessed by the application program and any keep lists needed to navigate through it. The entries also result in the creation of a database user work area (UWA). Transfer of data between your program and the database takes place in the UWA. Your program delivers data for the DBCS to this area; it is here that the DBCS places data requested from the database for retrieval to your program.

Terminology and Concepts

For a complete description of the CODASYL schema DDL statements and DBMS design, see Ref. 2. The schema DDL is used to describe a data base and has the following entity types: data items, data aggregates, records, areas and sets.

A data item is an occurrence of a named atomic data attribute. It is the smallest unit of named data.

The set of values that a data item can assume is called its range.

The range of an item is always restricted to values of a particular type. The possible types are arithmetic data, string data, data base keys and implementor defined types.

A data aggregate is an occurrence of a named collection of data items. There are two kinds: vectors and repeating groups. A vector is a one dimensional sequence of data items, all with identical characteristics. A repeating group is a collection of data attributes that occurs multiple times within a record occurrence. The collection of attributes may include data items and data aggregates.

A record is an occurrence of a named collection of zero or more data items or data aggregates. Each record entry defines a record type of which there may be zero or more occurrences within the data base.
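To make these terms concrete, here is a minimal Python sketch of one record type that contains plain data items, a vector and a repeating group. The record and field names are invented for illustration, and Python lists stand in for the aggregate structures; an actual schema would declare them in the DDL.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Dependent:
    """One occurrence of a repeating group: a collection of data items."""
    name: str
    birth_year: int

@dataclass
class EmployeeRecord:
    """A record type: a named collection of data items and data aggregates."""
    emp_no: int                       # data item, arithmetic type
    name: str                         # data item, string type
    # Vector: a one-dimensional sequence of items with identical characteristics.
    monthly_sales: List[float] = field(default_factory=list)
    # Repeating group: the same collection of attributes, occurring many times.
    dependents: List[Dependent] = field(default_factory=list)

# One record occurrence of the EmployeeRecord type.
rec = EmployeeRecord(
    emp_no=101,
    name="Ada",
    monthly_sales=[1200.0, 950.5, 1010.0],
    dependents=[Dependent("Bo", 2010), Dependent("Cy", 2013)],
)
```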

The record is the smallest addressable entity within the data base. A set is a named collection of records. Each set entry in the schema defines a set type for which zero or more occurrences (sets) may exist in the data base.

Each set type declared in the schema must have one record type declared as its owner and may have one or more record types declared as its members. Each set occurrence which exists in the data base must contain exactly one record of its owner type and zero or more records of its member record types.
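The owner and member rules can be expressed as a small Python sketch. It assumes hypothetical class names (`Record`, `SetType`, `SetOccurrence`) and a hypothetical `connect` method; it only shows the constraints described above and is not how any particular DBMS stores sets.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Record:
    record_type: str
    data: dict

@dataclass
class SetType:
    name: str
    owner_type: str           # exactly one owner record type
    member_types: List[str]   # one or more member record types

@dataclass
class SetOccurrence:
    set_type: SetType
    owner: Record                                          # exactly one owner record
    members: List[Record] = field(default_factory=list)   # zero or more members

    def __post_init__(self):
        if self.owner.record_type != self.set_type.owner_type:
            raise ValueError("owner record is not of the declared owner type")

    def connect(self, record: Record) -> None:
        """Add a member, rejecting records of an undeclared member type."""
        if record.record_type not in self.set_type.member_types:
            raise ValueError("record is not of a declared member type")
        self.members.append(record)

dept_emp = SetType("DEPT-EMP", owner_type="DEPARTMENT", member_types=["EMPLOYEE"])
sales = SetOccurrence(dept_emp, owner=Record("DEPARTMENT", {"name": "Sales"}))
sales.connect(Record("EMPLOYEE", {"name": "Ann"}))
sales.connect(Record("EMPLOYEE", {"name": "Bob"}))
```

A set occurrence with an owner and no members is valid, which is why `members` starts out empty.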

An area is a named collection of records which need not preserve owner/member relations. An area may contain occurrences of multiple record types, and a record type may occur in multiple areas. A particular occurrence of a record is assigned to an area when it is created, and it may not migrate out of that area. An area may be declared to be temporary. Temporary areas are created especially for a run-unit, exist for the life of the run-unit and are destroyed when the process terminates.

Data Base Keys. The DDL assumes that every record occurrence in the data base has a unique identifier which enables the DBMS to distinguish it from every other record in the data base. This key must be assigned when the record is created and remains with it for the life of the record. This key may be supplied to the DBMS by a run-unit or data base procedure, generated from the record’s contents, or assigned by the DBMS.
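As a rough illustration of the third option, a key assigned by the DBMS, the following Python sketch hands out a new key when a record is stored and keeps that key fixed while the record's contents change. The `RecordStore` class and its method names are hypothetical, and drawing keys from a simple counter is an assumption of this sketch, not something the DDL prescribes.

```python
import itertools

class RecordStore:
    """Toy store in which the 'DBMS' assigns each record a data base key."""

    def __init__(self):
        self._next_key = itertools.count(1)   # assumed key source: a counter
        self._records = {}

    def store(self, contents):
        """Create a record and return the key assigned to it."""
        key = next(self._next_key)
        self._records[key] = contents
        return key

    def modify(self, key, contents):
        """Change the contents; the key stays with the record."""
        if key not in self._records:
            raise KeyError(key)
        self._records[key] = contents

    def find(self, key):
        return self._records[key]

    def erase(self, key):
        del self._records[key]

db = RecordStore()
ann = db.store({"name": "Ann", "salary": 100})
bob = db.store({"name": "Bob", "salary": 90})
db.modify(ann, {"name": "Ann", "salary": 120})
```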

Network Model

The popularity of the network data model coincided with the popularity of the hierarchical data model. Some data were more naturally modeled with more than one parent per child, so the network model permitted the modeling of many-to-many relationships in data. In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network model. The basic data modeling construct in the network model is the set construct. A set consists of an owner record type, a set name, and a member record type. A member record type can have that role in more than one set, hence the multiparent concept is supported. An owner record type can also be a member or owner in another set. The data model is a simple network, and link and intersection record types (called junction records by IDMS) may exist, as well as sets between them. Thus, the complete network of relationships is represented by several pairwise sets; in each set, exactly one record type is the owner (at the tail of the relationship arrow) and one or more record types are members (at the head of the arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. The CODASYL network model is based on mathematical set theory.
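The way a junction record turns a many-to-many relationship into two 1:M sets can be sketched in Python. The STUDENT, COURSE and ENROLLMENT names are hypothetical, and dictionaries of lists stand in for set occurrences; the point is only that each enrollment record is a member of two sets with different owners.

```python
from collections import defaultdict

# Two set types that share ENROLLMENT as their member record type.
student_enrollments = defaultdict(list)   # owner STUDENT -> ENROLLMENT members
course_enrollments = defaultdict(list)    # owner COURSE  -> ENROLLMENT members

def enroll(student, course, grade=None):
    """Create one junction record and connect it into both sets."""
    link = {"student": student, "course": course, "grade": grade}
    student_enrollments[student].append(link)
    course_enrollments[course].append(link)
    return link

def courses_of(student):
    """Walk the student's set and read the course from each junction record."""
    return [link["course"] for link in student_enrollments[student]]

def students_in(course):
    """Walk the course's set and read the student from each junction record."""
    return [link["student"] for link in course_enrollments[course]]

enroll("ann", "databases")
enroll("ann", "compilers")
enroll("bob", "databases")
```

Each set on its own is still 1:M (one student owns many enrollments, one course owns many enrollments), yet together they relate many students to many courses.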

Hierarchical Model

The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. Data is stored in a series of records, each of which has a set of field values attached to it. The model collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses parent-child relationships, which are 1:N mappings between record types. These are represented using trees, a structure “borrowed” from mathematics in the same way that the relational model borrows set theory.

For example, an organization might store information about an employee, such as name, employee number, department and salary. The organization might also store information about an employee’s children, such as name and date of birth. The employee and children data form a hierarchy, where the employee data represents the parent segment and the children data represents the child segment. If an employee has three children, then there would be three child segments associated with one employee segment.

In a hierarchical database the parent-child relationship is one to many. This restricts a child segment to having only one parent segment. Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM’s Information Management System (IMS) DBMS, through the 1970s.
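The employee example can be written as a short Python sketch. The `Segment` class and its `add_child` method are hypothetical names chosen for this illustration; the sketch only enforces the single-parent rule and says nothing about how IMS actually stores segments.

```python
class Segment:
    """A node in a hierarchy: one parent at most, any number of children."""

    def __init__(self, segment_type, **fields):
        self.segment_type = segment_type
        self.fields = fields
        self.parent = None
        self.children = []

    def add_child(self, child):
        # The hierarchical model restricts a child segment to one parent.
        if child.parent is not None:
            raise ValueError("a child segment can have only one parent segment")
        child.parent = self
        self.children.append(child)
        return child

employee = Segment("EMPLOYEE", name="Ann", emp_no=101, department="Sales")
for child_name, born in [("Bo", "2010-04-01"), ("Cy", "2013-09-15"), ("Di", "2016-01-30")]:
    employee.add_child(Segment("CHILD", name=child_name, date_of_birth=born))
```

Trying to attach one of Ann's child segments to a second employee segment raises an error, which is exactly the limitation the network model was designed to lift.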

Network Data Model

The Conference on Data Systems Languages (CODASYL), an organization comprising vendor representatives and user groups, developed the language COBOL. In the late 1960s, CODASYL appointed a subgroup known as the Database Task Group (DBTG) to develop standards for database systems. DBTG published a preliminary report in 1969. Based on revisions and suggestions made for improvement, DBTG published a revised version of the report in 1971.

Essentially, the network data model is based on the 1971 DBTG report. This data model conforms to a three-level database architecture: conceptual, external, and internal levels. A number of commercial database systems were developed to implement the network data model.

Summary of Basic Concepts

• Data is organized as records arranged in a network of nodes.

• Two fundamental modeling concepts make up the network data model: record types and set types.

• Two record types are linked as a set. The set expresses the one-to-one or one-to-many relationship between two record types.

• A set expressing the relationship between two record types consists of a member record type and an owner record type.

• One owner record type may be part of different sets with different member record types.

• Similarly, one member record type may have multiple owner record types.

• A network consisting of one-to-one or one-to-many relationships is known as a simple network. A complex network, on the other hand, also contains many-to-many relationships.

• Each record type generally represents an entity type of the organization. Data fields in the record type denote the attributes of the entity type.

• An instance of a record type represents one occurrence of the entity type represented by that record type.

• Logical links between related records are implemented through physical addresses (pointers) in the record itself.
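The last point, links held as addresses inside the records themselves, can be illustrated with a small Python sketch. It assumes a ring layout in which the owner points to its first member, each member points to the next, and the last member points back to the owner; this is one common way to chain set members and is an assumption here, as are the function names. A list index plays the role of a physical address.

```python
# Simulated storage: the list index stands in for a physical address.
storage = []

def store_owner(data):
    """Store an owner record; with no members yet, it points to itself."""
    addr = len(storage)
    storage.append({"data": data, "next": addr})
    return addr

def connect_member(owner_addr, data):
    """Append a member at the end of the owner's ring and return its address."""
    last = owner_addr
    while storage[last]["next"] != owner_addr:   # find the current last record
        last = storage[last]["next"]
    addr = len(storage)
    storage.append({"data": data, "next": owner_addr})
    storage[last]["next"] = addr
    return addr

def members(owner_addr):
    """Navigate from the owner through every member by following pointers."""
    found = []
    current = storage[owner_addr]["next"]
    while current != owner_addr:
        found.append(storage[current]["data"])
        current = storage[current]["next"]
    return found

dept = store_owner("Sales")
connect_member(dept, "Ann")
connect_member(dept, "Bob")
```

Retrieval here never searches by value; the program navigates from record to record, which is the style of access the network model is known for.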

In order to specify the relationship between DDL declarations and DML functions, a set of basic data manipulation functions must be defined that is independent of both the DML and the host language. The specific commands provided by a particular DML must then be resolved into those basic functions. The resolution is defined by the implementor of the DML.

A description of the Schema DDL consists of four major sections:

• an introductory clause

• one or more AREA clauses

• one or more RECORD clauses

• one or more SET clauses.
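The four-part structure can be mirrored in a Python sketch that checks whether a schema description is complete. This is not DDL syntax: the dictionary layout, the assumption that the introductory clause names the schema, and the `check_schema` helper are all invented for illustration.

```python
# Hypothetical in-memory form of a schema description; not actual DDL syntax.
schema_description = {
    "schema_name": "PERSONNEL",            # stands in for the introductory clause
    "areas": ["MAIN-AREA"],
    "records": {
        "DEPARTMENT": ["dept_no", "dept_name"],
        "EMPLOYEE": ["emp_no", "name"],
    },
    "sets": [
        {"name": "DEPT-EMP", "owner": "DEPARTMENT", "members": ["EMPLOYEE"]},
    ],
}

def check_schema(desc):
    """Return a list of problems; an empty list means all four sections are usable."""
    problems = []
    if not desc.get("schema_name"):
        problems.append("missing introductory clause")
    for section in ("areas", "records", "sets"):
        if not desc.get(section):
            problems.append(f"need at least one entry in {section}")
    # Every set must refer only to record types that the schema declares.
    for set_entry in desc.get("sets", []):
        for record_type in [set_entry["owner"], *set_entry["members"]]:
            if record_type not in desc.get("records", {}):
                problems.append(
                    f"set {set_entry['name']} uses undeclared record type {record_type}"
                )
    return problems
```

Calling `check_schema(schema_description)` on the example above returns an empty list, while removing the area entries would report the missing section.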
