Background of the Study:
Introduction:
Charles Bachman was a pioneer in the management of computer database systems. He spent his entire career in industry, working for Dow Chemical, GE, Honeywell, Cullinane Database Systems, and his own startup, Bachman Information Systems. His work at GE in the early 1960s, building the first company-wide database management system, the Integrated Data Store (IDS), allowed users across the company to access a common database. In 1973 he was awarded the ACM Turing Award for this work.
Bachman created IDS as a practical tool, not an academic research project. In 1963 there was no database research community.
Literature Review:
One of the oldest DBMSs, the Integrated Data Store (IDS), was developed at General Electric by Charles Bachman during the early 1960s using the network model. The Conference on Data Systems Languages (CODASYL), an organization of representatives of major hardware and software vendors and users, was formed to standardize many aspects of data processing, and had already written successful standards for the COBOL language. In the late 1960s CODASYL formed a subgroup, the Data Base Task Group (DBTG), to address the standardization of database management systems. Influenced by IDS, the group proposed a network-based model along with specifications for data definition and data manipulation languages. In 1971 its first official report was submitted to the American National Standards Institute (ANSI), which refused either to accept or to reject the proposed standard. The 1971 report was succeeded by several newer versions, but it remained the principal document describing a network-based model, generally referred to as the CODASYL model or the DBTG model, and several popular database management systems were based on it. It also provided the vocabulary and framework for discussing database issues, establishing for the first time the notion of a layered database architecture and a common terminology.

Although the hierarchical and network models were powerful and efficient, they were complex, requiring users to understand the data structures and the access paths to data. They were designed for use by programs rather than for interactive access by users; the logic required to locate the desired records was contained in the applications, not in the database.

The relational model was first proposed by E. F. Codd in 1970, in a paper called "A Relational Model of Data for Large Shared Data Banks." It was the first model based on abstract concepts from mathematics, which gave it a strong theoretical foundation. Early research on the model was done at the IBM Research Laboratory in San Jose, California. System R, a prototype relational database management system (RDBMS), was developed by IBM researchers during the late 1970s, and the research results, including the development of a new language, SQL (Structured Query Language), were widely published. Another important research project based on the relational model was Ingres, developed at the University of California, Berkeley, by Eugene Wong and Michael Stonebraker; both Postgres and PostgreSQL descend from Ingres. Recognizing the value of the relational model, Larry Ellison, along with Bob Miner and Ed Oates, founded a company to use the results of the System R project and released Oracle, the first commercial RDBMS, in 1979. IBM released its first commercial RDBMS, SQL/DS, in 1981, followed by the announcement of DB2 in 1983. The widespread use of microcomputers beginning in the 1980s led to the development of PC-based RDBMSs. SQL, the language developed for System R, became the standard data language for relational databases, with ANSI-approved standards published starting in 1986, major revisions in 1992 (SQL2) and 1999 (SQL3), and further expansions in 2003, 2006, 2008, 2011, 2016, and 2019. Oracle, DB2, Microsoft's SQL Server, MySQL, Microsoft Access, and PostgreSQL, all of which use the relational model, remain popular DBMSs.

The relational model uses simple tables to organize data. However, it does not allow database designers to express some important distinctions when they model an enterprise. In 1976, Peter Chen developed a new type of model, the entity-relationship (ER) model. This is an example of a semantic model, one that attempts to capture the meaning of the data it represents, and it is most often used in the design phase for databases. The ER model has been extended several times to make it semantically richer, resulting in the extended entity-relationship (EER) model.
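To make the contrast with the navigational models concrete, the following minimal sketch uses Python's built-in sqlite3 module (a present-day relational engine, not one of the historical systems above). The faculty and class tables, columns, and values are all invented for illustration.

```python
# Minimal illustration of the relational model using Python's
# built-in sqlite3 module. Table and column names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# In the relational model the database is just simple tables;
# relationships are expressed through shared values (foreign keys),
# not through pointers or access paths the programmer must navigate.
cur.execute("CREATE TABLE faculty (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE class (
    id INTEGER PRIMARY KEY,
    title TEXT,
    faculty_id INTEGER REFERENCES faculty(id))""")

cur.execute("INSERT INTO faculty VALUES (1, 'Adams')")
cur.execute("INSERT INTO class VALUES (10, 'Databases', 1)")

# A declarative SQL query: the program states *what* it wants and the
# DBMS decides *how* to locate the records -- the opposite of the
# record-navigation style of the hierarchical and network models.
for row in cur.execute("""
    SELECT f.name, c.title
    FROM faculty AS f JOIN class AS c ON c.faculty_id = f.id"""):
    print(row)

conn.close()
```

The join locates related rows by matching values, so the application never encodes access paths; that is exactly the distinction drawn above between the relational model and its navigational predecessors.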
Before integrated databases were created, file processing systems were used, and the data needed by an organization's application programs was stored in separate files. (Note: Although the word data is plural in standard English, it is customary to use it as both singular and plural in database literature, as in "data is" and "data are.") Typically, a department that needed an application program worked with the organization's data processing department to create specifications for both the program and the data it required. Often the same data was collected and stored independently by several departments within an organization, but not shared. Each application had its own data files, created specifically for that application and belonging to the department for which the application was written. Personal computer databases can create a similar scenario, where individuals or departments set up their own databases or spreadsheets, creating data silos: isolated collections of data that are inaccessible to other departments within an organization. Having multiple copies of the same data in isolated files or small databases can lead to flawed, outdated, or contradictory information, creating confusion for users.

Most organizations can benefit from having the data used by sets of applications integrated into a single database. In this text, we assume the typical database is a large one belonging to a business or organization, which we will call the enterprise; however, the techniques described apply to databases of all sizes. An integrated database is a collection of related data that can be used simultaneously by many departments and users in an enterprise. It is typically a large database that contains all the data needed by the enterprise for a specific group of applications, or even for all of its applications, stored together with as little repetition as possible. Several different types of records may appear in the database. Information about the structure of the records and the logical connections between data items and records is also stored in the database, so that the system knows, for example, which faculty record is connected to a particular class record. This "data about data" is called metadata.
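As a small, hedged illustration of metadata, the sketch below again uses Python's sqlite3 module: every table definition is itself recorded in the database, in SQLite's built-in sqlite_master catalog. The faculty/class tables are the invented examples from above, not real enterprise data.

```python
# Sketch of "data about data": a DBMS stores a description of its own
# record structures alongside the data. SQLite exposes this metadata
# in its built-in sqlite_master catalog.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE class (
    id INTEGER PRIMARY KEY,
    title TEXT,
    faculty_id INTEGER REFERENCES faculty(id))""")

# The catalog is itself queryable: the structure of the records is
# stored in the database, so the system "knows" how they connect.
for name, sql in conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"):
    print(name, "->", sql)

conn.close()
```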
The Integrated Data Store (IDS), the first direct-access database management system, was developed at General Electric in the early 1960s. Revisiting the development challenges that led to its first production version reveals the origins of DBMSs and their impact on software development and business management. IDS and its derivative systems are still in use today, supporting a thousand mainframe installations.
In the late 1950s and early 1960s, no independent software vendor industry existed. Software of all kinds was developed either by computer manufacturers or by one or a group of computer users. It was either bundled with the computer at no extra cost or given away free and shared by the users who developed it, much like today's open source movement. This article examines the business requirements that led to the development of the Integrated Data Store (IDS), the first direct-access database management system (DBMS), and looks at some of the development challenges that led to its first production version. To this day, IDS leads a healthy, productive life, driving large transaction-oriented systems around the world, 50 years after its conception. When General Electric engineers first began developing IDS in 1961, there were no general-purpose operating systems, no file systems, no DBMSs, and no communications systems to learn from or build on. There was no multiprogramming, no time sharing, and no online debugging tools. The machines were essentially naked. For business data processing, it was a batch-oriented, serial-file-processing, load-and-execute-one-program-at-a-time world.
DEFINITION STRUCTURE
The Definition Structure required by IDS is a list structure that reflects the description of the various data records of the IDS file. It defines the master/detail relationships between data records, the chain characteristics, and the physical and control characteristics of every field of every record type in the IDS file.
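As a rough illustration of the kind of information such a definition structure carries, here is a hedged Python sketch. The class, record, and field names are all invented for this example; the actual IDS definition structure was an internal list structure in machine storage, not objects like these.

```python
# Hedged sketch of what a Definition Structure records: the record
# types of the file, the fields of each record, and the master/detail
# chains between record types. All names here are invented.
from dataclasses import dataclass, field


@dataclass
class FieldDef:
    name: str   # field name within the record type
    size: int   # physical size of the field
    kind: str   # physical/control characteristic, e.g. "numeric"


@dataclass
class ChainDef:
    name: str    # name of the chain
    master: str  # record type that masters (owns) the chain
    detail: str  # record type whose records are linked as details


@dataclass
class RecordDef:
    name: str
    fields: list[FieldDef] = field(default_factory=list)


# A toy manufacturing-style description: ORDER master records chained
# to their ORDER-ITEM detail records.
record_defs = [
    RecordDef("ORDER", [FieldDef("ORDER-NO", 6, "numeric")]),
    RecordDef("ORDER-ITEM", [FieldDef("PART-NO", 8, "alphanumeric"),
                             FieldDef("QTY", 4, "numeric")]),
]
chain_defs = [ChainDef("ORDER-CHAIN", master="ORDER", detail="ORDER-ITEM")]
```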
In the late 1950s, GE was the largest commercial user of computers in the world. GE was also the biggest manufacturer of computers for demand deposit accounting. From 1958 to 1965, the GE Integrated System Projects (ISPs) were driven by GE's Manufacturing Services, led by Halbert Miller. The company's corporate services provided research, expertise, and consulting to its 100 product manufacturing departments. These departments spanned a range of product areas: atomic energy, electric energy generation and distribution, jet engines, electric motors, home appliances, light bulbs, X-ray machines, diesel-electric locomotives, artificial diamonds, and general-purpose computers. At that time, GE's Manufacturing Services management was greatly concerned that all of GE's manufacturing businesses were investing heavily in the design and installation of computerized manufacturing systems and that the development process was slow, expensive, and error prone.
The first ISP (1958–1959) developed some interesting product ideas in the areas of manufacturing simulation (later leading to SIMSCRIPT), generative engineering, and decision tables (TABSOL).
The second ISP (ISP 2) began late in 1960. Its target was to design and build a generic manufacturing control system. The project was managed by GE's Production Control Service and led by Stanley B. Williams. Williams came from the GE Large Transformer Department in Pittsfield, Massachusetts, and had 12 years of engineering and engineering systems experience. I joined the project as its chief architect, with 10 years of experience in engineering, finance, manufacturing, and data processing at the Dow Chemical Company in Midland, Michigan.
The first packaged versions of IDS did lack some features later viewed as essential for database management systems. One was the idea that specific users could be granted or denied access to particular parts of the database. This omission was related to another limitation: IDS databases could be queried or modified only by writing and executing programs in which IDS calls were embedded. There was no capability to specify ad hoc reports or run one-off queries without having to write a program.
These capabilities did exist during the 1960s in report generator systems (such as 9PAC and MARK IV) and in online interactive data management systems (such as TDMS), but these packages were generally seen as a separate class of software from database management systems. By the 1970s, report generation packages, still widely used, included optional modules to interface with data stored in database management systems.
Integrated Data Store (IDS)
The IDS DBMS was created by assembling many elements that had appeared in research papers and existing systems, combined with some new elements, so that the whole would meet the envisioned manufacturing control system's requirements. The elements included the following (a short sketch of two of them appears after the list):

A direct-access database, implemented on a virtual memory basis, with page turning, hash (calculated) addressing, data integrity control, clustered records, and database keys.

The network data model, with logical records that mapped transparently onto physical records in the virtual memory, and logical owner/member (O/M) sets that mapped transparently onto linked lists of physical records.

A data description language, with Data Description Language (DDL) statements that defined the types of logical records, with their data, relationships, and constraints, that could appear within the database.

A data storage and retrieval language, with Data Manipulation Language (DML) statements that could easily be integrated into a record-at-a-time procedural language such as GECOM, COMTRAN, FACT, COBOL, or PL/1, which were also available. The record-at-a-time data manipulation statements included STORE, RETRIEVE, MODIFY, and DELETE.

An exclusive "working storage" area for each record type, providing the computer memory locations where IDS and the application programs could exchange data under tight integrity controls.
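The sketch below suggests, in Python, the flavor of two of the mechanisms just listed: calculated (hash) addressing of a database key to a page, and an owner/member set realized as a linked list walked one record at a time, loosely in the spirit of RETRIEVE. The page count, record layout, and function names are all invented; IDS itself ran against paged disk storage, not in-memory dictionaries.

```python
# Illustrative sketch only: names, sizes, and data are invented.

PAGE_COUNT = 64  # hypothetical number of pages in the paged file


def page_of(database_key: int) -> int:
    """Calculated (hash) addressing: a record's database key hashes
    directly to the page that should hold the record, so a store or
    retrieve normally costs a single page access."""
    return database_key % PAGE_COUNT


# An owner/member (O/M) set realized as a linked list: the owner
# record points at its first member, and each member points at the
# next, mimicking the chains of physical records described above.
records = {
    100: {"type": "ORDER", "first_member": 205},       # owner
    205: {"type": "ORDER-ITEM", "next_member": 233},   # member
    233: {"type": "ORDER-ITEM", "next_member": None},  # member
}


def retrieve_members(owner_key: int):
    """Record-at-a-time navigation in the spirit of IDS RETRIEVE:
    walk the chain from the owner, yielding one member at a time."""
    key = records[owner_key]["first_member"]
    while key is not None:
        yield records[key]
        key = records[key]["next_member"]


for member in retrieve_members(100):
    print("page", page_of(100), "->", member["type"])
```

Note the design point this echoes: the application navigates the set one record at a time, which is precisely the programming model that later relational systems replaced with declarative queries.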