Write UP on Tech Geek History: Pascal Complier (revision 3)

Background of Study

:: History of Pascal ::

The language Component Pascal was the culmination of several decades of research. It was the youngest member of the Algol family of languages. Algol, defined in 1960, was the first high-level language with a readable, structured, and systematically defined syntax.

In the late sixties, several proposals for an evolutionary successor to Algol were developed. The most successful one was Pascal, defined in 1970 by Prof. Niklaus Wirth at ETH Zurich, the Swiss Federal Institute of Technology. His [Prof. Niklaus] principle objectives for Pascal were for the language to be efficient to implement and run, allow for the development of well structured and well organized programs, and to serve as a vehicle for the teaching of the important concepts of computer programming. Pascal, which was named after the mathematician Blasé Pascal, is a direct descendent from ALGOL 60, which Wirth helped develop. Pascal also draws programming components from ALGOL 68 and ALGOL-W. The original published definition for the Pascal language appeared in 1971 with latter revisions published in 1973. It was designed to teach programming techniques and topics to college students and was the language of choice to do so from the late 1960’s to the late 1980’s.

Pascal received a big boost when ETH released a Pascal compiler that produced a simple intermediate code for a virtual machine (P-code), instead of true native code for a particular machine. This simplified porting Pascal to other processor architectures considerably, because only a new P-code interpreter needed be written for this purpose, not a whole new compiler. One of these projects had been undertaken at the University of California, San Diego. Remarkably, this implementation (UCSD Pascal) didn’t require a large and expensive mainframe computer, it ran on the then new Apple II personal computers. This gave Pascal a second important boost. The third one came when Borland released Turbo Pascal, a fast and inexpensive compiler, and integrated development environment for the IBM PC. Later, Borland revived its version of Pascal when it introduced the rapid application development environment Delphi.

Pascal has greatly influenced the design and evolution of many other languages, from Ada to Visual Basic.

Significance of Study

:: Application Areas ::

Pascal contains some significant language features that allow it to used as a powerful learning tool in introducing structured programming techniques:

Built in Data Types- Pascal contains it’s own built in data types of Integer, Real, Character, and Boolean.

User defined Data Types – Has the ability to define scalar types as well as sub-ranges of those data types.

Provides a defined set of Data Structures- These data structures include Arrays, Records, Files and Sets.

Has a strong data typing element – Pascal compliers can diagnose an incompatible assignment of one type to a variable to another type.

Supports Structured Programming – This is accomplished through the use of subprograms called procedures and functions.

Simplicity and Expressivity – Because the language is simple and expressive in nature it allows for effective teaching of computer programming techniques.

The Pascal Language Pascal was originally designed by Niklaus Wirth with two primary goals: to teach programming as a systematic discipline, and to implement programs in a reliable and efficient way. But there are other reasons for the widespread interest in Pascal as a general purpose programming language and as a system program implementation language.

• • • • Pascal, a relatively modern language, has benefited from many earlier languages (such as ALGOL) and is forming the basis of several new-ones. The key emphasis of Pascal is its role as a higher level language ; that is, a language providing useful abstract tools for specifying data structures and algorithms, independent of the underlying implementation. The user need not be concerned with the repre sentation of data (the number of bytes per variable, organization of arrays, address size, and so on). The programmer can more accurately specify the characteristics of variables, such as the range of values allowed (V AR I: 0 .. 99) or decide whether 1-3 to trade space for time in accessing variable components. (PACKED keyword) IBM has added several goals to this list:

2. 3. IBM Personal Computer Pascal is designed to be a systems implementation language, especially suitable for writing compilers, interpreters, operating systems, and so on. Generating efficient code is paramount. The various language extensions, and the global optimizer phase (pass two of the compiler) all work toward minimizing the time and space needed by compiled programs. Operations that are done easily in assembly language should be easy to do in IBM Personal Computer Pascal.

https://bitsavers.org/pdf/ibm/pc/languages/IBM_Pascal_Compiler_Aug81.pdf

Each line defines a computer instruction consisting of an operation code possibly followed by some arguments. The code becomes a bit more readable if were place the operation codes by readable names and enclose the arguments in parentheses:

Program (1, 2, 5, 1)

LocalVar(3)

Constant(1)

Simple Assign End Program

Key Point of the Study: Record Data Types

A record structure acts as a template for a col lection of conceptually related data of different types. The record type itself is a structure consisting of a fixed number of components, usually of different types. Each component of a record type is called a field. The definition of a record type specifies the type and an identifier for each field within the record. Because the scope of these “field identifiers II is the record definition itself, they must be unique within the declaration. The field values associated with field identifiers are accessible with either record notation or the WITH statement. The example below declares a record with three (100) , fields:

TITLE of type LSTRING(100),

ARTIST of type LSTRING and PLASTIC [l •• SaNG_NUMBER] OF SONG TITLE:

CaNST SONG NUMBER = 1000:

TYPE SONG TITLE of LSTRING (20): LP = RECORD

TITLE: LSTRING (100): ARTIST: LSTRING (100):

PLASTIC : ARRAY type ARRAY [l •• SaNG_NUMBER] OF SONG TITLE END: In this example, when a variable of type LP is declared (for example BEATLES 1, below), the compiler allocates a contiguous -block of memory sufficient to hold all the fields. Each field then can be used as an independent variable. VAR BEATLES 1 : LP: Finally, you could access a component of the record using either field notation

(note the period separating field identifiers) or using the WITH statement:

BEATLES 1.TITLE := ‘Meet The Beatles’:J

WITH BEATLES 1 DO PLASTIC[I]-:= ‘I Wanna Hold Your Hand’

Thus, BEATLES 1.TITLE can be used as a variable of type LSTRING(100). Note that when the WITH statement is used above, using PLASTIC wi thin it is identical to using BEATLES1.PLASTIC outside the context of the WITH statement.

Record Types Record types are perhaps the most flexible of data constructs. Conceptually, a record type is a template for a structure whose parts may have quite distinct characteristics. For example, assume we wish to record information about a person. Known are the name, height, sex, date of birth, number of dependents, and marital status. Furthermore, if the person is married or widowed, the date of the (last) marriage is given; if divorced, the date of the (most recent) divorce and whether this is the first divorce or not; and if single, no other information is of interest. All of this information can be expressed in a single “record,” and each piece of information can be accessed separately.

A record defining Person can n:1W be formulated as:

type String15 = packed a:ray [1 .. 15] of Char;

Status = (Married, Wed, Divorced, Single);

Date = packed record Year: 1900 .. 2)0;

Mo: (Jan, Feb ~ar, Apr, May, Jun, Ju1, Aug. Sep, Oct, Nov, Dec);

Day: 1. .31; end;

Natural = O .. Maxlnt;

Person = record Name: recor(First, Last: String15 end; Height: Natl! ,,1 { centimeters };

Sex: (Male ,’emale) ;

Birth: Date; Depdts:

Natl dl; case MS: Sti ll:; of Married, v j()wed: (MDate: Date);

Divorced: :~II)ate: Date; ~.rstD: Boolean); Single: () end { Person }

Fixed Records More formally, a record is a structure consisting of a fixed number of components, called fields. Unlike the array, components of a record type can have different types and cannot be indexed by an expression. A record-type definition specifies for each component its type and an identifier, the field identifier, to denote it. The scope of a field identifier is the innermost record in which it is defined. The two operations valid for entire record variables are assignment and selection of components. In order that the type of selected component be evident from the program text (without executing the program), the record selector consists of fixed field identifiers rather than a computable index value

Record as a Structured Data Type A record is a collection of fields that may be treated as a whole or individually. To illustrate, a record that contains fields for a customer’s name, age, and annual income could be visualized as shown in

This schematic representation may help you understand why a record is considered a structured data type and familiarize you with the idea of using fields in a record. Declaring a RECORD Let’s now consider our first example of a formally declared record. Assume we want a record to contain a customer’s name, age, and annual income. The following declaration can be made

VAR Customer : RECORD Name : PACKED ARRAY [1..3a] OF char;

Age : integer;

Annual Income : real END; < of RECORD Customer >

Another method of declaring a record is to use the TYPE definition section to define an appropriate record type. This form is preferable because it facilitates use with subprograms.

Thus, we can have TYPE Customerlnfo =

RECORD Name : PACKED ARRAY [1..3D] OF char;

Age : integer;

Annual Income : real END; { of RECORD Customerlnfo }

VAR Customer :

Customerlnfo; Components of a record are called fields and each field has an associated data type.

The general form for defining a record data type using the TYPE definition section is TYPE Type identifier = RECORD field identifier 1: data type 1; field identifier 2 : data type 2; field identifier n : data type n END; {of RECORD definition}

The syntax diagram for this is identifier datatype END

Record Definitions The type identifier can be any valid identifier. It should be descriptive to enhance program readability. 2. The reserved word RECORD must precede the field identifiers. 3. Each field identifier within a record must be unique. However, field identifiers in different records may use the same name

TYPE Type identifier = RECORD field identifier

1 : data type 1; field identifier

2 : data type 2; field identifier n :

In most data processing applications, records contain a number of fields, and it is important to be able to write statements in a program to move all the fields as a single unit. We will be introducing the idea of a record structure which is a group of several fields designed to make file processing simple to program. When large quantities of data have to be processed, it is impossible to store files of records completely within the main memory of the computer. It is usual to keep large files in secondary storage such as magnetic tape or magnetic disk storage. We must then be able to read records from such a file and write records into it. We will be looking at the statements in Pascal that permit us to manipulate files in secondary storage. RECORDS A Pascal RECORD is a collection of several fields and is particularly suitable for records in a file. As a simple example, suppose that we want to describe each entry in the telephone book as a record. We would identify the entire record by the identifier CUSTOMER and the three fields as

CUSTOMER.NAME CUSTOMER.ADDRESS CUSTOMER.PHONENUMBER

Here is a diagram showing the fields:

CUSTOMER NAME ADDRESS PHONENUMBER

The field identifiers are a composite of their own identifiers, NAME, ADDRESS, and PHONENUMBER and the whole record’s identifier, CUSTOMER.

The composite is constructed by putting a dot, or period, between the record name and the field name. The record structure would be declared this way.

VAR CUSTOMER: RECORD NAME: PACKED ARRAY[1..20] OF CHAR; ADDRESS: PACKED ARRAY[1..301 OF CHAR; PHONENUMBER: PACKED ARRAY[1..81 OF CHAR END;

This record structure consists of two levels o first level we have the identifier of the declared, namely CUSTOMER. The next level declared. Each of these has its own type. So i have each field with a different type. Here, a of type PACKED ARRAY..OF CHAR, but each has a di A record structure is sometimes record.

MOVING RECORDS f naming. At the record structure has three fields t is possible to 11 the fields are different range. called the layout of a record

FILES IN SECONDARY MEMORY

In our discussion of files so far, we have had the files stored in the main memory. In most real file applications, the files are too large to be contained in main memory. The part of the file being processed must be brought into main memory, but the complete file is stored in secondary memory. The secondary memory may be magnetic tape or magnetic disk. A file in the same type.

One item at structure in ma the dataset, in the sequence can be read memory or writt memory. This not possible at the file; the available. Since secondary sto The collect! a time may in memory, or The item that of records i sequentially en sequential kind of fil any moment t next item files rage is a collection of values all of on is sometimes called a dataset. be transferred from the dataset to a from a structure in main memory to is transferred must be the next item n the dataset. We say that the file from the secondary memory to the main ly from the main memory to secondary e is called a sequential file. It is o get access to an arbitrary item in in sequence is the only one that is in secondary storage are to be accessed sequentially, there must be a statement in the program that will position the file reader at the first item of the dataset.

Before a file in secondary storage can be read, we must have a statement of the form RESET(file name); Files in Secondary Memory 209 The file name is that of the whole dataset or file. It may, for example, be named OLDFILE.

A file is a structure that consists of a sequence of–components, all of the same type. It is through files that Pascal interfaces with the operating system. Therefore, you must understand the FILE type in order to perform input to and output from a program. DECLARING FILES . As with any other type, you must declare a file variable in order to use it. However, the number of components in a file is not fixed by declaring a FILE type. Examples of FILE declarations:

referents (allocated on the heap). (File in this context, refers to a file control block, that is a memory structure that contains information about the file.) Except for files in super arrays, the compiler generates code to initialize a file when it is allocated and to CLOSE a file when it is deallocated. This initialization call occurs automatically in most cases. However, a file declared in a module or uninitialized unit I s interface will only get its initialization call if you call the module or unit identifier as a procedure. File declarations in such cases get the following compiler warning: Contains file initialize module Only a file in an interface of an uninitialized unit does not generate this warning

This version of Pascal sets up the standard files, INPUT and

At the extend level, you can use the ASSIGN or READFN procedures to give explicit operating system filenames to files not included in the program header. Files in record variants or super array types are not recommended: if you use them, the compiler issues a warning. A file variable cannot be assigned, compared, or passed by value. It can only be declared and passed as a reference parameter. At the extend level, you can indicate a file IS access method or other characteristics by speci fying the mode of the file. The mode is a value of the predeclared enumerated typelffLEMODES.

The modes available normally include the three base modes, SEQUENTIAL, TERMINAL, and DIRECT. files, except INPUT and OUTPUT, are All given SEQUENTIAL mode by default. INPUT and OUTPUT are given the default mode TERMINAL.

TYPE COLOR = (RED, BLUE)~ FI F2 F3 FILE OF COLOR~

FILE OF CHAR~ TEXT~ {Similar to FILE OF CHAR}

Conceptually, a file is simply another data type, like an array, but with no bounds and with only one component accessible at a time. However, a file usually corresponds to one of the following: o o o o o disk files keyboard video display printers other input and output devices This implies the following restriction in Pascal: a FILE OF FILE is illegal, directly or indirectly. Other structures, such as a FILE OF ARRAYs or an ARRAY OF FILEs, are permitted. The operating system is used to access files, and no additional formatting or structure is imposed on the files. Our version of Pascal supports normal statically allocated files, files as local variables (allo cated on the stack), and files as pointer

FILE STRUCTURES Pascal files have two basic structures: BINARY and ASCII. These two structures correspond to raw data files respectively. BINARY and human-readable textfi1es, The Pascal data type FILE OF corresponds to BINARY structure files. These, in turn, correspond to unformatted operating system files. See the subsection “File Access Modes” for further discussion of BINARY files. ASCII The Pascal data type TEXT corresponds to ASCII structure files. These, in turn, correspond to textual operating system files, which we refer to as textfi1es. The Pascal TEXT type is like a FILE OF CHAR, except that groups of characters are organized into “lines” and, to a lesser extent, “pages.” Primitive file procedures, such as GET and PUT, always operate on a character basis. Pascal textfi1es (files of type TEXT) are divided into lines wi th a line marker, (the line feed: 0Ah). When read, this character always looks like a blank. At the extend level, a declaration for a textfi1e can include an optional line length. Setting the line length, which sets record length,. is only needed for DIRECT mode textfi1es. You can specify line length for other modes as well, but doing so has no effect.

This identifier is declared at the beginning of the procedure by

VAR OLDFILE: FILE OF CUSTOMERTYPE;

In general, a file may be declared as a FILE OF any type, such as FILE OF INTEGER or FILE OF PACKED ARRAY[1..10] OF CHAR. To read the next item from the file in secondary storage, we write a statement of the form READ (file identifier,variable);

We can read records from OLDFILE into variables such as CUSTOMER whose type is CUSTOMERTYPE using this statement.

READ (OLDFILE,CUSTOMER); If a file is to be written, it must first be prepared for writing using the statement REWRlTE(file identifier); To write a record into such a file, we use a statement of this form WRITE (file identifier,expression); As with input files, an output file must be declared as a FILE and can receive only values (expressions) of the type given following FILE OF.

Following RESET the file can be read but not written. Following REWRITE the file can be written but not read. Once a file has been written, the program can RESET the file and then read it. The names of files used by a program must appear in the program heading statement. In all programs we have shown so far the input came from the card reader (or keyboard input device) and the output went to the printer.

These “files” have the standard names INPUT and OUTPUT. That is why we always preface our programs with the line PROGRAM identifier(INPUT,OUTPUT); If in addition to these two files we intend to use files named OLDFILE and NEWFILE our program heading would be

PROGRAM identifier(INPUT,OUTPUT,OLDFILE,NEWFILE); 210 PS/7:

Files and Records If a file is local, meaning it is created by saved afterwards, then its name does not need program heading. FILE MAINTENANCE As an example of reading and writing the program and not to appear in the files we will program a simple file-maintenance operation.

We wi exists a file of CUSTOMER records called update this file by adding new customers, the new customers is punched on cards. E a transaction that must be posted in the f to-date example customer of file file, which we will ca maintenance. The fil alphabetically by CUSTOMER.NAME and the tr alphabetically.

This program will be very sort program of Chapter 13, except that files being merged are not in an array. 11 assume that there OLDFILE, and we want to The information about ach card corresponds to ile to produce an up- 11 NEWFILE. This is an e OLDFILE is ordered ansactions are arranged similar to the merge- the records of the

Pascal contains a implement linked lists these data structures will do this first and Suppose called DATA. For simplici feature called pointers that can be used to but it is possible to implement all of without pointers using arrays instead. We then show how Pascal pointers can be used.

LINKED LISTS that we had a file of records stored in an array The records are arranged in sequence on some key. ty, we will consider that each record consists only of a single f if the order ield which is the key to the ordering. We know that is ascending and no two keys are identical, then DATA[1+1] > DATA[I] Pascal contains a implement linked lists these data structures will do this first and Suppose called DATA. For simplici feature called pointers that can be used to but it is possible to implement all of without pointers using arrays instead. We then show how Pascal pointers can be used.

The difficulty with this kind of data structure for a file comes when a new item is to be added to the file; it must be inserted between two items. This means we would have to move all the items with a key higher than the one to be inserted, one location on in the array. For example, you can see what happens when we insert the word DOG in this list: before DATA[1] DATA[2] data!31 DATA[4] DATA[51 data!6] CAT DUCK FOX GOOSE PIG – after inserting DOG CAT DOG DUCK FOX GOOSE PIG Any list that is changing with time will have additions and deletions made to it. A deletion will create a hole unless entries are moved to fill the hole. – When the list changes with time we can use the data structure called the linked list. In the linked list each item has two components, the data component and the linking component or link. (We will be using Pascal pointers to implement links later.) We associate with each entry in the DATA array an entry in a second array of integers called LINK. The number stored in LINK[l] is the index of the next entry in the sequence of the DATA array. This means that the actual or physical sequence in the DATA a?:ray is different from the logical sequence in the list. Here is an example showing our previous list as a linked list. The start of the list is stored in the INTEGER variable FIRST. FIRST DATA[1] DATA[21 DATA[31 PATA[4] DATAl5] DATA[6] PIG FOX CAT GOOSE DUCK – Here is a diagram of this: 3 LINK[11 o LINK[2] V LINK[3] ^ LINK[4] LINKi 5 ] link!61 0 “4 -5 1 -2 -The difficulty with this kind of data structure for a file comes when a new item is to be added to the file; it must be inserted between two items. This means we would have to move all the items with a key higher than the one to be inserted, one location on in the array. For example, you can see what happens when we insert the word DOG in this list: before

DATA[1] DATA[2] data!31 DATA[4] DATA[51 data!6]

CAT DUCK FOX GOOSE PIG – after inserting DOG CAT DOG DUCK FOX GOOSE PIG

Any list that is changing with time will have additions and deletions made to it. A deletion will create a hole unless entries are moved to fill the hole. – When the list changes with time we can use the data structure called the linked list. In the linked list each item has two components, the data component and the linking component or link. (We will be using Pascal pointers to implement links later.)

We associate with each entry in the DATA array an entry in a second array of integers called LINK. The number stored in LINK[l] is the index of the next entry in the sequence of the DATA array. This means that the actual or physical sequence in the DATA a?:ray is different from the logical sequence in the list. Here is an example showing our previous list as a linked list. The start of the list is stored in the INTEGER variable FIRST. FIRST DATA[1] DATA[21 DATA[31 PATA[4] DATAl5] DATA[6] PIG FOX CAT GOOSE DUCK – Here is a diagram of this: 3 LINK[11 o LINK[2] V LINK[3] ^ LIN

A system program, known as a compiler, performs the translation from Pascal to machine code. Compilers play a crucial role in software development: They enable you to ignore the complicated instruction sets of computers and write programs in a readable notation. Compilers also detect numerous programming errors even before you begin testing your programs. However, you can ignore the code generated by a compiler only if you know that the compiler never makes a mistake! If a compiler does not always work, programming becomes extremely complicated. In that case, you will discover that the Pascal report no longer defines exactly what your program does. This is obviously unacceptable. So the following design rule is essential:

Rule 1.1: A compiler must be error-free. This requirement is quite a challenge to the compiler designer when you consider that a compiler is a program of several thousand lines. Compiler writing forces you to apply very systematic methods of program design, testing, and documentation.

It is one of the best educational experiences for a software engineer. The input to a compiler is a program text.

The first task of the compiler is to read the program text character by character and recognize the symbols of the language. The compiler will, for example, read the characters program

WHAT A COMPILER DOES

recognize them as the single word program. At this point, the compiler views the previous program example as the following sequence of symbols: program name semicolon var name x colon name integer semicolon begin name x becomes numeral 1 end period This phase of compilation is called lexical analysis. (The word lexical means “pertaining to the words or vocabulary of a language.’’) The next task of the compiler is to check that the symbols occurring the right order. For example, the compiler recognizes the sentence xi=1 as.an assignment statement.

But if you write x=1 instead, the compiler will indicate that this is not a valid statement. This phase of compilation is called syntax analysis. (The word syntax means “‘the structure of the word order in a sentence.’’) The syntax analysis is concerned only with the sequence of symbols in sentences. As long as a sentence consists of a name followed by the :=

symbol and an expression, it will be recognized as an assignment statement. But even though the syntax of the statement is correct, it may still be meaning less, as in the following example: yirl which refers to an undefined variable y. The assignment statement x ‘= true is also meaningless because the variable x and the value true are of different types (integer and Boolean). The phase of compilation that detects meaning less sentences such as these is called semantic analysis. (The word semantics means “‘the study of meaning.’’) As these examples show, the compiler must perform two kinds of se mantic checks: First, the compiler must make sure that the names used in a program refer to known objects: either predefined standard objects, such as the type integer, or objects that are defined in the program, such as the variable x. The problem is to recognize a definition such as var x: integer;

WHAT A COMPILER DOES

and determine in which part of the program the object x can be used. This task is called scope analysis. (The word scope means “‘the extent of application.”) During this part of the compilation, the compiler will indicate if the program uses undefined names, such as y, or introduces ambiguous names in definitions, such as var x: integer; x: integer;

Second, the compiler must check that the operands are of compatible types. This task is called type analysis. When you compile a new program for the first time, the compiler nearly always finds some errors in it. It will often require several cycles of editing and recompilation before the compiler accepts the program as correct.

So, in designing a compiler, you must keep in mind that most of the time it will be used to compile incorrect programs! If there are many errors in a program, it is convenient to output the error messages in a file which can be inspected or printed after the compilation. However, the compiler will be able to complete this file and close it properly only if the compilation itself terminates properly. If the compilation of an incorrect program causes a run-time failure, such as an arithmetic overflow, the error messages will be lost and you will have to guess what happened. To avoid this situation, we must impose the following design requirement:

Rule 1.2: A compilation must always terminate, no matter what the input looks like. The easiest way to satisfy this requirement is to terminate the compilation when the first error has been detected. The user must then remove this error and recompile the program to find the next error, and so on. Since a compilation may take several minutes, this method is just too slow. To speed up the program development process, we must add another design requirement: .

Rule 1.3: A compiler should attempt to find as many errors as possible during a single compilation. As you will see later, this goal is not easy to achieve. There is one exception to the rule that a compilation must always Terminate: If a program is so big that the compiler exceeds the limits

The part of a compiler that performs lexical analysis is called the scanner. This chapter describes the Pascal— scanner (pass 1) and explains how it recognizes symbols in a program text and replaces them by a more convenient representation. Much of the discussion concerns the pro’ tem of searching for word symbols and names in a symbol table.

SOURCE TEXT The scanner reads a program text character by character. It uses a control character to indicate that it has reached the end of the source text const ETX = 38C; This is the Pascal x notation for ASCII character number three, known as ETX (End of Text). When the scanner finds a character that cannot occur in any symbol, it reports the problem. A line number in the error message helps you locate the error in the program text and correct it. However, if an invisible character causes an error message, it is very confusing since you cannot see where the 28 Sec.

INTERMEDIATE CODE 29 problem is by looking at the printed program text. To avoid this situation we must make sure that ‘“‘what you see is what the compiler gets.” In other words, the compiler must ignore all control characters (except NL and. ETX). The set of invisible characters is stored in a variable of type CharSet: type CharSet = set of char; var Invisible: CharSet; The scanner uses Algorithm 4.1 to input the next character and store it ina global variable named ch. This procedure illustrates the use of set types in a compiler. Later, we will see many other examples of this. procedure NextChar; begin if EOF then ch := ETX else begin Read(ch); if ch in Invisible then NextChar end end;

Algorithm INTERMEDIATE CODE In the source text, a word symbol such as const consists of a sequence of letters, in this case c o n s t. This representation of symbols is very bulky and awkward to work with in Pascal. If you store a symbol as a string of characters var Symbol: String; you can easily test, for example, if the symbol is the word const: if Symbol = ’const’ then… But you cannot easily form sets of strings and ask whether the current symbol is either const, type, var, or procedure. The use of symbol sets plays an important role in syntax analysis. To be able to define symbol sets, we must represent the symbols by simple values. These values should, however, have readable names to make the compiler understandable. What we need, then, is an enumerated type that introduces a name for each symbol in the language:

Linear Searching To illustrate how important efficient searching is in lexical analysis, very inefficient method called linear searching. The idea behind it is the simplest possible: The symbol table is organized as a linked list of records each describing one word. The list is unordered and is always searched from the beginning. New words can be inserted anywhere in the list.

Letter Indexing Well then, let’s improve things a bit by using a ‘‘thumb index’ of the words as in a dictionary. Instead of a single linked list we will use a separate list for each letter of the alphabet The first list will hold all words beginning with the letter “A”. The second list is for the letter ““B’’, and so on. The first letter of each word is used as an index in an array to locate the start of the corresponding list, Each list is unordered.

COMPILER

The Pascal system consists of the Pascal compiler and a library containing the Pascal run-time environment. A Pascal program is run by compiling its one or more source modules, linking the resulting object files with the Pascal library using the Linker, and invoking the resulting run file, which is usually done through the Executive. The Pascal compiler translates your Pascal source programs into object modules.

The compiler provides a source listing, error messages, and a number of compiler metacommands to aid in program development and debugging. The compiler generates native 8086 machine code, which is directly executed by the hardware. LANGUAGE LEVELS This version of Pascal offers two language levels: o o The standard level is limited to features that conform to the ISO standard. Programs you create at this level are portable to and from other machines running other ISO compatible Pascal compilers. The extend level includes features specific to our versioi1C)”f Pascal.

Programs that use extend level features may not be portable. Whenever extend level features are discussed in this manual the words extend level appear in boldface type to make them easy for you to locate. THE COMPILER METACOMMANDS The Pascal meta commands provide a control language for the compiler. They specify options that affect the overall operation of a compilation. For example, you can conditionally compile different source files, generate a listing file, or enable or disable run-time error checking code. Metacommands are inserted inside comment statements. All the metacommands begin with a dollar sign ($).

LOAD A ADDB STORE C

This causes the number in A to be added to the number in B and the result to be stored for later use in C. This computer language is an assembly language, which is generally referred to as a low-level language. What actually happens is that words and symbols are translated into appropriate binary digits and the machine uses the translated form.

Although assembly language is an improvement on machine language for readability and program development, it is still a bit cumbersome. Consequently, many high-level languages have been developed; these include Pascal, PL/1, FORTRAN, BASIC, COBOL, C, Ada, Modula-2, Logo, 1.3 Computer Languages 11 and others.

These languages simplify even further the terminology and symbolism necessary for directing the machine to perform various manipulations of data. For example, the task of adding two integers would be written as

C := A + B;

C = A + B;

C = A + B

C = A + B ADD A.B GIVING C C = A + B;

C:=A + B;

C := A + B; MAKE “C :A + :B (Pascal) (PL/I) (FORTRAN) (BASIC) (COBOL) (C) (Ada) (Modula-2) (Logo) A high-level language makes it easier to read, write, and understand a program. This book develops the concepts, symbolism, and terminology necessary for using Pascal as a programming language for solving problems. After you have become proficient in using Pascal, you should find it relatively easy to learn the nuances of other high-level languages. For a moment, let’s consider how an instruction such as C := A + B;

gets translated into machine code. The actual bit pattern for this code varies according to the machine and software version, but it could be as follows: 0100001lOOlllOJOOOll1101010000010010101101000010 In order for this to happen, a special program called a compiler “reads” the high-level instructions and translates them into machine code. This compiled version is then run using some appropriate data. The results are then presented through some form of output device. The special pro grams that activate the compiler, run the machine-code version, and cause output to be printed are system programs. The program you write is a source program, and the machine-code version is an object program (also referred to as object code). As you will soon see, the compiler does more than just translate instructions into machine code. It also detects certain errors in your source program and prints appropriate messages.

For example, if you write the instruction C := (A + B; where a parenthesis is missing, when the compiler attempts to translate this line into machine code, it will detect that “)” is needed to close the parenthetical expression. It will then give you an error message such as ERROR IN VARIABLE You will then need to correct the error (and any others) and recompile your source program before running it with the data. Before leaving this introductory chapter, let’s consider the question: Why study Pascal? Various languages have differing strengths and weaknesses. Pascal’s strong features include the following

Conclusion

WHAT A COMPILER DOES

The only reasonable thing to do is to report this and stop the compilation. This is called a compilation failure. If the compiler finds no formal errors in a program, it proceeds to the last phase of compilation, code generation. In this phase, the compiler determines the amount of storage needed for the code and variables of the program and emits final instructions. The main difficulty is that most computers have very unsystematic instruction sets that are ill suited for automatic code generation. These, then, are the major tasks of a compiler:

Lexical analysis

The part of a compiler that performs lexical analysis is called the scanner.Pascal— scanner (pass 1) and explains how it recognizes symbols in a program text and replaces them by a more convenient representation. Much of the discussion concerns the pro’ tem of searching for word symbols and names in a symbol table.

SOURCE TEXT The scanner reads a program text character by character. It uses a control character to indicate that it has reached the end of the source text

const ETX = 38C; This is the Pascal

x notation for ASCII character number three, known as ETX (End of Text).

When the scanner finds a character that cannot occur in any symbol, it reports the problem. A line number in the error message helps you locate the error in the program text and correct it.

Syntax analysis

A program that performs syntax analysis is called a parser. (The word parse means “‘to describe a series of words grammatically.”) This chapter de scribes the Pascal— parser (pass 2) and explains how it recognizes correct sentences and detects syntax errors. The difficult problem of error recovery is discussed in detail. The method of compilation, which is known as recursive descent, imposes certain restrictions on the grammar of the language. 5.1. SYMBOLINPUT When the scanner has converted a program text into a sequence of symbols, the parser performs a single scan of the symbols and checks whether they form Pascal sentences. In this chapter we assume that the only task of pass 2 is to input symbols and output error messages if the program syntax is wrong. Semantic analysis and code generation will be discussed later. The compiler passes use two procedures named Read and Emit to input and output intermediate (and final) code.

Syntax Analysis Parser: The parser has two functions: 1) It checks that the tokens appearing in its input, which is the output of the lexical analyzer, occur in patterns that are permitted by the specification for the source language. 2) It also imposes on the tokens a tree-like structure that is used by the subsequent phases of the compiler.

Scope analysis

A Pascal— program uses names to refer to constants, data types, record fields, variables, and procedures. These named entities are called the objects of the program. The types integer and Boolean, the constants false and true, and the procedures read and write are predefined objects that may be used in any Pascal— program. They are called the standard objects of the language. Any other object used in a program must be defined by a definition that introduces the name of the object and describes some of its properties.

A constant definition introduces a constant; for example, const c = 10;

A type definition introduces a data type; for example, type T = array [1 ..c] of integer;

A record type introduces one or more fields; for example,

type U = record f:

Tg: Boolean end;

A variable definition introduces one or morevariables; for example, varx,y: T: A procedure definition defines a procedure; for example, procedure P(x, y: T); begin .. , end; This example includes definitions of two local variables x and y which are parameters of the procedure.

A program combines related definitions and statements into syntactic units called blocks. There are three kinds of blocks: (1) Thestandard block (2) Programs (3) Procedures

Type analysis

To perform type analysis, the compiler must, however, be able to dis tinguish between different kinds of objects: constants, types, variables, pro cedures, and so on. Wewill use the following data typeto classify objects

Code generation

“Pascal code generator compiler” refers to a compiler that takes source code written in the Pascal programming language and translates it into a lower-level form, typically machine code or an intermediate representation, that can be executed by a computer. The “code generation” phase is a crucial part of this process, where the compiler transforms the intermediate representation into the final target code.

http://pascal.hansotten.com/uploads/pbh/brinch%20hansen%20on%20pascal%20compilers%20OCR.pdf

References:

http://pascal.hansotten.com/uploads/pbh/brinch%20hansen%20on%20pascal%20compilers%20OCR.pdf

Write UP on Tech Geek History: Pascal Complier (revision 3)

Leave a Comment Cancel Reply