Unit 3
1) What is intermediate code? Write the benefits of intermediate code generation.
Intermediate code
In the analysis phase, the source program is translated into an intermediate representation. Many forms of intermediate code are possible; these forms should be easy to produce and easy to translate into the target program.
Benefits of intermediate code generation
- Ease of Optimization: Intermediate code can be optimized to improve performance before the final machine code is generated. This optimization is machine-independent, making the compiler more flexible.
- Machine Independence: Since the intermediate code is not specific to any particular machine architecture, the front-end of the compiler (which generates the intermediate code) can be separated from the back-end (which generates the machine code). This allows for easier retargeting of the compiler to different platforms.
- Modularity: Using an intermediate representation enhances the modularity of a compiler. Different front-ends for various source languages can generate the same intermediate code, and different back-ends can translate the intermediate code into machine code for different target machines.
- Abstraction: Intermediate representations often provide a level of abstraction that hides the details of the target machine from the front-end of the compiler.
2) Explain different forms of intermediate code generation.
1. Abstract syntax tree
2. Polish notation
3. Three address code
Intermediate code serves as a representation of the source program that lies between the high-level source code and the low-level machine code. It offers benefits like machine independence, facilitates optimization, and enhances modularity in the compilation process. Here are some common forms:
Abstract Syntax Tree (AST)
- An AST is a tree representation of the abstract syntactic structure of the source code.
- Each node in the tree denotes a construct occurring in the source code.
- The leaves of the tree represent the operands.
- Example: The expression a + b * c would have an AST with '+' at the root, 'a' as the left child, and '*' as the right child. The '*' node would have 'b' and 'c' as its children.
- ASTs are used extensively in compilers because they represent the hierarchical structure of the program in a clear and unambiguous way.
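To make this concrete, the tree can be built with a small node structure. The following C sketch (the Node type and makeNode helper are illustrative names, not taken from any particular compiler) constructs the AST for a + b * c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A node is either an operator with two children or a leaf identifier. */
typedef struct Node {
    char label[8];              /* "+", "*", "a", "b", "c", ... */
    struct Node *left, *right;  /* NULL for leaves */
} Node;

static Node *makeNode(const char *label, Node *left, Node *right) {
    Node *n = malloc(sizeof(Node));
    strcpy(n->label, label);
    n->left = left;
    n->right = right;
    return n;
}

int main(void) {
    /* AST for a + b * c: '*' binds tighter, so it forms a subtree of '+'. */
    Node *root = makeNode("+", makeNode("a", NULL, NULL),
                               makeNode("*", makeNode("b", NULL, NULL),
                                             makeNode("c", NULL, NULL)));
    printf("root: %s, right child: %s\n", root->label, root->right->label);
    return 0;
}

Because * binds tighter than +, the multiplication sits lower in the tree and is therefore evaluated first.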
Polish Notation (Prefix and Postfix)
- Polish notation is a notation for writing expressions without using parentheses.
- Prefix Notation (Polish Notation): Operators precede their operands. Example:
+ a * b c
- Postfix Notation (Reverse Polish Notation): Operators follow their operands. Example:
a b c * +
- Postfix notation is particularly useful in compilers for expression evaluation because it can be easily evaluated using a stack-based algorithm.
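As a rough illustration of that stack-based algorithm, here is a minimal C sketch (assuming single-digit operands and only the + and * operators; evalPostfix is an illustrative name):

#include <stdio.h>
#include <ctype.h>

/* Evaluate a postfix string such as "234*+" (i.e. 2 + 3 * 4). */
int evalPostfix(const char *expr) {
    int stack[64], top = -1;
    for (const char *p = expr; *p; p++) {
        if (isdigit((unsigned char)*p)) {
            stack[++top] = *p - '0';        /* push operand */
        } else {
            int b = stack[top--];           /* pop two operands ...   */
            int a = stack[top--];
            stack[++top] = (*p == '+') ? a + b : a * b;  /* push result */
        }
    }
    return stack[top];
}

int main(void) {
    printf("%d\n", evalPostfix("234*+"));   /* prints 14 */
    return 0;
}

Evaluating "234*+" (the postfix form of 2 + 3 * 4) pushes 2, 3, and 4, reduces 3 * 4 to 12, and finally reduces 2 + 12 to 14.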
Three-Address Code
- Three-address code is a sequence of statements of the form A := B op C, where A, B, and C are either variable names, constants, or compiler-generated temporary variables, and op represents any operator.
- Each statement involves at most one operator.
- Temporary variables are used to hold intermediate values computed during the evaluation.
- Example: The expression a + b * c can be translated into the following three-address code:
  T1 := b * c
  T2 := a + T1
- Three-address code is a common intermediate representation because it is simple, and it makes it easier to perform optimizations.
3) Explain Quadruple, triple, and indirect triple with a suitable example.
OR
Write about implementation of 3 address code
Three-Address Code Implementation
Three-address code is an intermediate representation where each instruction typically has at most three operands. These operands can be variables, constants, or temporary variables. To implement three-address code, we need data structures to store and manipulate these instructions. Quadruples, triples, and indirect triples are different ways to represent these three-address instructions.
1. Quadruples
- A quadruple is a record structure with four fields: op, arg1, arg2, and result.
  - op: Holds an operator.
  - arg1 and arg2: Hold the two operands.
  - result: Holds the result of the operation.
- Example: The three-address code a = b + c would be represented by the quadruple (+, b, c, a).
- Quadruples are easy to understand and implement.
2. Triples
- A triple is a record structure with three fields: op, arg1, and arg2.
  - op: Holds an operator.
  - arg1 and arg2: Hold the two operands.
- The results of the operations are referred to by their position (index) in the list of triples.
- Example: The three-address code
  T1 = b + c
  a = T1
  would be represented by the triples:
  (0) (+, b, c)
  (1) (=, a, (0))
  where (0) refers to the result of the first triple.
- Triples conserve storage compared to quadruples because the temporary names are not explicitly stored.
3. Indirect Triples
- Indirect triples use a listing of pointers to triples.
- An additional table stores the list of pointers to the triples.
- Example: Instead of storing the triples in execution order, we keep a separate listing of pointers (or indices) to them; for the triples above, the listing would be (0) -> triple 0, (1) -> triple 1.
- Because an optimizer can reorder this listing without renumbering the triples themselves, indirect triples allow more flexibility and can lead to better optimization.
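One possible way to lay out these three representations in C, reusing the T1 = b + c; a = T1 example, is sketched below (the struct and variable names are illustrative only):

#include <stdio.h>

/* Quadruple: operator, two arguments, and an explicit result name. */
typedef struct { const char *op, *arg1, *arg2, *result; } Quad;

/* Triple: no result field; a result is referred to by its index. */
typedef struct { const char *op, *arg1, *arg2; } Triple;

int main(void) {
    /* a = b + c as quadruples */
    Quad quads[] = { { "+", "b", "c", "T1" }, { "=", "T1", "", "a" } };

    /* The same code as triples: "(0)" names the result of triple 0. */
    Triple triples[] = { { "+", "b", "c" }, { "=", "a", "(0)" } };

    /* Indirect triples: a separate table of indices into the triple
       array; reordering this table reorders the code without touching
       the triples themselves. */
    int listing[] = { 0, 1 };

    printf("quad 0: (%s, %s, %s, %s)\n",
           quads[0].op, quads[0].arg1, quads[0].arg2, quads[0].result);
    printf("triple %d: (%s, %s, %s)\n",
           listing[0], triples[listing[0]].op,
           triples[listing[0]].arg1, triples[listing[0]].arg2);
    return 0;
}

Reordering the listing array moves statements around without touching the triples, which is exactly the flexibility indirect triples provide.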
4) Write a short note on symbol table management.
Here is a short note on symbol table management:
A symbol table is a crucial data structure used by a compiler to store information about the various entities in a program, such as variables, functions, classes, and interfaces. It is used in all phases of compilation.
The purpose of symbol table management is to organize and maintain these identifiers and their associated attributes.
Key operations in symbol table management include:
- Insertion: Adding new symbols and their attributes to the table. This occurs during lexical analysis and parsing when identifiers are encountered.
- Lookup: Searching for a symbol in the table to retrieve its attributes. This is essential during semantic analysis and code generation to verify the correct usage of identifiers.
- Deletion: Removing symbols from the table, typically when they go out of scope.
Attributes stored in the symbol table might include:
- Name: The identifier itself.
- Type: The data type of a variable or the return type of a function.
- Scope: Information about where the identifier is valid and accessible.
- Address: The memory location assigned to a variable.
- Other information: Such as size, storage class, and parameters (for functions).
Efficient symbol table management is vital for the performance and correctness of a compiler.
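A minimal sketch of how insertion and lookup might be implemented with a fixed-size hash table and chaining is shown below (all names such as Symbol, insertSymbol, and lookupSymbol are illustrative, not a standard API):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211

typedef struct Symbol {
    char name[32];
    char type[16];            /* e.g. "int", "float" */
    int  scope;               /* nesting depth where it was declared */
    struct Symbol *next;      /* chaining handles hash collisions */
} Symbol;

static Symbol *table[TABLE_SIZE];

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Insertion: add a new identifier and its attributes. */
void insertSymbol(const char *name, const char *type, int scope) {
    Symbol *sym = malloc(sizeof(Symbol));
    strcpy(sym->name, name);
    strcpy(sym->type, type);
    sym->scope = scope;
    unsigned h = hash(name);
    sym->next = table[h];
    table[h] = sym;
}

/* Lookup: find an identifier to retrieve its attributes (NULL if absent). */
Symbol *lookupSymbol(const char *name) {
    for (Symbol *s = table[hash(name)]; s; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;
}

int main(void) {
    insertSymbol("x", "int", 0);
    Symbol *s = lookupSymbol("x");
    if (s) printf("%s : %s (scope %d)\n", s->name, s->type, s->scope);
    return 0;
}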
5) Explain how type checking & error reporting is performed in the compiler.
Type checking is a crucial phase in the compilation process that ensures the program adheres to the type rules of the programming language. It verifies that operations are performed on compatible data types and that data is used consistently throughout the program. Error reporting is closely tied to type checking, as the compiler must provide informative messages to the programmer when type errors are detected.
Here's how type checking and error reporting are generally performed in a compiler:
Type Checking
- Symbol Table: The symbol table, constructed during lexical and syntax analysis, plays a vital role. It stores information about the type of each identifier (variable, function, etc.) in the program.
- Type Rules: Every programming language has a set of type rules that define how types can be combined and manipulated. For example, an addition operation might require both operands to be numeric types.
- Traversal of the Abstract Syntax Tree (AST): The type checker traverses the AST, which represents the program's structure. At each node of the AST, the type checker applies the type rules of the language.
- Type Inference: In some cases, the compiler can infer the type of an expression without explicit type declarations (e.g., in languages with type inference features). It uses the context in which an expression appears to deduce its type.
- Type Checking of Expressions:
  - For operators (e.g., +, -, *, /), the type checker verifies that the operands have the expected types. If not, a type error is reported.
  - For function calls, the type checker checks that the number of arguments matches the function's definition and that the argument types are compatible with the parameter types.
  - For assignments, the type checker ensures that the type of the right-hand side expression is compatible with the type of the variable on the left-hand side.
- Type Checking of Control Flow: The type checker also examines control flow statements (e.g., if, while) to ensure that the conditions are of a suitable type (e.g., boolean).
Error Reporting
- Detection of Type Errors: When the type checker encounters a violation of the type rules, it detects a type error.
- Error Message Generation: The compiler generates an error message that informs the programmer about the type error. The error message typically includes:
  - The location of the error in the source code (e.g., line number, column number).
  - A description of the error (e.g., "type mismatch", "invalid operand type").
  - Information about the expected and actual types.
- Error Reporting Mechanisms:
  - The compiler may report errors as soon as they are detected during the traversal of the AST.
  - Some compilers collect all the errors and report them at the end of the type-checking phase.
- Error Recovery: After reporting an error, the compiler may attempt to recover and continue type checking. This allows the compiler to detect multiple errors in a single compilation. Error recovery strategies might involve:
  - Substituting a default type for the erroneous expression.
  - Skipping the rest of the current statement or block.
Example
Consider the following code snippet:
int x;
float y;
x = y + "hello";
- The type checker would access the symbol table to determine that x is of type int and y is of type float.
- When checking the assignment statement, it would encounter the expression y + "hello".
- The type checker would determine that the + operator is being applied to a float (y) and a string ("hello"), which is a type mismatch according to the language's type rules.
- The compiler would generate an error message similar to: "Error: Type mismatch. Cannot add float and string."
- The error message would include the line number where the error occurred.
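A tiny C sketch of such a check (assuming each expression node carries a type tag; the Type enum and checkAdd function are illustrative only) could look like this:

#include <stdio.h>

typedef enum { TY_INT, TY_FLOAT, TY_STRING, TY_ERROR } Type;

const char *typeName(Type t) {
    static const char *names[] = { "int", "float", "string", "<error>" };
    return names[t];
}

/* Type rule for '+': both operands must be numeric; otherwise report
   a type mismatch and return an error type so checking can continue. */
Type checkAdd(Type lhs, Type rhs, int line) {
    if ((lhs == TY_INT || lhs == TY_FLOAT) &&
        (rhs == TY_INT || rhs == TY_FLOAT))
        return (lhs == TY_FLOAT || rhs == TY_FLOAT) ? TY_FLOAT : TY_INT;
    fprintf(stderr, "line %d: error: type mismatch, cannot add %s and %s\n",
            line, typeName(lhs), typeName(rhs));
    return TY_ERROR;   /* error recovery: substitute an error/default type */
}

int main(void) {
    /* x = y + "hello";  where y is float */
    Type t = checkAdd(TY_FLOAT, TY_STRING, 3);
    printf("result type: %s\n", typeName(t));
    return 0;
}

Returning an error type instead of aborting is one simple error-recovery strategy: checking can continue and further errors can still be reported.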
6) List out various syntax directed translation Mechanisms. Explain any two with an example.
Here's a breakdown of syntax-directed translation mechanisms, along with explanations of two key methods:
Syntax-Directed Translation
Syntax-directed translation is a compiler construction technique where the semantic analysis phase is driven by the syntax of the programming language. It involves attaching rules or actions to the productions of the language's context-free grammar. These rules specify how to compute attributes or perform actions based on the syntactic structure of the program.
Various Syntax-Directed Translation Mechanisms
- Syntax-Directed Definitions (SDDs):
- SDDs associate attributes with grammar symbols and semantic rules with grammar productions.
- Attributes can be of any type (e.g., strings, numbers, data structures).
- Semantic rules specify how to compute the values of attributes.
- SDDs do not specify the order in which the attributes are computed.
- Translation Schemes:
- Translation schemes are similar to SDDs, but they provide explicit information about the order in which semantic actions are to be performed.
- Semantic actions (code fragments) are embedded within the grammar productions.
- Translation schemes are often used in parser generators.
- Attribute Grammars:
- Attribute grammars are a formal framework for defining SDDs.
- They provide a more structured and rigorous way to specify attribute computations.
- Attribute grammars distinguish between synthesized attributes (computed from children nodes) and inherited attributes (computed from parent and/or sibling nodes).
Explanation of Two Mechanisms
- Syntax-Directed Definitions (SDDs)
  - An SDD associates attributes with each grammar symbol and semantic rules with each production.
  - Attributes can be synthesized or inherited.
    - Synthesized attributes: Their values are computed from the values of the attributes of their children in the parse tree.
    - Inherited attributes: Their values are computed from the values of the attributes of the parent and/or siblings in the parse tree.
  - Example: Consider the following grammar for simple arithmetic expressions:
    E -> E + T
    E -> T
    T -> T * F
    T -> F
    F -> ( E )
    F -> id
  - We can define an SDD to compute the value of each expression. Let's use a synthesized attribute val for each non-terminal.
  - SDD:
    Production      Semantic Rule
    E -> E1 + T     E.val = E1.val + T.val
    E -> T          E.val = T.val
    T -> T1 * F     T.val = T1.val * F.val
    T -> F          T.val = F.val
    F -> ( E )      F.val = E.val
    F -> id         F.val = id.lexval
    (id.lexval is the value of the token id, obtained from the lexical analyzer.)
  - In this example, the val attribute for each expression is computed based on the values of its sub-expressions.
- Translation Schemes
  - Translation schemes are a procedural way of specifying syntax-directed translation. Semantic actions are embedded within the grammar productions.
  - Example: Using the same grammar as above, we can write a translation scheme to perform the same computations:
    E -> E + T { E.val = E.val + T.val }
    E -> T     { E.val = T.val }
    T -> T * F { T.val = T.val * F.val }
    T -> F     { T.val = F.val }
    F -> ( E ) { F.val = E.val }
    F -> id    { F.val = id.lexval }
  - The actions within the curly braces {} are performed in the order they appear during parsing.
  - For instance, in the production E -> E + T { E.val = E.val + T.val }, the addition is performed after the values of E.val and T.val have been computed (presumably through recursive calls). A minimal evaluator in this style is sketched after the comparison below.
Key Differences Summarized
- SDDs are declarative: They specify what computations need to be done, without specifying the order.
- Translation Schemes are procedural: They specify how the computations are to be done, including the order of evaluation.
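The sketch below shows one way the translation scheme above could be realized as a recursive-descent evaluator in C (left recursion is handled with loops, operands are single digits, and the parseE/parseT/parseF names are illustrative):

#include <stdio.h>

static const char *input;          /* current position in the expression */

static int parseE(void);           /* E -> T { + T } */
static int parseT(void);           /* T -> F { * F } */
static int parseF(void);           /* F -> ( E ) | digit */

static int parseF(void) {
    if (*input == '(') {           /* F -> ( E )  { F.val = E.val } */
        input++;
        int val = parseE();
        input++;                   /* skip ')' */
        return val;
    }
    return *input++ - '0';         /* F -> id     { F.val = id.lexval } */
}

static int parseT(void) {
    int val = parseF();            /* T -> F      { T.val = F.val } */
    while (*input == '*') {        /* T -> T * F  { T.val = T.val * F.val } */
        input++;
        val *= parseF();
    }
    return val;
}

static int parseE(void) {
    int val = parseT();            /* E -> T      { E.val = T.val } */
    while (*input == '+') {        /* E -> E + T  { E.val = E.val + T.val } */
        input++;
        val += parseT();
    }
    return val;
}

int main(void) {
    input = "2+3*4";
    printf("val = %d\n", parseE());   /* prints 14 */
    return 0;
}

Each comment marks the production whose semantic action the surrounding code performs, mirroring the { ... } actions of the translation scheme.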
7) Explain Activation record and Activation tree in brief.
Here's a brief note on the Activation Record and the Activation Tree, as used in compiler design and run-time environments:
Activation Record
Definition:
An activation record (also called a stack frame) is a block of memory created on the stack whenever a function or procedure is called. It stores all the information needed to manage a single execution of that function.
Main contents of an activation record:
- Return address: Where to return after the function finishes.
- Parameters: Values passed to the function.
- Local variables: Variables declared inside the function.
- Temporary data: For intermediate calculations.
- Saved registers: To restore previous values after the function call.
- Control links: To connect to the previous activation record (for nested or recursive calls).
Example:
When you call sum(5, 10), an activation record for sum is pushed onto the stack, holding the values 5 and 10, space for local variables, and the return address.
Activation Tree
Definition:
An activation tree is a diagram that shows the calling relationships between different function activations (calls) during program execution. Each node represents an activation (a function call), and edges show which function called which.
Features:
- The root is usually the main program or the first function called.
- Child nodes represent functions called by their parent function.
- The tree shows the order and nesting of function calls.
Example:
If main()
calls A()
, and A()
calls B()
and C()
, the activation tree looks like:
main
|
A
/ \
B C
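For instance, a C program with this call structure (hypothetical function bodies, matching the example) would produce exactly that activation tree at run time:

#include <stdio.h>

void B(void) { printf("in B\n"); }
void C(void) { printf("in C\n"); }

void A(void) {   /* A activates B, then C */
    B();
    C();
}

int main(void) { /* main is the root of the activation tree */
    A();
    return 0;
}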
Usefulness:
- Helps visualize the flow of function calls.
- Useful for understanding recursion and nested calls.
- Shows how activation records are pushed and popped from the stack during execution.
In summary:
- Activation record: Memory block for managing a function call (stores parameters, locals, return address, etc.).
- Activation tree: Diagram showing the structure and order of function calls during program execution.
These concepts are essential for understanding how compilers manage function calls and memory at runtime!
8) Explain Stack allocation and Activation record organization in brief.
Here are brief explanations of stack allocation and activation record organization:
Stack Allocation
Stack allocation is a memory management technique used to allocate and deallocate memory for function calls and local variables during the execution of a program. It operates on a Last-In, First-Out (LIFO) principle.
- When a function is called, a block of memory called an activation record (or stack frame) is pushed onto the stack.
- This activation record contains space for the function's parameters, local variables, return address, and other necessary information.
- When the function completes its execution, its activation record is popped off the stack, freeing the memory.
- Stack allocation is efficient for managing function calls and local variables because the allocation and deallocation operations are simple and fast.
Activation Record Organization
An activation record is a data structure that stores all the information needed for a single execution of a procedure. The contents of an activation record vary depending on the programming language and the compiler implementation, but it typically includes the following (a rough struct sketch follows the list):
- Temporary variables: Space for storing temporary values computed during the execution of the procedure.
- Local variables: Space for storing the local variables of the procedure.
- Saved machine registers: Space to save the values of machine registers before the procedure call, so they can be restored after the procedure returns.
- Control link (Dynamic link): A pointer to the activation record of the caller procedure. This is used to restore the callerās environment when the called procedure returns.
- Access link (Static link): In languages that support nested procedures, a pointer to the activation record of the statically enclosing procedure. This is used to access non-local variables.
- Actual parameters: Space to store the actual parameters passed to the procedure.
- Return value: Space to store the return value of the procedure.
- Return address: The address of the instruction to which execution should return after the procedure call.
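As a rough illustration only (real layouts depend on the target machine's calling convention and are chosen by the compiler), these fields might be grouped into a C struct like this:

#include <stdio.h>

/* Illustrative layout only; actual activation records are laid out by
   the compiler for a specific target machine. */
struct ActivationRecord {
    int   actual_params[4];                 /* actual parameters from the caller */
    int   return_value;                     /* space for the return value        */
    void *return_address;                   /* where to resume in the caller     */
    struct ActivationRecord *control_link;  /* dynamic link to caller's record   */
    struct ActivationRecord *access_link;   /* static link for nested procedures */
    long  saved_registers[8];               /* saved machine registers           */
    int   locals[8];                        /* local variables                   */
    int   temporaries[8];                   /* compiler-generated temporaries    */
};

int main(void) {
    printf("frame size in this sketch: %zu bytes\n",
           sizeof(struct ActivationRecord));
    return 0;
}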
9) List out and explain various storage allocation strategies.
Here's a list of storage allocation strategies:
- Static Allocation:
- Memory is allocated to variables and data structures at compile time.
- The size and location of the allocated memory are fixed throughout the program's execution.
- Examples: Global variables, static variables.
- Stack Allocation:
- Memory is allocated and deallocated in a Last-In, First-Out (LIFO) manner.
- Used for managing function calls and local variables.
- Activation records (or stack frames) are pushed onto the stack when a function is called and popped off when the function returns.
- Heap Allocation:
- Memory is allocated and deallocated dynamically at run time.
- Used for data structures whose size is not known at compile time or that need to persist beyond the lifetime of a function call.
- Memory is allocated from a pool of available memory (the heap) using functions like malloc() (in C/C++) or operators like new (in C++, Java, etc.).
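The three strategies can be seen side by side in a short C program (a simple illustration; the demo function and global_counter are made-up names):

#include <stdio.h>
#include <stdlib.h>

int global_counter = 0;        /* static allocation: fixed at compile time */

void demo(int n) {
    int local = n * 2;         /* stack allocation: lives in demo's frame  */

    /* heap allocation: size chosen at run time, persists until free() */
    int *buffer = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++) buffer[i] = i;

    printf("global=%d local=%d heap[last]=%d\n",
           global_counter, local, buffer[n - 1]);
    free(buffer);              /* heap memory must be released explicitly  */
}                              /* 'local' is freed automatically here      */

int main(void) {
    global_counter = 1;
    demo(5);
    return 0;
}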
10) Write in brief about storage organization in compiler.
Here's a brief explanation of storage organization in a compiler, focusing on stack allocation and activation records:
Stack Allocation
Stack allocation is a memory allocation technique where memory is allocated and deallocated in a last-in, first-out (LIFO) manner. It works like a stack of plates: the last plate placed on top is the first one removed. This method is efficient for managing memory for function calls and local variables.
Activation Record Organization
An activation record, also known as a stack frame, is a data structure that stores information about a specific invocation of a procedure or function. It typically contains:
- Return address
- Call arguments
- Saved registers
- Local variables
- Temporary storage
The activation record is pushed onto the stack when a function is called and popped off when the function completes its execution. This organization helps in managing the execution of functions and their data in a structured way.
11) Compare and contrast between static , stack and heap allocation.
Here's a comparison of static, stack, and heap allocation:
- Static Allocation:
- Memory is allocated to variables and data structures at compile time.
- The size and location of the allocated memory are fixed throughout the program's execution.
- Examples: Global variables, static variables.
- Stack Allocation:
- Memory is allocated and deallocated in a Last-In, First-Out (LIFO) manner.
- Used for managing function calls and local variables.
- Activation records (or stack frames) are pushed onto the stack when a function is called and popped off when the function returns.
- Heap Allocation:
- Memory is allocated and deallocated dynamically at run time.
- Used for data structures whose size is not known at compile time or that need to persist beyond the lifetime of a function call.
- Memory is allocated from a pool of available memory (the heap) using functions like malloc() (in C/C++) or new (in C++, Java, etc.).
12) What is the use of a symbol table? How are identifiers stored in the symbol table?
Here's a summary of the use of a symbol table and how identifiers are stored:
Use of a Symbol Table
A symbol table is a crucial data structure used by a compiler. It stores information about various entities in a program, such as:
- Variables
- Functions
- Classes
- Interfaces
The symbol table is used throughout the compilation process for tasks like:
- Type checking: Verifying that variables are used consistently with their declared types.
- Code generation: Storing information needed to allocate memory and generate correct instructions.
- Scope management: Keeping track of the visibility and accessibility of identifiers in different parts of the program.
How Identifiers Are Stored
The symbol table stores various attributes associated with each identifier. These attributes can include:
- Name: The identifier itself.
- Type: The data type of a variable or the return type of a function.
- Scope: Information about where the identifier is valid and accessible.
- Memory location: The address assigned to a variable.
- Other information: Such as size, storage class, and parameters (for functions).
The way this information is stored can vary depending on the specific implementation of the symbol table. Common techniques include using:
- Hash tables: For efficient lookup of identifiers.
- Linked lists: To handle collisions in hash tables or to maintain the order of declarations.
- Tree structures: For more complex scope management.
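For scope management in particular, one common arrangement (sketched below with illustrative names such as Scope, declare, and resolve) keeps one list of entries per scope and searches from the innermost scope outward:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One entry per identifier; one linked list of entries per scope. */
typedef struct Entry {
    char name[32];
    char type[16];
    struct Entry *next;
} Entry;

typedef struct Scope {
    Entry *entries;
    struct Scope *parent;      /* enclosing scope */
} Scope;

Scope *openScope(Scope *parent) {
    Scope *s = calloc(1, sizeof(Scope));
    s->parent = parent;
    return s;
}

void declare(Scope *s, const char *name, const char *type) {
    Entry *e = calloc(1, sizeof(Entry));
    strcpy(e->name, name);
    strcpy(e->type, type);
    e->next = s->entries;
    s->entries = e;
}

/* Search the current scope first, then the enclosing scopes. */
Entry *resolve(Scope *s, const char *name) {
    for (; s; s = s->parent)
        for (Entry *e = s->entries; e; e = e->next)
            if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

int main(void) {
    Scope *global = openScope(NULL);
    declare(global, "x", "int");
    Scope *block = openScope(global);   /* entering an inner block */
    declare(block, "y", "float");
    printf("x resolves to %s\n", resolve(block, "x")->type);  /* int */
    return 0;
}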
13) Explain dynamic storage allocation techniques.
Dynamic storage allocation is best understood as part of the compiler's overall storage organization; the key points are:
- Compilerās Role: The compiler plays a crucial role in organizing how memory is used during program execution. This involves allocating space for variables, data structures, and managing function calls.
- Memory Management: Efficient storage organization is essential for optimizing program performance and ensuring correct execution.
- Key Storage Areas:
- Static Allocation: Memory is allocated at compile time, and the size is fixed throughout the program's execution.
- Stack Allocation: Memory is allocated and deallocated in a LIFO manner, primarily used for function calls and local variables.
- Heap Allocation: Memory is dynamically allocated and deallocated during program execution, providing flexibility for data structures with varying sizes.
- Activation Records: These are used to manage information related to function calls, including local variables, parameters, and return addresses.
- Symbol Table: The compiler uses a symbol table to store information about identifiers (variables, functions, etc.) and their attributes (type, scope, etc.).