Composite Types


8.1 Check Your Understanding

  1. What are struct tags in C? How are they related to type names? How did they change in C++?
  2. How do the records of ML differ from those of most other languages?
  3. Discuss the significance of “holes” in records. Why do they arise? What problems do they cause?
  4. Why is it easier to implement assignment than comparison for records?
  5. What is packing? What are its advantages and disadvantages?
  6. Why might a compiler reorder the fields of a record? What problems might this cause?
  7. Briefly describe two purposes for unions/variant records.

8.2 Check Your Understanding

  1. What is an array slice? For what purposes are slices useful?
  2. Is there any significant difference between a two-dimensional array and an array of one-dimensional arrays?
  3. What is the shape of an array?
  4. What is a dope vector? What purpose does it serve?
  5. Under what circumstances can an array declared within a subroutine be allocated in the stack? Under what circumstances must it be allocated in the heap?
  6. What is a conformant array?
  7. Discuss the comparative advantages of contiguous and row-pointer layout for arrays.
  8. Explain the difference between row-major and column-major layout for contiguously allocated arrays. Why does a programmer need to know which layout the compiler uses? Why do most language designers consider row-major layout to be better?
  9. How much of the work of computing the address of an element of an array can be performed at compile time? How much must be performed at run time?

8.3-8.5.1 Check Your Understanding

  1. Name three languages that provide particularly extensive support for character strings.
  2. Why might a language permit operations on strings that it does not provide for arrays?
  3. What are the strengths and weaknesses of the bit-vector representation for sets? How else might sets be implemented?
  4. Discuss the tradeoffs between pointers and the recursive types that arise naturally in a language with a reference model of variables.
  5. Summarize the ways in which one dereferences a pointer in various programming languages.
  6. What is the difference between a pointer and an address? Between a pointer and a reference?
  7. Discuss the advantages and disadvantages of the interoperability of pointers and arrays in C.
  8. Under what circumstances must the bounds of a C array be specified in its declaration?

8.5.2-8.5.3 Check Your Understanding

  1. What are dangling references? How are they created, and why are they a problem?
  2. What is garbage? How is it created, and why is it a problem? Discuss the comparative advantages of reference counts and tracing collection as a means of solving the problem.
  3. What are smart pointers? What purpose do they serve?
  4. Summarize the differences among mark-and-sweep, stop-and-copy, and generational garbage collection.
  5. What is pointer reversal? What problem does it address?
  6. What is “conservative” garbage collection? How does it work?
  7. Do dangling references and garbage ever arise in the same programming language? Why or why not?
  8. Why was automatic garbage collection so slow to be adopted by imperative programming languages?
  9. What are the advantages and disadvantages of allowing pointers to refer to objects that do not lie in the heap?

8.6-8.7 Check Your Understanding

  1. Why are lists so heavily used in functional programming languages?
  2. What are list comprehensions? What languages support them?
  3. Compare and contrast the support for lists in ML- and Lisp-family languages.
  4. Explain the distinction between interactive and file-based I/O; between temporary and persistent files; and between binary and text files.
  5. What are some of the tradeoffs between supporting I/O in the language proper versus supporting it in libraries?

8.9 Exercises

  1. Suppose we are compiling for a machine with 1-byte chars, 2-byte shorts, 4-byte ints, and 8-byte reals, with alignment rules that require every primitive datum to begin at an address that is an even multiple of its size. Suppose further that the compiler is not permitted to reorder fields. How much space will be consumed by the following array? Explain.
A : array [0..9] of record
       s : short
       c : char
       t : short
       d : char
       r : real
       i : integer
  2. In Example 8.10 we suggested sorting record fields by alignment requirement to minimize holes. There we sorted smallest-alignment-first. What would happen if we sorted longest-alignment-first? Do you see any advantages or disadvantages? If the record as a whole must be an even multiple of the longest alignment, do the two approaches ever differ in total space required?
  3. Give Ada code to map from lowercase to uppercase letters using: (a) an array; (b) a function. Note the similarity of syntax: in both cases upper('a') is 'A'.
  4. In Section 8.2.2 we noted that in a language with dynamic arrays and a value model of variables, records could have fields whose size is not known at compile time. To accommodate these, we suggested using a dope vector for the record, to track the offsets of the fields. Suppose instead that we want to maintain a static offset for each field. Can we devise an alternative strategy inspired by the stack-frame layout of Figure 8.7, and divide each record into a fixed-size part and a variable-size part? What problems would we need to address? Hint: consider nested records.
  5. Explain how to extend Figure 8.7 to accommodate subroutine arguments that are passed by value, but whose shape is not known until the subroutine is called at run time.
  6. Explain how to obtain the effect of Fortran 90’s allocate statement for one-dimensional arrays using pointers in C. You will probably find that your solution does not generalize to multidimensional arrays. Why not? If you are familiar with C++, show how to use its class facilities to solve the problem.
  7. Example 8.24, which considered the layout of a two-dimensional array of characters, counted only the space devoted to characters and pointers. That is appropriate if the space is allocated statically, as a global array of days or keywords known at compile time. Suppose instead that space is allocated in the heap, with 4 or 8 bytes of overhead for each contiguous block of storage. How does this change the tradeoffs in space efficiency?
  8. Consider the array indexing calculation of Example 8.25. Suppose that i, j, and k are already loaded into registers, and that A’s elements are integers, allocated contiguously in memory on a 32-bit machine. Show, in the pseudo-assembly notation of Sidebar 5.1, the instruction sequence to load A[i, j, k] into a register. You may assume the existence of an indexed addressing mode capable of scaling by small powers of two. Assuming the final memory load is a cache hit, how many cycles is your code likely to require on a modern processor?
  9. Continuing the previous exercise, suppose that A has row-pointer layout, and that i, j, and k are again available in registers. Show pseudo-assembly code to load A[i, j, k] into a register. Assuming that all memory loads are cache hits, how many cycles is your code likely to require on a modern processor?
  10. Repeat the preceding two exercises, modifying your code to include run-time checking of array subscript bounds.
  11. In Section 8.2.3 we discussed how to differentiate between the constant and variable portions of an array reference, in order to efficiently access the subparts of array and record objects. An alternative approach is to generate naive code and count on the compiler’s code improver to find the constant portions, group them together, and calculate them at compile time. Discuss the advantages and disadvantages of each approach.
  12. Consider the following C declaration, compiled on a 64-bit x86 machine. If the address of A[0][0] is 1000 (decimal), what is the address of A[3][7]?
struct {
    int n;
    char c;
} A[10][10];
  13. Suppose we are generating code for an imperative language on a machine with 8-byte floating-point numbers, 4-byte integers, 1-byte characters, and 4-byte alignment for both integers and floating-point numbers. Suppose further that we plan to use contiguous row-major layout for multidimensional arrays, that we do not wish to reorder fields of records or pack either records or arrays, and that we will assume without checking that all array subscripts are in bounds.
A : array [1..10, 10..100] of real
i : integer
x : real

Show the code that our compiler should generate for the assignment x := A[3, i]. Explain how you arrived at your answer.

r : record
       x : integer
       y : char
       A : array [1..10, 10..20] of record
             z : real
             B : array [0..71] of char
j, k : integer

Assume that these declarations are local to the current subroutine. Note the lower bounds on the indices of A; the first element is A[1, 10]. Describe how r would be laid out in memory. Then show code to load r.A[2, j].B[k] into a register. Be sure to indicate which portions of the address calculation could be performed at compile time.
  14. Suppose A is a 10×10 array of 4-byte integers, indexed from [0][0] through [9][9]. Suppose further that the address of A is currently in register r1, the value of integer i is currently in register r2, and the value of integer j is currently in register r3. Give pseudo-assembly language for a code sequence that will load A[i][j] into register r1 (a) assuming that A is implemented using row-major contiguous allocation, and (b) assuming that A is implemented using row pointers. Each line of your pseudocode should correspond to a single instruction on a typical modern machine. You may use as many registers as you need, and you need not preserve the values in r1, r2, and r3. You may assume that i and j are in bounds, and that addresses are 4 bytes long. Which code sequence is likely to be faster, and why?
  15. Pointers and recursive type definitions complicate the algorithm for determining structural equivalence of types. Consider:

type A = record
    x : pointer to B
    y : real
 
type B = record
    x : pointer to A
    y : real

The simple “expand recursively until only built-in types and constructors remain” definition from Section 7.2.1 does not work, because the expansion is infinite. Reinterpret structural equivalence so that two types A and B are equivalent if any sequence of field selections, array subscripts, pointer dereferences, and other operations that descends into A and ends at a built-in type always encounters the same field names, and ends at the same built-in type, when used to descend into B, and vice versa. Under this reinterpretation, A and B above have the same type. Give an algorithm based on this reinterpretation that could be used in a compiler to determine structural equivalence. Hint: the fastest approach is due to J. Kral and is based on the algorithm used to find the smallest deterministic finite automaton that accepts a given regular language.
  16. Explain the meaning of the following C declarations:

double *a[n];
double (*b)[n];
double (*c[n])();
double (*d())[n];
  17. In Ada 83, access variables (pointers) can point only to objects in the heap. Ada 95 allows a new kind of pointer, the access all type, to point to other objects as well, provided that those objects have been declared to be aliased:
type int_ptr is access all Integer;
foo : aliased Integer;
ip : int_ptr;
...
ip := foo'Access;

The 'Access attribute is roughly equivalent to C’s address-of operator. How would you implement access all types and aliased objects? How would your implementation interact with automatic garbage collection (assuming it exists) for objects in the heap?
  18. As noted in Section 8.5.2, Ada 95 forbids an access all pointer from referring to any object whose lifetime is briefer than that of the pointer’s type. Can this rule be enforced completely at compile time? Why or why not?
  19. In much of the discussion of pointers in Section 8.5, we assumed implicitly that every pointer into the heap points to the beginning of a dynamically allocated block of storage. In some languages, including Algol 68 and C, pointers may also point to data inside a block in the heap. If you were trying to implement dynamic semantic checks for dangling references or, alternatively, automatic garbage collection (precise or conservative), how would your task be complicated by the existence of such internal pointers?
  20. Occasionally one encounters the suggestion that a garbage-collected language should provide a delete operation as an optimization: by explicitly deleting objects that will never be used again, the programmer might save the garbage collector the trouble of finding and reclaiming those objects automatically, thereby improving performance. What do you think of this suggestion? Alternatively, one might allow the programmer to tenure an object, so that it will never be a candidate for reclamation. Is this a good idea?
  21. In Example 8.52 we noted that functional languages can safely use reference counts, since the lack of an assignment statement prevents them from introducing circularity. This is not strictly true; constructs like Lisp’s letrec can also be used to make cycles, so long as uses of circularly defined names are hidden inside lambda expressions in each definition:

(define foo
  (lambda ()
    (letrec ((a (lambda (f) (if f #\A b)))
             (b (lambda (f) (if f #\B c)))
             (c (lambda (f) (if f #\C a))))
      a)))

Each of the functions a, b, and c contains a reference to the next:

((foo) #t)                      ==> #\A
(((foo) #f) #t)                 ==> #\B
((((foo) #f) #f) #t)            ==> #\C
(((((foo) #f) #f) #f) #t)       ==> #\A

How might you address this circularity without giving up on reference counts?
  22. Here is a skeleton for the standard quicksort algorithm in Haskell:

quicksort [] = []
quicksort (a : l) = quicksort [...] ++ [a] ++ quicksort [...]

The ++ operator denotes list concatenation. The : operator is equivalent to ML’s :: or Lisp’s cons. Show how to express the two elided expressions as list comprehensions.

Exercises 8.23 through 8.31 can be found in the In More Depth section.