· 7 years ago · Feb 23, 2019, 01:36 PM
1Chapter 2: Building Abstractions with Objects
2Contents
32.1 Introduction
42.1.1 The Object Metaphor
52.1.2 Native Data Types
62.2 Data Abstraction
72.2.1 Example: Arithmetic on Rational Numbers
82.2.2 Pairs
92.2.3 Abstraction Barriers
102.2.4 The Properties of Data
112.3 Sequences
122.3.1 Nested Pairs
132.3.2 Recursive Lists
142.3.3 Tuples
152.3.4 Sequence Iteration
162.3.5 Sequence Abstraction
172.3.6 Strings
182.3.7 Conventional Interfaces
192.4 Mutable Data
202.4.1 Lists
212.4.2 Dictionaries
222.4.3 Local State
232.4.4 The Benefits of Non-Local Assignment
242.4.5 The Cost of Non-Local Assignment
252.4.6 Implementing Lists and Dictionaries
262.4.7 Dispatch Dictionaries
272.4.8 Example: Propagating Constraints
282.5 Object-Oriented Programming
292.5.1 Objects and Classes
302.5.2 Defining Classes
312.5.3 Message Passing and Dot Expressions
322.5.4 Class Attributes
332.5.5 Inheritance
342.5.6 Using Inheritance
352.5.7 Multiple Inheritance
362.5.8 The Role of Objects
372.6 Implementing Classes and Objects
382.6.1 Instances
392.6.2 Classes
402.6.3 Using Implemented Objects
412.7 Generic Operations
422.7.1 String Conversion
432.7.2 Multiple Representations
442.7.3 Generic Functions
452.1 Introduction
46
47We concentrated in Chapter 1 on computational processes and on the role of functions in program design. We saw how to use primitive data (numbers) and primitive operations (arithmetic operations), how to form compound functions through composition and control, and how to create functional abstractions by giving names to processes. We also saw that higher-order functions enhance the power of our language by enabling us to manipulate, and thereby to reason, in terms of general methods of computation. This is much of the essence of programming.
48
49This chapter focuses on data. Data allow us to represent and manipulate information about the world using the computational tools we have acquired so far. Programs without data structures may suffice for exploring mathematical properties. But real-world phenomena, such as documents, relationships, cities, and weather patterns, all have complex structure that is best represented using compound data types. With structured data, programs can simulate and reason about virtually any domain of human knowledge and experience. Due to the explosive growth of the Internet, a vast amount of structured information about the world is freely available to all of us online.
50
512.1.1 The Object Metaphor
52
53In the beginning of this text, we distinguished between functions and data: functions performed operations and data were operated upon. When we included function values among our data, we acknowledged that data too can have behavior. Functions could be operated upon like data, but could also be called to perform computation.
54
55In this text, objects will serve as our central programming metaphor for data values that also have behavior. Objects represent information, but also behave like the abstract concepts that they represent. The logic of how an object interacts with other objects is bundled along with the information that encodes the object's value. When an object is printed, it knows how to spell itself out as letters and numerals. If an object is composed of parts, it knows how to reveal those parts on demand. Objects are both information and processes, bundled together to represent the properties, interactions, and behaviors of complex things.
56
57The object metaphor is implemented in Python through specialized object syntax and associated terminology, which we can introduce by example. A date is a kind of simple object.
58
59>>> from datetime import date
60The name date is bound to a class. A class represents a kind of object. Individual dates are called instances of that class, and they can be constructed by calling the class as a function on arguments that characterize the instance.
61
62>>> today = date(2012, 9, 10)
63While today was constructed from primitive numbers, it behaves like a date. For instance, subtracting it from another date will give a time difference, which we can display as a line of text by calling str.
64
65>>> str(date(2012, 11, 30) - today)
66'81 days, 0:00:00'
67Objects have attributes, which are named values that are part of the object. In Python, like many other programming languages, we use dot notation to designated an attribute of an object.
68
69<expression> . <name>
70Above, the <expression> evaluates to an object, and <name> is the name of an attribute for that object.
71
72Unlike the names that we have considered so far, these attribute names are not available in the general environment. Instead, attribute names are particular to the object instance preceding the dot.
73
74>>> today.year
752012
76Objects also have methods, which are function-valued attributes. Metaphorically, we say that the object "knows" how to carry out those methods. By implementation, methods are functions that compute their results from both their arguments and their object. For example, The strftime method (a classic function name meant to evoke "string format of time") of today takes a single argument that specifies how to display a date (e.g., %A means that the day of the week should be spelled out in full).
77
78>>> today.strftime('%A, %B %d')
79'Monday, September 10'
80Computing the return value of strftime requires two inputs: the string that describes the format of the output and the date information bundled into today. Date-specific logic is applied within this method to yield this result. We never stated that the 10th of September, 2012, was a Monday, but knowing one's weekday is part of what it means to be a date. By bundling behavior and information together, this Python object offers us a convincing, self-contained abstraction of a date.
81
82Dot notation provides another form of combined expression in Python. Dot notation also has a well-defined evaluation procedure. However, developing a precise account of how dot notation is evaluated will have to wait until we introduce the full paradigm of object-oriented programming over the next several sections.
83
842.1.2 Native Data Types
85
86Every object in Python has a type. The type function allows us to inspect the type of an object.
87
88>>> type(today)
89<class 'datetime.date'>
90So far, the types of objects we have used extensively are relatively few: numbers, functions, Booleans, and now dates. We also briefly encountered sets and strings in Chapter 1, but we will need to study those in more depth. There are many other kinds of objects --- sounds, images, locations, data connections, etc. --- most of which can be defined by the means of combination and abstraction that we develop in this chapter. Python has only a handful of primitive or native data types built into the language.
91
92Native data types have the following properties:
93
94There are primitive expressions that evaluate to objects of these types, called literals.
95There are built-in functions, operators, and methods to manipulate these types of values.
96As we have seen, numbers are native; numeric literals evaluate to numbers, and mathematical operators manipulate number objects.
97
98>>> 12 + 3000000000000000000000000
993000000000000000000000012
100In fact, Python includes three native numeric types: integers (int), real numbers (float), and complex numbers (complex).
101
102>>> type(2)
103<class 'int'>
104>>> type(1.5)
105<class 'float'>
106>>> type(1+1j)
107<class 'complex'>
108The name float comes from the way in which real numbers are represented in Python: a "floating point" representation. While the details of how numbers are represented is not a topic for this text, some high-level differences between int and float objects are important to know. In particular, int objects can only represent integers, but they represent them exactly, without any approximation. On the other hand, float objects can represent a wide range of fractional numbers, but not all rational numbers are representable. Nonetheless, float objects are often used to represent real and rational numbers approximately, up to some number of significant figures.
109
110Further reading. The following sections introduce more of Python's native data types, focusing on the role they play in creating useful data abstractions. A chapter on native data types in the online book Dive Into Python 3 gives a pragmatic overview of all Python's native data types and how to use them effectively, including numerous usage examples and practical tips.
111
1122.2 Data Abstraction
113
114As we consider the wide set of things in the world that we would like to represent in our programs, we find that most of them have compound structure. A date has a year, a month, and a day; a geographic position has latitude and longitude coordinates. To represent positions, we would like our programming language to have the capacity to couple together a latitude and longitude to form a pair, a compound data value that our programs can manipulate as a single conceptual unit, but which also has two parts that can be considered individually.
115
116The use of compound data enables us to increase the modularity of our programs. If we can manipulate geographic positions directly as objects in their own right, then we can separate the part of our program that deals with values per se from the details of how those values may be represented. The general technique of isolating the parts of a program that deal with how data are represented from the parts of a program that deal with how those data are manipulated is a powerful design methodology called data abstraction. Data abstraction makes programs much easier to design, maintain, and modify.
117
118Data abstraction is similar in character to functional abstraction. When we create a functional abstraction, the details of how a function is implemented can be suppressed, and the particular function itself can be replaced by any other function with the same overall behavior. In other words, we can make an abstraction that separates the way the function is used from the details of how the function is implemented. Analogously, data abstraction is a methodology that enables us to isolate how a compound data object is used from the details of how it is constructed.
119
120The basic idea of data abstraction is to structure programs so that they operate on abstract data. That is, our programs should use data in such a way as to make as few assumptions about the data as possible. At the same time, a concrete data representation is defined, independently of the programs that use the data. The interface between these two parts of our system will be a set of functions, called selectors and constructors, that implement the abstract data in terms of the concrete representation. To illustrate this technique, we will consider how to design a set of functions for manipulating rational numbers.
121
122As you read the next few sections, keep in mind that most Python code written today uses very high-level abstract data types that are built into the language, like classes, dictionaries, and lists. Since we're building up an understanding of how these abstractions work, we can't use them yet ourselves. As a consequence, we will write some code that isn't typical of the typical way most Python programmers would implement these ideas in the language. What we write is instructive, however, because it demonstrates how these abstractions can be constructed! Remember that computer science isn't just about learning to use programming languages, but also understanding how they work.
123
1242.2.1 Example: Arithmetic on Rational Numbers
125
126Recall that a rational number is a ratio of integers, and rational numbers constitute an important sub-class of real numbers. A rational number like 1/3 or 17/29 is typically written as:
127
128<numerator>/<denominator>
129where both the <numerator> and <denominator> are placeholders for integer values. Both parts are needed to exactly characterize the value of the rational number.
130
131Rational numbers are important in computer science because they, like integers, can be represented exactly. Irrational numbers (like pi or e or sqrt(2)) are instead approximated using a finite binary expansion. Thus, working with rational numbers should, in principle, allow us to avoid approximation errors in our arithmetic.
132
133However, as soon as we actually divide the numerator by the denominator, we can be left with a truncated decimal approximation (a float).
134
135>>> 1/3
1360.3333333333333333
137and the problems with this approximation appear when we start to conduct tests:
138
139>>> 1/3 == 0.333333333333333300000 # Beware of approximations
140True
141How computers approximate real numbers with finite-length decimal expansions is a topic for another class. The important idea here is that by representing rational numbers as ratios of integers, we avoid the approximation problem entirely. Hence, we would like to keep the numerator and denominator separate for the sake of precision, but treat them as a single unit.
142
143We know from using functional abstractions that we can start programming productively before we have an implementation of some parts of our program. Let us begin by assuming that we already have a way of constructing a rational number from a numerator and a denominator. We also assume that, given a rational number, we have a way of extracting (or selecting) its numerator and its denominator. Let us further assume that the constructor and selectors are available as the following three functions:
144
145rational(n, d) returns the rational number with numerator n and denominator d.
146numer(x) returns the numerator of the rational number x.
147denom(x) returns the denominator of the rational number x.
148We are using here a powerful strategy of synthesis: wishful thinking. We haven't yet said how a rational number is represented, or how the functions numer, denom, and rational should be implemented. Even so, if we did have these three functions, we could then add, multiply, and test equality of rational numbers by calling them:
149
150>>> def add_rationals(x, y):
151 nx, dx = numer(x), denom(x)
152 ny, dy = numer(y), denom(y)
153 return rational(nx * dy + ny * dx, dx * dy)
154>>> def mul_rationals(x, y):
155 return rational(numer(x) * numer(y), denom(x) * denom(y))
156>>> def eq_rationals(x, y):
157 return numer(x) * denom(y) == numer(y) * denom(x)
158Now we have the operations on rational numbers defined in terms of the selector functions numer and denom, and the constructor function rational, but we haven't yet defined these functions. What we need is some way to glue together a numerator and a denominator into a unit.
159
1602.2.2 Pairs
161
162To enable us to implement the concrete level of our data abstraction, Python provides a compound structure called a tuple, which can be constructed by separating values by commas. Although not strictly required, parentheses almost always surround tuples.
163
164>>> (1, 2)
165(1, 2)
166The elements of a tuple can be unpacked in two ways. The first way is via our familiar method of multiple assignment.
167
168>>> pair = (1, 2)
169>>> pair
170(1, 2)
171>>> x, y = pair
172>>> x
1731
174>>> y
1752
176In fact, multiple assignment has been creating and unpacking tuples all along.
177
178A second method for accessing the elements in a tuple is by the indexing operator, expressed using square brackets.
179
180>>> pair[0]
1811
182>>> pair[1]
1832
184Tuples in Python (and sequences in most other programming languages) are 0-indexed, meaning that the index 0 picks out the first element, index 1 picks out the second, and so on. One intuition that underlies this indexing convention is that the index represents how far an element is offset from the beginning of the tuple.
185
186The equivalent function for the element selection operator is called getitem, and it also uses 0-indexed positions to select elements from a tuple.
187
188>>> from operator import getitem
189>>> getitem(pair, 0)
1901
191Tuples are native types, which means that there are built-in Python operators to manipulate them. We'll return to the full properties of tuples shortly. At present, we are only interested in how tuples can serve as the glue that implements abstract data types.
192
193Representing Rational Numbers. Tuples offer a natural way to implement rational numbers as a pair of two integers: a numerator and a denominator. We can implement our constructor and selector functions for rational numbers by manipulating 2-element tuples.
194
195>>> def rational(n, d):
196 return (n, d)
197>>> def numer(x):
198 return getitem(x, 0)
199>>> def denom(x):
200 return getitem(x, 1)
201A function for printing rational numbers completes our implementation of this abstract data type.
202
203>>> def rational_to_string(x):
204 """Return a string 'n/d' for numerator n and denominator d."""
205 return '{0}/{1}'.format(numer(x), denom(x))
206Together with the arithmetic operations we defined earlier, we can manipulate rational numbers with the functions we have defined.
207
208>>> half = rational(1, 2)
209>>> rational_to_string(half)
210'1/2'
211>>> third = rational(1, 3)
212>>> rational_to_string(mul_rationals(half, third))
213'1/6'
214>>> rational_to_string(add_rationals(third, third))
215'6/9'
216As the final example shows, our rational number implementation does not reduce rational numbers to lowest terms. We can remedy this by changing rational. If we have a function for computing the greatest common denominator of two integers, we can use it to reduce the numerator and the denominator to lowest terms before constructing the pair. As with many useful tools, such a function already exists in the Python Library.
217
218>>> from fractions import gcd
219>>> def rational(n, d):
220 g = gcd(n, d)
221 return (n//g, d//g)
222The double slash operator, //, expresses integer division, which rounds down the fractional part of the result of division. Since we know that g divides both n and d evenly, integer division is exact in this case. Now we have
223
224>>> rational_to_string(add_rationals(third, third))
225'2/3'
226as desired. This modification was accomplished by changing the constructor without changing any of the functions that implement the actual arithmetic operations.
227
228Further reading. The rational_to_string implementation above uses format strings, which contain placeholders for values. The details of how to use format strings and the format method appear in the formatting strings section of Dive Into Python 3.
229
2302.2.3 Abstraction Barriers
231
232Before continuing with more examples of compound data and data abstraction, let us consider some of the issues raised by the rational number example. We defined operations in terms of a constructor rational and selectors numer and denom. In general, the underlying idea of data abstraction is to identify for each type of value a basic set of operations in terms of which all manipulations of values of that type will be expressed, and then to use only those operations in manipulating the data.
233
234We can envision the structure of the rational number system as a series of layers.
235
236
237The horizontal lines represent abstraction barriers that isolate different levels of the system. At each level, the barrier separates the functions (above) that use the data abstraction from the functions (below) that implement the data abstraction. Programs that use rational numbers manipulate them solely in terms of the their arithmetic functions: add_rationals, mul_rationals, and eq_rationals. These, in turn, are implemented solely in terms of the constructor and selectors rational, numer, and denom, which themselves are implemented in terms of tuples. The details of how tuples are implemented are irrelevant to the rest of the layers as long as tuples enable the implementation of the selectors and constructor.
238
239At each layer, the functions within the box enforce the abstraction boundary because they are the only functions that depend upon both the representation above them (by their use) and the implementation below them (by their definitions). In this way, abstraction barriers are expressed as sets of functions.
240
241Abstraction barriers provide many advantages. One advantage is that they makes programs much easier to maintain and to modify. The fewer functions that depend on a particular representation, the fewer changes are required when one wants to change that representation.
242
2432.2.4 The Properties of Data
244
245We began the rational-number implementation by implementing arithmetic operations in terms of three unspecified functions: rational, numer, and denom. At that point, we could think of the operations as being defined in terms of data objects --- numerators, denominators, and rational numbers --- whose behavior was specified by the latter three functions.
246
247But what exactly is meant by data? It is not enough to say "whatever is implemented by the given selectors and constructors." We need to guarantee that these functions together specify the right behavior. That is, if we construct a rational number x from integers n and d, then it should be the case that numer(x)/denom(x) is equal to n/d.
248
249In general, we can think of an abstract data type as defined by some collection of selectors and constructors, together with some behavior conditions. As long as the behavior conditions are met (such as the division property above), these functions constitute a valid representation of the data type.
250
251This point of view can be applied to other data types as well, such as the two-element tuple that we used in order to implement rational numbers. We never actually said much about what a tuple was, only that the language supplied operators to create and manipulate tuples. We can now describe the behavior conditions of two-element tuples, also called pairs, that are relevant to the problem of representing rational numbers.
252
253In order to implement rational numbers, we needed a form of glue for two integers, which had the following behavior:
254
255If a pair p was constructed from values x and y, then getitem_pair(p, 0) returns x, and getitem_pair(p, 1) returns y.
256We can implement functions pair and getitem_pair that fulfill this description just as well as a tuple.
257
258>>> def pair(x, y):
259 """Return a function that behaves like a two-element tuple."""
260 def dispatch(m):
261 if m == 0:
262 return x
263 elif m == 1:
264 return y
265 return dispatch
266>>> def getitem_pair(p, i):
267 """Return the element at index i of pair p."""
268 return p(i)
269With this implementation, we can create and manipulate pairs.
270
271>>> p = pair(20, 12)
272>>> getitem_pair(p, 0)
27320
274>>> getitem_pair(p, 1)
27512
276This use of functions corresponds to nothing like our intuitive notion of what data should be. Nevertheless, these functions suffice to represent compound data in our programs.
277
278The subtle point to notice is that the value returned by pair is a function called dispatch, which takes an argument m and returns either x or y. Then, getitem_pair calls this function to retrieve the appropriate value. We will return to the topic of dispatch functions several times throughout this chapter.
279
280The point of exhibiting the functional representation of a pair is not that Python actually works this way (tuples are implemented more directly, for efficiency reasons) but that it could work this way. The functional representation, although obscure, is a perfectly adequate way to represent pairs, since it fulfills the only conditions that pairs need to fulfill. This example also demonstrates that the ability to manipulate functions as values automatically provides us the ability to represent compound data.
281
2822.3 Sequences
283
284A sequence is an ordered collection of data values. Unlike a pair, which has exactly two elements, a sequence can have an arbitrary (but finite) number of ordered elements.
285
286The sequence is a powerful, fundamental abstraction in computer science. For example, if we have sequences, we can list every university in the world, or every student in every university. The sequence abstraction enables the thousands of data-driven programs that impact our lives every day.
287
288A sequence is not a particular abstract data type, but instead a collection of behaviors that different types share. That is, there are many kinds of sequences, but they all share certain properties. In particular,
289
290Length. A sequence has a finite length.
291
292Element selection. A sequence has an element corresponding to any non-negative integer index less than its length, starting at 0 for the first element.
293
294Unlike an abstract data type, we have not stated how to construct a sequence. The sequence abstraction is a collection of behaviors that does not fully specify a type (i.e., with constructors and selectors), but may be shared among several types. Sequences provide a layer of abstraction that may hide the details of exactly which sequence type is being manipulated by a particular program.
295
296In this section, we develop a particular abstract data type that can implement the sequence abstraction. We then introduce built-in Python types that also implement the same abstraction.
297
2982.3.1 Nested Pairs
299
300For rational numbers, we paired together two integer objects using a two-element tuple, then showed that we could implement pairs just as well using functions. In that case, the elements of each pair we constructed were integers. However, like expressions, tuples can nest. Either element of a pair can itself be a pair, a property that holds true for either method of implementing a pair that we have seen: as a tuple or as a dispatch function.
301
302We visualize pairs (two-element tuples) in environment diagrams using box-and-pointer notation. Pairs are depicted as boxes with two parts: the left part contains (an arrow to) the first element of the pair and the right part contains the second. Simple values such as numbers, strings, boolean values, and None appear within the box. Composite values, such as function values and other pairs, are connected by a pointer.
303
3041 numbers = (1, 2)
3052 pairs = ((1, 2), (3, 4))
306Edit code
307 < Back Program terminated Forward >
308Global frame
309numbers
310
311pairs
312
313tuple
3140 1
3151 2
316tuple
3170 1
318
319
320tuple
3210 1
3221 2
323tuple
3240 1
3253 4
326Our ability to use tuples as the elements of other tuples provides a new means of combination in our programming language. We call the ability for tuples to nest in this way a closure property of the tuple data type. In general, a method for combining data values satisfies the closure property if the result of combination can itself be combined using the same method. Closure is the key to power in any means of combination because it permits us to create hierarchical structures --- structures made up of parts, which themselves are made up of parts, and so on. We will explore a range of hierarchical structures in Chapter 3. For now, we consider a particularly important structure.
327
3282.3.2 Recursive Lists
329
330We can use nested pairs to form lists of elements of arbitrary length, which will allow us to implement the sequence abstraction. The environment diagram below illustrates the structure of the recursive representation of a four-element list: 1, 2, 3, 4.
331
3321 up_to_four = (1, (2, (3, (4, None))))
333Edit code
334 < Back Program terminated Forward >
335Global frame
336up_to_four
337
338tuple
3390 1
3401
341
342tuple
3430 1
3442
345
346tuple
3470 1
3483
349
350tuple
3510 1
3524 None
353The list is represented by a chain of pairs. The first element of each pair is an element in the list, while the second is a pair that represents the rest of the list. The second element of the final pair is None, which indicates that the list has ended. This structure can be constructed using the nested tuple literal above.
354
355This nested structure corresponds to a very useful way of thinking about sequences in general. A non-empty sequence can be decomposed into:
356
357its first element, and
358the rest of the sequence.
359The rest of a sequence is itself a (possibly empty) sequence. We call this view of sequences recursive, because sequences contain other sequences as their second component.
360
361Since our list representation is recursive, we will call it an rlist in our implementation, so as not to confuse it with the built-in list type in Python that we will discuss later. A recursive list can be constructed from a first element and the rest of the list. The value None represents an empty recursive list.
362
363>>> empty_rlist = None
364>>> def rlist(first, rest):
365 """Construct a recursive list from its first element and the rest."""
366 return (first, rest)
367>>> def first(s):
368 """Return the first element of a recursive list s."""
369 return s[0]
370>>> def rest(s):
371 """Return the rest of the elements of a recursive list s."""
372 return s[1]
373These two selectors, one constructor, and one constant together implement the recursive list abstract data type. The single behavior condition for a recursive list is that, like a pair, its constructor and selectors are inverse functions.
374
375If a recursive list s was constructed from element f and list r, then first(s) returns f, and rest(s) returns r.
376We can use the constructor and selectors to manipulate recursive lists.
377
378>>> counts = rlist(1, rlist(2, rlist(3, rlist(4, empty_rlist))))
379>>> first(counts)
3801
381>>> rest(counts)
382(2, (3, (4, None)))
383Recall that we were able to represent pairs using functions, and therefore we can represent recursive lists using functions as well.
384
385The recursive list can store a sequence of values in order, but it does not yet implement the sequence abstraction. Using the abstract data type we have defined, we can implement the two behaviors that characterize a sequence: length and element selection.
386
387>>> def len_rlist(s):
388 """Return the length of recursive list s."""
389 length = 0
390 while s != empty_rlist:
391 s, length = rest(s), length + 1
392 return length
393>>> def getitem_rlist(s, i):
394 """Return the element at index i of recursive list s."""
395 while i > 0:
396 s, i = rest(s), i - 1
397 return first(s)
398Now, we can manipulate a recursive list as a sequence:
399
400>>> len_rlist(counts)
4014
402>>> getitem_rlist(counts, 1) # The second item has index 1
4032
404Both of these implementations are iterative. They peel away each layer of nested pair until the end of the list (in len_rlist) or the desired element (in getitem_rlist) is reached.
405
406The series of environment diagrams below illustrate the iterative process by which getitem_rlist finds the element 2 at index 1 in the recursive list. Below, we have defined the rlist counts using Python primitives to simplify the diagrams. This implementation choice violate the abstraction barrier for the rlist data type, but allows us to inspect the computational process more easily for this example.
407
4081 def first(s):
4092 return s[0]
4103 def rest(s):
4114 return s[1]
4125
4136 def getitem_rlist(s, i):
4147 while i > 0:
4158 s, i = rest(s), i - 1
4169 return first(s)
41710
41811 counts = (1, (2, (3, (4, None))))
41912 getitem_rlist(counts, 1)
420Edit code
421 < Back Step 5 of 14 Forward >
422Global frame
423first
424
425rest
426
427getitem_rlist
428
429counts
430
431func first(s)
432func rest(s)
433func getitem_rlist(s, i)
434tuple
4350 1
4361
437
438tuple
4390 1
4402
441
442tuple
4430 1
4443
445
446tuple
4470 1
4484 None
449First, the function getitem_rlist is called, creating a local frame.
450
4511 def first(s):
4522 return s[0]
4533 def rest(s):
4544 return s[1]
4555
4566 def getitem_rlist(s, i):
4577 while i > 0:
4588 s, i = rest(s), i - 1
4599 return first(s)
46010
46111 counts = (1, (2, (3, (4, None))))
46212 getitem_rlist(counts, 1)
463Edit code
464 < Back Step 6 of 14 Forward >
465Global frame
466first
467
468rest
469
470getitem_rlist
471
472counts
473
474getitem_rlist
475s
476
477i 1
478func first(s)
479func rest(s)
480func getitem_rlist(s, i)
481tuple
4820 1
4831
484
485tuple
4860 1
4872
488
489tuple
4900 1
4913
492
493tuple
4940 1
4954 None
496The expression in the while header evaluates to true, which causes the assignment statement in the while suite to be executed. The function rest returns the sublist starting with 2.
497
4981 def first(s):
4992 return s[0]
5003 def rest(s):
5014 return s[1]
5025
5036 def getitem_rlist(s, i):
5047 while i > 0:
5058 s, i = rest(s), i - 1
5069 return first(s)
50710
50811 counts = (1, (2, (3, (4, None))))
50912 getitem_rlist(counts, 1)
510Edit code
511 < Back Step 9 of 14 Forward >
512Global frame
513first
514
515rest
516
517getitem_rlist
518
519counts
520
521getitem_rlist
522s
523
524i 1
525rest
526s
527
528Return
529value
530
531func first(s)
532func rest(s)
533func getitem_rlist(s, i)
534tuple
5350 1
5361
537
538tuple
5390 1
5402
541
542tuple
5430 1
5443
545
546tuple
5470 1
5484 None
549Next, the local name s will be updated to refer to the sub-list that begins with the second element of the original list. Evaluating the while header expression now yields a false value, and so Python evaluates the expression in the return statement on the final line of getitem_rlist.
550
5511 def first(s):
5522 return s[0]
5533 def rest(s):
5544 return s[1]
5555
5566 def getitem_rlist(s, i):
5577 while i > 0:
5588 s, i = rest(s), i - 1
5599 return first(s)
56010
56111 counts = (1, (2, (3, (4, None))))
56212 getitem_rlist(counts, 1)
563Edit code
564 < Back Step 13 of 14 Forward >
565Global frame
566first
567
568rest
569
570getitem_rlist
571
572counts
573
574getitem_rlist
575s
576
577i 0
578rest
579s
580
581Return
582value
583
584first
585s
586
587Return
588value 2
589func first(s)
590func rest(s)
591func getitem_rlist(s, i)
592tuple
5930 1
5941
595
596tuple
5970 1
5982
599
600tuple
6010 1
6023
603
604tuple
6050 1
6064 None
607This final environment diagram shows the local frame for the call to first, which contains the name s bound to that same sub-list. The first function selects the value 2 and returns it, which will also be returned from getitem_rlist.
608
609This example demonstrates a common pattern of computation with recursive lists, where each step in an iteration operates on an increasingly shorter suffix of the original list. This incremental processing to find the length and elements of a recursive list does take some time to compute. (In Chapter 3, we will learn to characterize the computation time of iterative functions like these.) Python's built-in sequence types are implemented in a different way that does not have a large computational cost for computing the length of a sequence or retrieving its elements.
610
611The way in which we construct recursive lists is rather verbose. Fortunately, Python provides a variety of built-in sequence types that provide both the versatility of the sequence abstraction, as well as convenient notation.
612
6132.3.3 Tuples
614
615In fact, the tuple type that we introduced to form primitive pairs is itself a full sequence type. Tuples provide substantially more functionality than the pair abstract data type that we implemented functionally.
616
617Tuples can have arbitrary length, and they exhibit the two principal behaviors of the sequence abstraction: length and element selection. Below, digits is a tuple with four elements.
618
619>>> digits = (1, 8, 2, 8)
620>>> len(digits)
6214
622>>> digits[3]
6238
624Additionally, tuples can be added together and multiplied by integers. For tuples, addition and multiplication do not add or multiply elements, but instead combine and replicate the tuples themselves. That is, the add function in the operator module (and the + operator) returns a new tuple that is the conjunction of the added arguments. The mul function in operator (and the * operator) can take an integer k and a tuple and return a new tuple that consists of k copies of the tuple argument.
625
626>>> (2, 7) + digits * 2
627(2, 7, 1, 8, 2, 8, 1, 8, 2, 8)
628Mapping. A powerful method of transforming one tuple into another is by applying a function to each element and collecting the results. This general form of computation is called mapping a function over a sequence, and corresponds to the built-in function map. The result of map is an object that is not itself a sequence, but can be converted into a sequence by calling tuple, the constructor function for tuples.
629
630>>> alternates = (-1, 2, -3, 4, -5)
631>>> tuple(map(abs, alternates))
632(1, 2, 3, 4, 5)
633The map function is important because it relies on the sequence abstraction: we do not need to be concerned about the structure of the underlying tuple; only that we can access each one of its elements individually in order to pass it as an argument to the mapped function (abs, in this case).
634
6352.3.4 Sequence Iteration
636
637Mapping is itself an instance of a general pattern of computation: iterating over all elements in a sequence. To map a function over a sequence, we do not just select a particular element, but each element in turn. This pattern is so common that Python has an additional control statement to process sequential data: the for statement.
638
639Consider the problem of counting how many times a value appears in a sequence. We can implement a function to compute this count using a while loop.
640
641>>> def count(s, value):
642 """Count the number of occurrences of value in sequence s."""
643 total, index = 0, 0
644 while index < len(s):
645 if s[index] == value:
646 total = total + 1
647 index = index + 1
648 return total
649>>> count(digits, 8)
6502
651The Python for statement can simplify this function body by iterating over the element values directly, without introducing the name index at all. For example (pun intended), we can write:
652
653>>> def count(s, value):
654 """Count the number of occurrences of value in sequence s."""
655 total = 0
656 for elem in s:
657 if elem == value:
658 total = total + 1
659 return total
660>>> count(digits, 8)
6612
662A for statement consists of a single clause with the form:
663
664for <name> in <expression>:
665 <suite>
666A for statement is executed by the following procedure:
667
668Evaluate the header <expression>, which must yield an iterable value.
669For each element value in that sequence, in order:
670Bind <name> to that value in the local environment.
671Execute the <suite>.
672Step 1 refers to an iterable value. Sequences are iterable, and their elements are considered in their sequential order. Python does include other iterable types, but we will focus on sequences for now; the general definition of the term "iterable" appears in the section on iterators in Chapter 4.
673
674An important consequence of this evaluation procedure is that <name> will be bound to the last element of the sequence after the for statement is executed. The for loop introduces yet another way in which the local environment can be updated by a statement.
675
676Sequence unpacking. A common pattern in programs is to have a sequence of elements that are themselves sequences, but all of a fixed length. For statements may include multiple names in their header to "unpack" each element sequence into its respective elements. For example, we may have a sequence of pairs (that is, two-element tuples),
677
678>>> pairs = ((1, 2), (2, 2), (2, 3), (4, 4))
679and wish to find the number of pairs that have the same first and second element.
680
681>>> same_count = 0
682The following for statement with two names in its header will bind each name x and y to the first and second elements in each pair, respectively.
683
684>>> for x, y in pairs:
685 if x == y:
686 same_count = same_count + 1
687>>> same_count
6882
689This pattern of binding multiple names to multiple values in a fixed-length sequence is called sequence unpacking; it is the same pattern that we see in assignment statements that bind multiple names to multiple values.
690
691Ranges. A range is another built-in type of sequence in Python, which represents a range of integers. Ranges are created with the range function, which takes two integer arguments: the first number and one beyond the last number in the desired range.
692
693>>> range(1, 10) # Includes 1, but not 10
694range(1, 10)
695Calling the tuple constructor on a range will create a tuple with the same elements as the range, so that the elements can be easily inspected.
696
697>>> tuple(range(5, 8))
698(5, 6, 7)
699If only one argument is given, it is interpreted as one beyond the last value for a range that starts at 0.
700
701>>> tuple(range(4))
702(0, 1, 2, 3)
703Ranges commonly appear as the expression in a for header to specify the number of times that the suite should be executed:
704
705>>> total = 0
706>>> for k in range(5, 8):
707 total = total + k
708>>> total
70918
710A common convention is to use a single underscore character for the name in the for header if the name is unused in the suite:
711
712>>> for _ in range(3):
713 print('Go Bears!')
714
715Go Bears!
716Go Bears!
717Go Bears!
718Note that an underscore is just another name in the environment as far as the interpreter is concerned, but has a conventional meaning among programmers that indicates the name will not appear in any expressions.
719
7202.3.5 Sequence Abstraction
721
722We have now introduced two types of native data types that implement the sequence abstraction: tuples and ranges. Both satisfy the conditions with which we began this section: length and element selection. Python includes two more behaviors of sequence types that extend the sequence abstraction.
723
724Membership. A value can be tested for membership in a sequence. Python has two operators in and not in that evaluate to True or False depending on whether an element appears in a sequence.
725
726>>> digits
727(1, 8, 2, 8)
728>>> 2 in digits
729True
730>>> 1828 not in digits
731True
732All sequences also have methods called index and count, which return the index of (or count of) a value in a sequence.
733
734Slicing. Sequences contain smaller sequences within them. We observed this property when developing our nested pairs implementation, which decomposed a sequence into its first element and the rest. A slice of a sequence is any span of the original sequence, designated by a pair of integers. As with the range constructor, the first integer indicates the starting index of the slice and the second indicates one beyond the ending index.
735
736In Python, sequence slicing is expressed similarly to element selection, using square brackets. A colon separates the starting and ending indices. Any bound that is omitted is assumed to be an extreme value: 0 for the starting index, and the length of the sequence for the ending index.
737
738>>> digits[0:2]
739(1, 8)
740>>> digits[1:]
741(8, 2, 8)
742Enumerating these additional behaviors of the Python sequence abstraction gives us an opportunity to reflect upon what constitutes a useful data abstraction in general. The richness of an abstraction (that is, how many behaviors it includes) has consequences. For users of an abstraction, additional behaviors can be helpful. On the other hand, satisfying the requirements of a rich abstraction with a new data type can be challenging. To ensure that our implementation of recursive lists supported these additional behaviors would require some work. Another negative consequence of rich abstractions is that they take longer for users to learn.
743
744Sequences have a rich abstraction because they are so ubiquitous in computing that learning a few complex behaviors is justified. In general, most user-defined abstractions should be kept as simple as possible.
745
746Further reading. Slice notation admits a variety of special cases, such as negative starting values, ending values, and step sizes. A complete description appears in the subsection called slicing a list in Dive Into Python 3. In this chapter, we will only use the basic features described above.
747
7482.3.6 Strings
749
750Text values are perhaps more fundamental to computer science than even numbers. As a case in point, Python programs are written and stored as text. The native data type for text in Python is called a string, and corresponds to the constructor str.
751
752There are many details of how strings are represented, expressed, and manipulated in Python. Strings are another example of a rich abstraction, one which requires a substantial commitment on the part of the programmer to master. This section serves as a condensed introduction to essential string behaviors.
753
754String literals can express arbitrary text, surrounded by either single or double quotation marks.
755
756>>> 'I am string!'
757'I am string!'
758>>> "I've got an apostrophe"
759"I've got an apostrophe"
760>>> '您好'
761'您好'
762We have seen strings already in our code, as docstrings, in calls to print, and as error messages in assert statements.
763
764Strings satisfy the two basic conditions of a sequence that we introduced at the beginning of this section: they have a length and they support element selection.
765
766>>> city = 'Berkeley'
767>>> len(city)
7688
769>>> city[3]
770'k'
771The elements of a string are themselves strings that have only a single character. A character is any single letter of the alphabet, punctuation mark, or other symbol. Unlike many other programming languages, Python does not have a separate character type; any text is a string, and strings that represent single characters have a length of 1.
772
773Like tuples, strings can also be combined via addition and multiplication.
774
775>>> 'Berkeley' + ', CA'
776'Berkeley, CA'
777>>> 'Shabu ' * 2
778'Shabu Shabu '
779Membership. The behavior of strings diverges from other sequence types in Python. The string abstraction does not conform to the full sequence abstraction that we described for tuples and ranges. In particular, the membership operator in applies to strings, but has an entirely different behavior than when it is applied to sequences. It matches substrings rather than elements.
780
781>>> 'here' in "Where's Waldo?"
782True
783Likewise, the count and index methods on strings take substrings as arguments, rather than single-character elements. The behavior of count is particularly nuanced; it counts the number of non-overlapping occurrences of a substring in a string.
784
785>>> 'Mississippi'.count('i')
7864
787>>> 'Mississippi'.count('issi')
7881
789Multiline Literals. Strings aren't limited to a single line. Triple quotes delimit string literals that span multiple lines. We have used this triple quoting extensively already for docstrings.
790
791>>> """The Zen of Python
792claims, Readability counts.
793Read more: import this."""
794'The Zen of Python\nclaims, "Readability counts."\nRead more: import this.'
795In the printed result above, the \n (pronounced "backslash en") is a single element that represents a new line. Although it appears as two characters (backslash and "n"), it is considered a single character for the purposes of length and element selection.
796
797String Coercion. A string can be created from any object in Python by calling the str constructor function with an object value as its argument. This feature of strings is useful for constructing descriptive strings from objects of various types.
798
799>>> str(2) + ' is an element of ' + str(digits)
800'2 is an element of (1, 8, 2, 8)'
801The mechanism by which a single str function can apply to any type of argument and return an appropriate value is the subject of the later section on generic functions.
802
803Methods. The behavior of strings in Python is extremely productive because of a rich set of methods for returning string variants and searching for contents. A few of these methods are introduced below by example.
804
805>>> '1234'.isnumeric()
806True
807>>> 'rOBERT dE nIRO'.swapcase()
808'Robert De Niro'
809>>> 'snakeyes'.upper().endswith('YES')
810True
811Further reading. Encoding text in computers is a complex topic. In this chapter, we will abstract away the details of how strings are represented. However, for many applications, the particular details of how strings are encoded by computers is essential knowledge. Sections 4.1-4.3 of Dive Into Python 3 provides a description of character encodings and Unicode.
812
8132.3.7 Conventional Interfaces
814
815In working with compound data, we've stressed how data abstraction permits us to design programs without becoming enmeshed in the details of data representations, and how abstraction preserves for us the flexibility to experiment with alternative representations. In this section, we introduce another powerful design principle for working with data structures --- the use of conventional interfaces.
816
817A conventional interface is a data format that is shared across many modular components, which can be mixed and matched to perform data processing. For example, if we have several functions that all take a sequence as an argument and return a sequence as a value, then we can apply each to the output of the next in any order we choose. In this way, we can create a complex process by chaining together a pipeline of functions, each of which is simple and focused.
818
819This section has a dual purpose: to introduce the idea of organizing a program around a conventional interface, and to demonstrate examples of modular sequence processing.
820
821Consider these two problems, which appear at first to be related only in their use of sequences:
822
823Sum the even members of the first n Fibonacci numbers.
824List the letters in the acronym for a name, which includes the first letter of each capitalized word.
825These problems are related because they can be decomposed into simple operations that take sequences as input and yield sequences as output. Moreover, those operations are instances of general methods of computation over sequences. Let's consider the first problem. It can be decomposed into the following steps:
826
827 enumerate map filter accumulate
828----------- --- ------ ----------
829naturals(n) fib iseven sum
830The fib function below computes Fibonacci numbers (now updated from the definition in Chapter 1 with a for statement),
831
832>>> def fib(k):
833 """Compute the kth Fibonacci number."""
834 prev, curr = 1, 0 # curr is the first Fibonacci number.
835 for _ in range(k - 1):
836 prev, curr = curr, prev + curr
837 return curr
838and a predicate iseven can be defined using the integer remainder operator, %.
839
840>>> def iseven(n):
841 return n % 2 == 0
842The functions map and filter are operations on sequences. We have already encountered map, which applies a function to each element in a sequence and collects the results. The filter function takes a sequence and returns those elements of a sequence for which a predicate is true. Both of these functions return intermediate objects, map and filter objects, which are iterable objects that can be converted into tuples or summed.
843
844>>> nums = (5, 6, -7, -8, 9)
845>>> tuple(filter(iseven, nums))
846(6, -8)
847>>> sum(map(abs, nums))
84835
849Now we can implement even_fib, the solution to our first problem, in terms of map, filter, and sum.
850
851>>> def sum_even_fibs(n):
852 """Sum the first n even Fibonacci numbers."""
853 return sum(filter(iseven, map(fib, range(1, n+1))))
854>>> sum_even_fibs(20)
8553382
856Now, let's consider the second problem. It can also be decomposed as a pipeline of sequence operations that include map and filter:
857
858enumerate filter map accumulate
859--------- ------ ----- ----------
860 words iscap first tuple
861The words in a string can be enumerated via the split method of a string object, which by default splits on spaces.
862
863>>> tuple('Spaces between words'.split())
864('Spaces', 'between', 'words')
865The first letter of a word can be retrieved using the selection operator, and a predicate that determines if a word is capitalized can be defined using the built-in predicate isupper.
866
867>>> def first(s):
868 return s[0]
869>>> def iscap(s):
870 return len(s) > 0 and s[0].isupper()
871At this point, our acronym function can be defined via map and filter.
872
873>>> def acronym(name):
874 """Return a tuple of the letters that form the acronym for name."""
875 return tuple(map(first, filter(iscap, name.split())))
876>>> acronym('University of California Berkeley Undergraduate Graphics Group')
877('U', 'C', 'B', 'U', 'G', 'G')
878These similar solutions to rather different problems show how to combine general components that operate on the conventional interface of a sequence using the general computational patterns of mapping, filtering, and accumulation. The sequence abstraction allows us to specify these solutions concisely.
879
880Expressing programs as sequence operations helps us design programs that are modular. That is, our designs are constructed by combining relatively independent pieces, each of which transforms a sequence. In general, we can encourage modular design by providing a library of standard components together with a conventional interface for connecting the components in flexible ways.
881
882Generator expressions. The Python language includes a second approach to processing sequences, called generator expressions. which provide similar functionality to map and filter, but may require fewer function definitions.
883
884Generator expressions combine the ideas of filtering and mapping together into a single expression type with the following form:
885
886<map expression> for <name> in <sequence expression> if <filter expression>
887To evaluate a generator expression, Python evaluates the <sequence expression>, which must return an iterable value. Then, for each element in order, the element value is bound to <name>, the filter expression is evaluated, and if it yields a true value, the map expression is evaluated.
888
889The result value of evaluating a generator expression is itself an iterable value. Accumulation functions like tuple, sum, max, and min can take this returned object as an argument.
890
891>>> def acronym(name):
892 return tuple(w[0] for w in name.split() if iscap(w))
893>>> def sum_even_fibs(n):
894 return sum(fib(k) for k in range(1, n+1) if fib(k) % 2 == 0)
895Generator expressions are specialized syntax that utilizes the conventional interface of iterable values, such as sequences. These expressions subsume most of the functionality of map and filter, but avoid actually creating the function values that are applied (or, incidentally, creating the environment frames required to apply those functions).
896
897Reduce. In our examples we used specific functions to accumulate results, either tuple or sum. Functional programming languages (including Python) include general higher-order accumulators that go by various names. Python includes reduce in the functools module, which applies a two-argument function cumulatively to the elements of a sequence from left to right, to reduce a sequence to a value. The following expression computes 5 factorial.
898
899>>> from operator import mul
900>>> from functools import reduce
901>>> reduce(mul, (1, 2, 3, 4, 5))
902120
903Using this more general form of accumulation, we can also compute the product of even Fibonacci numbers, in addition to the sum, using sequences as a conventional interface.
904
905>>> def product_even_fibs(n):
906 """Return the product of the first n even Fibonacci numbers, except 0."""
907 return reduce(mul, filter(iseven, map(fib, range(2, n+1))))
908>>> product_even_fibs(20)
909123476336640
910The combination of higher order procedures corresponding to map, filter, and reduce will appear again in Chapter 4, when we consider methods for distributing computation across multiple computers.
911
9122.4 Mutable Data
913
914We have seen how abstraction is vital in helping us to cope with the complexity of large systems. Effective program synthesis also requires organizational principles that can guide us in formulating the overall design of a program. In particular, we need strategies to help us structure large systems so that they will be modular, that is, so that they can be divided "naturally" into coherent parts that can be separately developed and maintained.
915
916One powerful technique for creating modular programs is to introduce new kinds of data that may change state over time. In this way, a single data object can represent something that evolves independently of the rest of the program. The behavior of a changing object may be influenced by its history, just like an entity in the world. Adding state to data is an essential ingredient of our final destination in this chapter: object-oriented programming.
917
918The native data types we have introduced so far --- numbers, Booleans, tuples, ranges, and strings --- are all types of immutable objects. While names may change bindings to different values in the environment during the course of execution, the values themselves do not change. Mutable objects (also called mutable values) can change throughout the execution of a program.
919
9202.4.1 Lists
921
922The list is Python's most useful and flexible sequence type. A list is similar to a tuple, but it is mutable: Method calls and assignment statements can change the contents of a list.
923
924We can introduce many list modification operations through an example that illustrates the history of playing cards (drastically simplified). Comments in the examples describe the effect of each method invocation.
925
926Playing cards were invented in China, perhaps around the 9th century. An early deck had three suits, which corresponded to denominations of money.
927
928>>> chinese_suits = ['coin', 'string', 'myriad'] # A list literal
929>>> suits = chinese_suits # Two names refer to the same list
930As cards migrated to Europe (perhaps through Egypt), only the suit of coins remained in Spanish decks (oro).
931
932>>> suits.pop() # Remove and return the final element
933'myriad'
934>>> suits.remove('string') # Remove the first element that equals the argument
935Three more suits were added (they evolved in name and design over time),
936
937>>> suits.append('cup') # Add an element to the end
938>>> suits.extend(['sword', 'club']) # Add all elements of a list to the end
939and Italians called swords spades.
940
941>>> suits[2] = 'spade' # Replace an element
942giving the suits of a traditional Italian deck of cards.
943
944>>> suits
945['coin', 'cup', 'spade', 'club']
946The French variant that we use today in the U.S. changes the first two:
947
948>>> suits[0:2] = ['heart', 'diamond'] # Replace a slice
949>>> suits
950['heart', 'diamond', 'spade', 'club']
951Methods also exist for inserting, sorting, and reversing lists. All of these mutation operations change the value of the list; they do not create new list objects.
952
953Sharing and Identity. Because we have been changing a single list rather than creating new lists, the object bound to the name chinese_suits has also changed, because it is the same list object that was bound to suits!
954
955>>> chinese_suits # This name co-refers with "suits" to the same list
956['heart', 'diamond', 'spade', 'club']
957This behavior is new. Previously, if a name did not appear in a statement, then its value would not be affected by that statement. With mutable data, methods called on one name can affect another name at the same time.
958
959The environment diagram for this example shows how the value bound to chinese is changed by statements involving only suits. Step through each line of the following example to observe these changes.
960
9611 chinese = ['coin', 'string', 'myriad']
9622 suits = chinese
9633 suits.pop()
9644 suits.remove('string')
9655 suits.append('cup')
9666 suits.extend(['sword', 'club'])
9677 suits[2] = 'spade'
9688 suits[0:2] = ['heart', 'diamond']
969Edit code
970 < Back Program terminated Forward >
971Global frame
972chinese
973
974suits
975
976list
9770 1 2 3
978"heart" "diamond" "spade" "club"
979Lists can be copied using the list constructor function. Changes to one list do not affect another, unless they share structure.
980
981>>> nest = list(suits) # Bind "nest" to a second list with the same elements
982>>> nest[0] = suits # Create a nested list
983According to this environment, changing the list referenced by suits will affect the nested list that is the first element of nest, but not the other elements.
984
985>>> suits.insert(2, 'Joker') # Insert an element at index 2, shifting the rest
986>>> nest
987[['heart', 'diamond', 'Joker', 'spade', 'club'], 'diamond', 'spade', 'club']
988And likewise, undoing this change in the first element of nest will change suit as well.
989
990>>> nest[0].pop(2)
991'Joker'
992>>> suits
993['heart', 'diamond', 'spade', 'club']
994Stepping through this example line by line will show the representation of a nested list.
995
9961 suits = ['heart', 'diamond', 'spade', 'club']
9972 nest = list(suits)
9983 nest[0] = suits
9994 suits.insert(2, 'Joker')
10005 j = nest[0].pop(2)
1001Edit code
1002 < Back Step 3 of 5 Forward >
1003Global frame
1004suits
1005
1006nest
1007
1008list
10090 1 2 3
1010"heart" "diamond" "spade" "club"
1011list
10120 1 2 3
1013"heart" "diamond" "spade" "club"
1014Because two lists may have the same contents but in fact be different lists, we require a means to test whether two objects are the same. Python includes two comparison operators, called is and is not, that test whether two expressions in fact evaluate to the identical object. Two objects are identical if they are equal in their current value, and any change to one will always be reflected in the other. Identity is a stronger condition than equality.
1015
1016>>> suits is nest[0]
1017True
1018>>> suits is ['heart', 'diamond', 'spade', 'club']
1019False
1020>>> suits == ['heart', 'diamond', 'spade', 'club']
1021True
1022The final two comparisons illustrate the difference between is and ==. The former checks for identity, while the latter checks for the equality of contents.
1023
1024List comprehensions. A list comprehension uses an extended syntax for creating lists, analogous to the syntax of generator expressions.
1025
1026For example, the unicodedata module tracks the official names of every character in the Unicode alphabet. We can look up the characters corresponding to names, including those for card suits.
1027
1028>>> from unicodedata import lookup
1029>>> [lookup('WHITE ' + s.upper() + ' SUIT') for s in suits]
1030['♡', '♢', '♤', '♧']
1031List comprehensions reinforce the paradigm of data processing using the conventional interface of sequences, as list is a sequence data type.
1032
10332.4.2 Dictionaries
1034
1035Dictionaries are Python's built-in data type for storing and manipulating correspondence relationships. A dictionary contains key-value pairs, where both the keys and values are objects. The purpose of a dictionary is to provide an abstraction for storing and retrieving values that are indexed not by consecutive integers, but by descriptive keys.
1036
1037Strings commonly serve as keys, because strings are our conventional representation for names of things. This dictionary literal gives the values of various Roman numerals.
1038
1039>>> numerals = {'I': 1.0, 'V': 5, 'X': 10}
1040Looking up values by their keys uses the element selection operator that we previously applied to sequences.
1041
1042>>> numerals['X']
104310
1044A dictionary can have at most one value for each key. Adding new key-value pairs and changing the existing value for a key can both be achieved with assignment statements.
1045
1046>>> numerals['I'] = 1
1047>>> numerals['L'] = 50
1048>>> numerals
1049{'I': 1, 'X': 10, 'L': 50, 'V': 5}
1050Notice that 'L' was not added to the end of the output above. Dictionaries are unordered collections of key-value pairs. When we print a dictionary, the keys and values are rendered in some order, but as users of the language we cannot predict what that order will be.
1051
1052Dictionaries can appear in environment diagrams as well.
1053
10541 numerals = {'I': 1, 'V': 5, 'X': 10}
10552 numerals['L'] = 50
1056Edit code
1057 < Back Program terminated Forward >
1058Global frame
1059numerals
1060
1061dict
1062"I" 1
1063"X" 10
1064"L" 50
1065"V" 5
1066The dictionary abstraction also supports various methods of iterating of the contents of the dictionary as a whole. The methods keys, values, and items all return iterable values.
1067
1068>>> sum(numerals.values())
106966
1070A list of key-value pairs can be converted into a dictionary by calling the dict constructor function.
1071
1072>>> dict([(3, 9), (4, 16), (5, 25)])
1073{3: 9, 4: 16, 5: 25}
1074Dictionaries do have some restrictions:
1075
1076A key of a dictionary cannot be an object of a mutable built-in type.
1077There can be at most one value for a given key.
1078This first restriction is tied to the underlying implementation of dictionaries in Python. The details of this implementation are not a topic of this text. Intuitively, consider that the key tells Python where to find that key-value pair in memory; if the key changes, the location of the pair may be lost.
1079
1080The second restriction is a consequence of the dictionary abstraction, which is designed to store and retrieve values for keys. We can only retrieve the value for a key if at most one such value exists in the dictionary.
1081
1082A useful method implemented by dictionaries is get, which returns either the value for a key, if the key is present, or a default value. The arguments to get are the key and the default value.
1083
1084>>> numerals.get('A', 0)
10850
1086>>> numerals.get('V', 0)
10875
1088Dictionaries also have a comprehension syntax analogous to those of lists and generator expressions. Evaluating a dictionary comprehension yields a new dictionary object.
1089
1090>>> {x: x*x for x in range(3,6)}
1091{3: 9, 4: 16, 5: 25}
10922.4.3 Local State
1093
1094Lists and dictionaries have local state: they are changing values that have some particular contents at any point in the execution of a program. The word state implies an evolving process in which that state may change.
1095
1096Functions can also have local state. For instance, let us define a function that models the process of withdrawing money from a bank account. We will create a function called withdraw, which takes as its argument an amount to be withdrawn. If there is enough money in the account to accommodate the withdrawal, then withdraw will return the balance remaining after the withdrawal. Otherwise, withdraw will return the message 'Insufficient funds'. For example, if we begin with $100 in the account, we would like to obtain the following sequence of return values by calling withdraw:
1097
1098>>> withdraw(25)
109975
1100>>> withdraw(25)
110150
1102>>> withdraw(60)
1103'Insufficient funds'
1104>>> withdraw(15)
110535
1106Above, the expression withdraw(25), evaluated twice, yields different values. Thus, this user-defined function is non-pure. Calling the function not only returns a value, but also has the side effect of changing the function in some way, so that the next call with the same argument will return a different result. This side effect is a result of withdraw making a change to a name-value binding outside of its local environment frame.
1107
1108For withdraw to make sense, it must be created with an initial account balance. The function make_withdraw is a higher-order function that takes a starting balance as an argument. The function withdraw is its return value.
1109
1110>>> withdraw = make_withdraw(100)
1111An implementation of make_withdraw requires a new kind of statement: a nonlocal statement. When we call make_withdraw, we bind the name balance to the initial amount. We then define and return a local function, withdraw, which updates and returns the value of balance when called.
1112
1113>>> def make_withdraw(balance):
1114 """Return a withdraw function that draws down balance with each call."""
1115 def withdraw(amount):
1116 nonlocal balance # Declare the name "balance" nonlocal
1117 if amount > balance:
1118 return 'Insufficient funds'
1119 balance = balance - amount # Re-bind the existing balance name
1120 return balance
1121 return withdraw
1122The novel part of this implementation is the nonlocal statement, which mandates that whenever we change the binding of the name balance, the binding is changed in the first frame in which balance is already bound. Recall that without the nonlocal statement, an assignment statement would always bind a name in the first frame of the environment. The nonlocal statement indicates that the name appears somewhere in the environment other than the first (local) frame or the last (global) frame.
1123
1124The following environment diagrams illustrate the effects of multiple calls to a function created by make_withdraw.
1125
11261 def make_withdraw(balance):
11272 def withdraw(amount):
11283 nonlocal balance
11294 if amount > balance:
11305 return 'Insufficient funds'
11316 balance = balance - amount
11327 return balance
11338 return withdraw
11349
113510 wd = make_withdraw(20)
113611 wd(5)
113712 wd(3)
1138Edit code
1139 < Back Program terminated Forward >
1140Global frame
1141make_withdraw
1142
1143wd
1144
1145f1: make_withdraw
1146balance 12
1147withdraw
1148
1149Return
1150value
1151
1152withdraw [parent=f1]
1153amount 5
1154Return
1155value 15
1156withdraw [parent=f1]
1157amount 3
1158Return
1159value 12
1160func make_withdraw(balance)
1161func withdraw(amount) [parent=f1]
1162The first def statement has the usual effect: it creates a new user-defined function and binds the name make_withdraw to that function in the global frame. The subsequent call to make_withdraw creates and returns a locally defined function withdraw. The name balance is bound in the parent frame of this function. Crucially, there will only be this single binding for the name balance throughout the rest of this example.
1163
1164Next, we evaluate an expression that calls this function, bound to the name wd, on an amount 5. The body of withdraw is executed in a new environment that extends the environment in which withdraw was defined. Tracing the effect of evaluating withdraw illustrates the effect of a nonlocal statement in Python: a name outside of the first local frame can be changed by an assignment statement.
1165
11661 def make_withdraw(balance):
11672 def withdraw(amount):
11683 nonlocal balance
11694 if amount > balance:
11705 return 'Insufficient funds'
11716 balance = balance - amount
11727 return balance
11738 return withdraw
11749
117510 wd = make_withdraw(20)
117611 wd(5)
117712 wd(3)
1178Edit code
1179 < Back Step 11 of 15 Forward >
1180Global frame
1181make_withdraw
1182
1183wd
1184
1185f1: make_withdraw
1186balance 15
1187withdraw
1188
1189Return
1190value
1191
1192withdraw [parent=f1]
1193amount 5
1194Return
1195value 15
1196func make_withdraw(balance)
1197func withdraw(amount) [parent=f1]
1198The nonlocal statement changes all of the remaining assignment statements in the definition of withdraw. After executing nonlocal balance, any assignment statement with balance on the left-hand side of = will not bind balance in the first frame of the current environment. Instead, it will find the first frame in which balance was already defined and re-bind the name in that frame. If balance has not previously been bound to a value, then the nonlocal statement will give an error.
1199
1200By virtue of changing the binding for balance, we have changed the withdraw function as well. The next time it is called, the name balance will evaluate to 15 instead of 20. Hence, when we call withdraw a second time, we see that its return value is 12 and not 17. The change to balance from the first call affects the result of the second call.
1201
12021 def make_withdraw(balance):
12032 def withdraw(amount):
12043 nonlocal balance
12054 if amount > balance:
12065 return 'Insufficient funds'
12076 balance = balance - amount
12087 return balance
12098 return withdraw
12109
121110 wd = make_withdraw(20)
121211 wd(5)
121312 wd(3)
1214Edit code
1215 < Back Program terminated Forward >
1216Global frame
1217make_withdraw
1218
1219wd
1220
1221f1: make_withdraw
1222balance 12
1223withdraw
1224
1225Return
1226value
1227
1228withdraw [parent=f1]
1229amount 5
1230Return
1231value 15
1232withdraw [parent=f1]
1233amount 3
1234Return
1235value 12
1236func make_withdraw(balance)
1237func withdraw(amount) [parent=f1]
1238The second call to withdraw does create a second local frame, as usual. However, both withdraw frames have the same parent. That is, they both extend the environment for make_withdraw, which contains the binding for balance. Hence, they share that particular name binding. Calling withdraw has the side effect of altering the environment that will be extended by future calls to withdraw. The nonlocal statement allows withdraw to change a name binding in the make_withdraw frame.
1239
1240Ever since we first encountered nested def statements, we have observed that a locally defined function can look up names outside of its local frames. No nonlocal statement is required to access a non-local name. By contrast, only after a nonlocal statement can a function change the binding of names in these frames.
1241
1242Practical Guidance. By introducing nonlocal statements, we have created a dual role for assignment statements. Either they change local bindings, or they change nonlocal bindings. In fact, assignment statements already had a dual role: they either created new bindings or re-bound existing names. Assignment can also change the contents of lists and dictionaries. The many roles of Python assignment can obscure the effects of executing an assignment statement. It is up to you as a programmer to document your code clearly so that the effects of assignment can be understood by others.
1243
1244Python Particulars. This pattern of non-local assignment is a general feature of programming languages with higher-order functions and lexical scope. Most other languages do not require a nonlocal statement at all. Instead, non-local assignment is often the default behavior of assignment statements.
1245
1246Python also has an unusual restriction regarding the lookup of names: within the body of a function, all instances of a name must refer to the same frame. As a result, Python cannot look up the value of a name in a non-local frame, then bind that same name in the local frame, because the same name would be accessed in two different frames in the same function. This restriction allows Python to pre-compute which frame contains each name before executing the body of a function. When this restriction is violated, a confusing error message results. To demonstrate, the make_withdraw example is repeated below with the nonlocal statement removed.
1247
12481 def make_withdraw(balance):
12492 def withdraw(amount):
12503 if amount > balance:
12514 return 'Insufficient funds'
12525 balance = balance - amount
12536 return balance
12547 return withdraw
12558
12569 wd = make_withdraw(20)
125710 wd(5)
1258Edit code
1259 < Back Step 8 of 9 Forward >
1260UnboundLocalError: local variable 'balance' referenced before assignment
1261Global frame
1262make_withdraw
1263
1264wd
1265
1266f1: make_withdraw
1267balance 20
1268withdraw
1269
1270Return
1271value
1272
1273withdraw [parent=f1]
1274amount 5
1275func make_withdraw(balance)
1276func withdraw(amount) [parent=f1]
1277This UnboundLocalError appears because balance is assigned locally in line 5, and so Python assumes that all references to balance must appear in the local frame as well. This error occurs before line 5 is ever executed, implying that Python has considered line 5 in some way before executing line 3. As we study interpreter design, we will see that pre-computing facts about a function body before executing it is quite common. In this case, Python's pre-processing restricted the frame in which balance could appear, and thus prevented the name from being found. Adding a nonlocal statement corrects this error. The nonlocal statement did not exist in Python 2.
1278
12792.4.4 The Benefits of Non-Local Assignment
1280
1281Non-local assignment is an important step on our path to viewing a program as a collection of independent and autonomous objects, which interact with each other but each manage their own internal state.
1282
1283In particular, non-local assignment has given us the ability to maintain some state that is local to a function, but evolves over successive calls to that function. The balance associated with a particular withdraw function is shared among all calls to that function. However, the binding for balance associated with an instance of withdraw is inaccessible to the rest of the program. Only wd is associated with the frame for make_withdraw in which it was defined. If make_withdraw is called again, then it will create a separate frame with a separate binding for balance.
1284
1285We can extend our example to illustrate this point. A second call to make_withdraw returns a second withdraw function that has a different parent. We bind this second function to the name wd2 in the global frame.
1286
12871 def make_withdraw(balance):
12882 def withdraw(amount):
12893 nonlocal balance
12904 if amount > balance:
12915 return 'Insufficient funds'
12926 balance = balance - amount
12937 return balance
12948 return withdraw
12959
129610 wd = make_withdraw(20)
129711 wd2 = make_withdraw(7)
129812 wd2(6)
129913 wd(8)
1300Edit code
1301 < Back Program terminated Forward >
1302Global frame
1303make_withdraw
1304
1305wd
1306
1307wd2
1308
1309f1: make_withdraw
1310balance 12
1311withdraw
1312
1313Return
1314value
1315
1316f2: make_withdraw
1317balance 1
1318withdraw
1319
1320Return
1321value
1322
1323withdraw [parent=f2]
1324amount 6
1325Return
1326value 1
1327withdraw [parent=f1]
1328amount 8
1329Return
1330value 12
1331func make_withdraw(balance)
1332func withdraw(amount) [parent=f1]
1333func withdraw(amount) [parent=f2]
1334Now, we see that there are in fact two bindings for the name balance in two different frames, and each withdraw function has a different parent. The name wd is bound to a function with a balance of 20, while wd2 is bound to a different function with a balance of 7.
1335
1336Calling wd2 changes the binding of its non-local balance name, but does not affect the function bound to the name withdraw. A future call to wd is unaffected by the changing balance of wd2; its balance is still 20.
1337
13381 def make_withdraw(balance):
13392 def withdraw(amount):
13403 nonlocal balance
13414 if amount > balance:
13425 return 'Insufficient funds'
13436 balance = balance - amount
13447 return balance
13458 return withdraw
13469
134710 wd = make_withdraw(20)
134811 wd2 = make_withdraw(7)
134912 wd2(6)
135013 wd(8)
1351Edit code
1352 < Back Step 15 of 19 Forward >
1353Global frame
1354make_withdraw
1355
1356wd
1357
1358wd2
1359
1360f1: make_withdraw
1361balance 20
1362withdraw
1363
1364Return
1365value
1366
1367f2: make_withdraw
1368balance 1
1369withdraw
1370
1371Return
1372value
1373
1374withdraw [parent=f2]
1375amount 6
1376Return
1377value 1
1378func make_withdraw(balance)
1379func withdraw(amount) [parent=f1]
1380func withdraw(amount) [parent=f2]
1381In this way, each instance of withdraw maintains its own balance state, but that state is inaccessible to any other function in the program. Viewing this situation at a higher level, we have created an abstraction of a bank account that manages its own internals but behaves in a way that models accounts in the world: it changes over time based on its own history of withdrawal requests.
1382
13832.4.5 The Cost of Non-Local Assignment
1384
1385Our environment model of computation cleanly extends to explain the effects of non-local assignment. However, non-local assignment introduces some important nuances in the way we think about names and values.
1386
1387Previously, our values did not change; only our names and bindings changed. When two names a and b were both bound to the value 4, it did not matter whether they were bound to the same 4 or different 4's. As far as we could tell, there was only one 4 object that never changed.
1388
1389However, functions with state do not behave this way. When two names wd and wd2 are both bound to a withdraw function, it does matter whether they are bound to the same function or different instances of that function. Consider the following example, which contrasts the one we just analyzed. In this case, calling the function named by wd2 did change the value of the function named by wd, because both names refer to the same function.
1390
13911 def make_withdraw(balance):
13922 def withdraw(amount):
13933 nonlocal balance
13944 if amount > balance:
13955 return 'Insufficient funds'
13966 balance = balance - amount
13977 return balance
13988 return withdraw
13999
140010 wd = make_withdraw(12)
140111 wd2 = wd
140212 wd2(1)
140313 wd(1)
1404Edit code
1405 < Back Program terminated Forward >
1406Global frame
1407make_withdraw
1408
1409wd
1410
1411wd2
1412
1413f1: make_withdraw
1414balance 10
1415withdraw
1416
1417Return
1418value
1419
1420withdraw [parent=f1]
1421amount 1
1422Return
1423value 11
1424withdraw [parent=f1]
1425amount 1
1426Return
1427value 10
1428func make_withdraw(balance)
1429func withdraw(amount) [parent=f1]
1430It is not unusual for two names to co-refer to the same value in the world, and so it is in our programs. But, as values change over time, we must be very careful to understand the effect of a change on other names that might refer to those values.
1431
1432The key to correctly analyzing code with non-local assignment is to remember that only function calls can introduce new frames. Assignment statements always change bindings in existing frames. In this case, unless make_withdraw is called twice, there can be only one binding for balance.
1433
1434Sameness and change. These subtleties arise because, by introducing non-pure functions that change the non-local environment, we have changed the nature of expressions. An expression that contains only pure function calls is referentially transparent; its value does not change if we substitute one of its subexpression with the value of that subexpression.
1435
1436Re-binding operations violate the conditions of referential transparency because they do more than return a value; they change the environment. When we introduce arbitrary re-binding, we encounter a thorny epistemological issue: what it means for two values to be the same. In our environment model of computation, two separately defined functions are not the same, because changes to one may not be reflected in the other.
1437
1438In general, so long as we never modify data objects, we can regard a compound data object to be precisely the totality of its pieces. For example, a rational number is determined by giving its numerator and its denominator. But this view is no longer valid in the presence of change, where a compound data object has an "identity" that is something different from the pieces of which it is composed. A bank account is still "the same" bank account even if we change the balance by making a withdrawal; conversely, we could have two bank accounts that happen to have the same balance, but are different objects.
1439
1440Despite the complications it introduces, non-local assignment is a powerful tool for creating modular programs. Different parts of a program, which correspond to different environment frames, can evolve separately throughout program execution. Moreover, using functions with local state, we are able to implement mutable data types. In fact, we can implement abstract data types that are equivalent to the built-in list and dict types introduced above.
1441
14422.4.6 Implementing Lists and Dictionaries
1443
1444Lists are sequences, like tuples. The Python language does not give us access to the implementation of lists, only to the sequence abstraction and the mutation methods we have introduced in this section. To overcome this language-enforced abstraction barrier, we can develop a functional implementation of lists, again using a recursive representation. This section also has a second purpose: to further our understanding of dispatch functions.
1445
1446We will implement a list as a function that has a recursive list as its local state. Lists need to have an identity, like any mutable value. In particular, we cannot use None to represent an empty mutable list, because two empty lists are not identical values (e.g., appending to one does not append to the other), but None is None. On the other hand, two different functions that each have empty_rlist as their local state will suffice to distinguish two empty lists.
1447
1448Our mutable list is a dispatch function, just as our functional implementation of a pair was a dispatch function. It checks the input "message" against known messages and takes an appropriate action for each different input. Our mutable list responds to five different messages. The first two implement the behaviors of the sequence abstraction. The next two add or remove the first element of the list. The final message returns a string representation of the whole list contents.
1449
1450>>> def mutable_rlist():
1451 """Return a functional implementation of a mutable recursive list."""
1452 contents = empty_rlist
1453 def dispatch(message, value=None):
1454 nonlocal contents
1455 if message == 'len':
1456 return len_rlist(contents)
1457 elif message == 'getitem':
1458 return getitem_rlist(contents, value)
1459 elif message == 'push_first':
1460 contents = rlist(value, contents)
1461 elif message == 'pop_first':
1462 f = first(contents)
1463 contents = rest(contents)
1464 return f
1465 elif message == 'str':
1466 return str(contents)
1467 return dispatch
1468We can also add a convenience function to construct a functionally implemented recursive list from any built-in sequence, simply by adding each element in reverse order.
1469
1470>>> def to_mutable_rlist(source):
1471 """Return a functional list with the same contents as source."""
1472 s = mutable_rlist()
1473 for element in reversed(source):
1474 s('push_first', element)
1475 return s
1476In the definition above, the function reversed takes and returns an iterable value; it is another example of a function that uses the conventional interface of sequences.
1477
1478At this point, we can construct a functionally implemented lists. Note that the list itself is a function.
1479
1480>>> s = to_mutable_rlist(suits)
1481>>> type(s)
1482<class 'function'>
1483>>> s('str')
1484"('heart', ('diamond', ('spade', ('club', None))))"
1485In addition, we can pass messages to the list s that change its contents, for instance removing the first element.
1486
1487>>> s('pop_first')
1488'heart'
1489>>> s('str')
1490"('diamond', ('spade', ('club', None)))"
1491In principle, the operations push_first and pop_first suffice to make arbitrary changes to a list. We can always empty out the list entirely and then replace its old contents with the desired result.
1492
1493Message passing. Given some time, we could implement the many useful mutation operations of Python lists, such as extend and insert. We would have a choice: we could implement them all as functions, which use the existing messages pop_first and push_first to make all changes. Alternatively, we could add additional elif clauses to the body of dispatch, each checking for a message (e.g., 'extend') and applying the appropriate change to contents directly.
1494
1495This second approach, which encapsulates the logic for all operations on a data value within one function that responds to different messages, is called message passing. A program that uses message passing defines dispatch functions, each of which may have local state, and organizes computation by passing "messages" as the first argument to those functions. The messages are strings that correspond to particular behaviors.
1496
1497One could imagine that enumerating all of these messages by name in the body of dispatch would become tedious and prone to error. Python dictionaries, introduced in the next section, provide a data type that will help us manage the mapping between messages and operations.
1498
1499Implementing Dictionaries. We can implement an abstract data type that conforms to the dictionary abstraction as a list of records, each of which is a two-element list consisting of a key and the associated value.
1500
1501>>> def dictionary():
1502 """Return a functional implementation of a dictionary."""
1503 records = []
1504 def getitem(key):
1505 for k, v in records:
1506 if k == key:
1507 return v
1508 def setitem(key, value):
1509 for item in records:
1510 if item[0] == key:
1511 item[1] = value
1512 return
1513 records.append([key, value])
1514 def dispatch(message, key=None, value=None):
1515 if message == 'getitem':
1516 return getitem(key)
1517 elif message == 'setitem':
1518 setitem(key, value)
1519 elif message == 'keys':
1520 return tuple(k for k, _ in records)
1521 elif message == 'values':
1522 return tuple(v for _, v in records)
1523 return dispatch
1524Again, we use the message passing method to organize our implementation. We have supported four messages: getitem, setitem, keys, and values. To look up a value for a key, we iterate through the records to find a matching key. To insert a value for a key, we iterate through the records to see if there is already a record with that key. If not, we form a new record. If there already is a record with this key, we set the value of the record to the designated new value.
1525
1526We can now use our implementation to store and retrieve values.
1527
1528>>> d = dictionary()
1529>>> d('setitem', 3, 9)
1530>>> d('setitem', 4, 16)
1531>>> d('getitem', 3)
15329
1533>>> d('getitem', 4)
153416
1535>>> d('keys')
1536(3, 4)
1537>>> d('values')
1538(9, 16)
1539This implementation of a dictionary is not optimized for fast record lookup, because each response to the message 'getitem' must iterate through the entire list of records. The built-in dictionary type is considerably more efficient.
1540
15412.4.7 Dispatch Dictionaries
1542
1543The dispatch function is a general method for implementing a message passing interface for an abstract data type. To implement message dispatch, we have thus far used a large conditional statement to look up function values using message strings.
1544
1545The built-in dictionary data type provides a general method for looking up a value for a key. Instead of using dispatch functions to implement abstract data types, we can use dictionaries with string keys.
1546
1547The mutable account data type below is implemented as a dictionary. It has a constructor account and selector check_balance, as well as functions to deposit or withdraw funds. Moreover, the local state of the account is stored in the dictionary alongside the functions that implement its behavior.
1548
15491 def account(initial_balance):
15502 def deposit(amount):
15513 dispatch['balance'] += amount
15524 return dispatch['balance']
15535 def withdraw(amount):
15546 if amount > dispatch['balance']:
15557 return 'Insufficient funds'
15568 dispatch['balance'] -= amount
15579 return dispatch['balance']
155810 dispatch = {'deposit': deposit,
155911 'withdraw': withdraw,
156012 'balance': initial_balance}
156113 return dispatch
156214
156315 def withdraw(account, amount):
156416 return account['withdraw'](amount)
156517 def deposit(account, amount):
156618 return account['deposit'](amount)
156719 def check_balance(account):
156820 return account['balance']
156921
157022 a = account(20)
157123 deposit(a, 5)
157224 withdraw(a, 17)
157325 check_balance(a)
1574Edit code
1575 < Back Step 13 of 28 Forward >
1576Global frame
1577account
1578
1579withdraw
1580
1581deposit
1582
1583check_balance
1584
1585a
1586
1587f1: account
1588initial_balance 20
1589deposit
1590
1591withdraw
1592
1593dispatch
1594
1595Return
1596value
1597
1598func account(initial_balance)
1599func withdraw(account, amount)
1600func deposit(account, amount)
1601func check_balance(account)
1602func deposit(amount) [parent=f1]
1603func withdraw(amount) [parent=f1]
1604dict
1605"balance" 20
1606"deposit"
1607
1608"withdraw"
1609
1610The name dispatch within the body of the account constructor is bound to a dictionary that contains the messages accepted by an account as keys. The balance is a number, while the messages deposit and withdraw are bound to functions. These functions have access to the dispatch dictionary, and so they can read and change the balance. By storing the balance in the dispatch dictionary rather than in the account frame directly, we avoid the need for nonlocal statements in deposit and withdraw.
1611
1612The operators += and -= are shorthand in Python (and many other languages) for combined lookup and re-assignment. The last two lines below are equivalent.
1613
1614>>> a = 2
1615>>> a = a + 1
1616>>> a += 1
16172.4.8 Example: Propagating Constraints
1618
1619Mutable data allows us to simulate systems with change, but also allows us to build new kinds of abstractions. In this extended example, we combine nonlocal assignment, lists, and dictionaries to build a constraint-based system that supports computation in multiple directions. Expressing programs as constraints is a type of declarative programming, in which a programmer declares the structure of a problem to be solved, but abstracts away the details of exactly how the solution to the problem is computed.
1620
1621Computer programs are traditionally organized as one-directional computations, which perform operations on pre-specified arguments to produce desired outputs. On the other hand, we often want to model systems in terms of relations among quantities. For example, we previously considered the ideal gas law, which relates the pressure (p), volume (v), quantity (n), and temperature (t) of an ideal gas via Boltzmann's constant (k):
1622
1623p * v = n * k * t
1624Such an equation is not one-directional. Given any four of the quantities, we can use this equation to compute the fifth. Yet translating the equation into a traditional computer language would force us to choose one of the quantities to be computed in terms of the other four. Thus, a function for computing the pressure could not be used to compute the temperature, even though the computations of both quantities arise from the same equation.
1625
1626In this section, we sketch the design of a general model of linear relationships. We define primitive constraints that hold between quantities, such as an adder(a, b, c) constraint that enforces the mathematical relationship a + b = c.
1627
1628We also define a means of combination, so that primitive constraints can be combined to express more complex relations. In this way, our program resembles a programming language. We combine constraints by constructing a network in which constraints are joined by connectors. A connector is an object that "holds" a value and may participate in one or more constraints.
1629
1630For example, we know that the relationship between Fahrenheit and Celsius temperatures is:
1631
16329 * c = 5 * (f - 32)
1633This equation is a complex constraint between c and f. Such a constraint can be thought of as a network consisting of primitive adder, multiplier, and constant constraints.
1634
1635
1636In this figure, we see on the left a multiplier box with three terminals, labeled a, b, and c. These connect the multiplier to the rest of the network as follows: The a terminal is linked to a connector celsius, which will hold the Celsius temperature. The b terminal is linked to a connector w, which is also linked to a constant box that holds 9. The c terminal, which the multiplier box constrains to be the product of a and b, is linked to the c terminal of another multiplier box, whose b is connected to a constant 5 and whose a is connected to one of the terms in the sum constraint.
1637
1638Computation by such a network proceeds as follows: When a connector is given a value (by the user or by a constraint box to which it is linked), it awakens all of its associated constraints (except for the constraint that just awakened it) to inform them that it has a value. Each awakened constraint box then polls its connectors to see if there is enough information to determine a value for a connector. If so, the box sets that connector, which then awakens all of its associated constraints, and so on. For instance, in conversion between Celsius and Fahrenheit, w, x, and y are immediately set by the constant boxes to 9, 5, and 32, respectively. The connectors awaken the multipliers and the adder, which determine that there is not enough information to proceed. If the user (or some other part of the network) sets the celsius connector to a value (say 25), the leftmost multiplier will be awakened, and it will set u to 25 * 9 = 225. Then u awakens the second multiplier, which sets v to 45, and v awakens the adder, which sets the fahrenheit connector to 77.
1639
1640Using the Constraint System. To use the constraint system to carry out the temperature computation outlined above, we first create two named connectors, celsius and fahrenheit, by calling the connector constructor.
1641
1642>>> celsius = connector('Celsius')
1643>>> fahrenheit = connector('Fahrenheit')
1644Then, we link these connectors into a network that mirrors the figure above. The function converter assembles the various connectors and constraints in the network.
1645
1646>>> def converter(c, f):
1647 """Connect c to f with constraints to convert from Celsius to Fahrenheit."""
1648 u, v, w, x, y = [connector() for _ in range(5)]
1649 multiplier(c, w, u)
1650 multiplier(v, x, u)
1651 adder(v, y, f)
1652 constant(w, 9)
1653 constant(x, 5)
1654 constant(y, 32)
1655>>> converter(celsius, fahrenheit)
1656We will use a message passing system to coordinate constraints and connectors. Constraints are dictionaries that do not hold local states themselves. Their responses to messages are non-pure functions that change the connectors that they constrain.
1657
1658Connectors are dictionaries that hold a current value and respond to messages that manipulate that value. Constraints will not change the value of connectors directly, but instead will do so by sending messages, so that the connector can notify other constraints in response to the change. In this way, a connector represents a number, but also encapsulates connector behavior.
1659
1660One message we can send to a connector is to set its value. Here, we (the 'user') set the value of celsius to 25.
1661
1662>>> celsius['set_val']('user', 25)
1663Celsius = 25
1664Fahrenheit = 77.0
1665Not only does the value of celsius change to 25, but its value propagates through the network, and so the value of fahrenheit is changed as well. These changes are printed because we named these two connectors when we constructed them.
1666
1667Now we can try to set fahrenheit to a new value, say 212.
1668
1669>>> fahrenheit['set_val']('user', 212)
1670Contradiction detected: 77.0 vs 212
1671The connector complains that it has sensed a contradiction: Its value is 77.0, and someone is trying to set it to 212. If we really want to reuse the network with new values, we can tell celsius to forget its old value:
1672
1673>>> celsius['forget']('user')
1674Celsius is forgotten
1675Fahrenheit is forgotten
1676The connector celsius finds that the user, who set its value originally, is now retracting that value, so celsius agrees to lose its value, and it informs the rest of the network of this fact. This information eventually propagates to fahrenheit, which now finds that it has no reason for continuing to believe that its own value is 77. Thus, it also gives up its value.
1677
1678Now that fahrenheit has no value, we are free to set it to 212:
1679
1680>>> fahrenheit['set_val']('user', 212)
1681Fahrenheit = 212
1682Celsius = 100.0
1683This new value, when propagated through the network, forces celsius to have a value of 100. We have used the very same network to compute celsius given fahrenheit and to compute fahrenheit given celsius. This non-directionality of computation is the distinguishing feature of constraint-based systems.
1684
1685Implementing the Constraint System. As we have seen, connectors are dictionaries that map message names to function and data values. We will implement connectors that respond to the following messages:
1686
1687connector['set_val'](source, value) indicates that the source is requesting the connector to set its current value to value.
1688connector['has_val']() returns whether the connector already has a value.
1689connector['val'] is the current value of the connector.
1690connector['forget'](source) tells the connector that the source is requesting it to forget its value.
1691connector['connect'](source) tells the connector to participate in a new constraint, the source.
1692Constraints are also dictionaries, which receive information from connectors by means of two messages:
1693
1694constraint['new_val']() indicates that some connector that is connected to the constraint has a new value.
1695constraint['forget']() indicates that some connector that is connected to the constraint has forgotten its value.
1696When constraints receive these messages, they propagate them appropriately to other connectors.
1697
1698The adder function constructs an adder constraint over three connectors, where the first two must add to the third: a + b = c. To support multidirectional constraint propagation, the adder must also specify that it subtracts a from c to get b and likewise subtracts b from c to get a.
1699
1700>>> from operator import add, sub
1701>>> def adder(a, b, c):
1702 """The constraint that a + b = c."""
1703 return make_ternary_constraint(a, b, c, add, sub, sub)
1704We would like to implement a generic ternary (three-way) constraint, which uses the three connectors and three functions from adder to create a constraint that accepts new_val and forget messages. The response to messages are local functions, which are placed in a dictionary called constraint.
1705
1706>>> def make_ternary_constraint(a, b, c, ab, ca, cb):
1707 """The constraint that ab(a,b)=c and ca(c,a)=b and cb(c,b) = a."""
1708 def new_value():
1709 av, bv, cv = [connector['has_val']() for connector in (a, b, c)]
1710 if av and bv:
1711 c['set_val'](constraint, ab(a['val'], b['val']))
1712 elif av and cv:
1713 b['set_val'](constraint, ca(c['val'], a['val']))
1714 elif bv and cv:
1715 a['set_val'](constraint, cb(c['val'], b['val']))
1716 def forget_value():
1717 for connector in (a, b, c):
1718 connector['forget'](constraint)
1719 constraint = {'new_val': new_value, 'forget': forget_value}
1720 for connector in (a, b, c):
1721 connector['connect'](constraint)
1722 return constraint
1723The dictionary called constraint is a dispatch dictionary, but also the constraint object itself. It responds to the two messages that constraints receive, but is also passed as the source argument in calls to its connectors.
1724
1725The constraint's local function new_value is called whenever the constraint is informed that one of its connectors has a value. This function first checks to see if both a and b have values. If so, it tells c to set its value to the return value of function ab, which is add in the case of an adder. The constraint passes itself (constraint) as the source argument of the connector, which is the adder object. If a and b do not both have values, then the constraint checks a and c, and so on.
1726
1727If the constraint is informed that one of its connectors has forgotten its value, it requests that all of its connectors now forget their values. (Only those values that were set by this constraint are actually lost.)
1728
1729A multiplier is very similar to an adder.
1730
1731>>> from operator import mul, truediv
1732>>> def multiplier(a, b, c):
1733 """The constraint that a * b = c."""
1734 return make_ternary_constraint(a, b, c, mul, truediv, truediv)
1735A constant is a constraint as well, but one that is never sent any messages, because it involves only a single connector that it sets on construction.
1736
1737>>> def constant(connector, value):
1738 """The constraint that connector = value."""
1739 constraint = {}
1740 connector['set_val'](constraint, value)
1741 return constraint
1742These three constraints are sufficient to implement our temperature conversion network.
1743
1744Representing connectors. A connector is represented as a dictionary that contains a value, but also has response functions with local state. The connector must track the informant that gave it its current value, and a list of constraints in which it participates.
1745
1746The constructor connector has local functions for setting and forgetting values, which are the responses to messages from constraints.
1747
1748>>> def connector(name=None):
1749 """A connector between constraints."""
1750 informant = None
1751 constraints = []
1752 def set_value(source, value):
1753 nonlocal informant
1754 val = connector['val']
1755 if val is None:
1756 informant, connector['val'] = source, value
1757 if name is not None:
1758 print(name, '=', value)
1759 inform_all_except(source, 'new_val', constraints)
1760 else:
1761 if val != value:
1762 print('Contradiction detected:', val, 'vs', value)
1763 def forget_value(source):
1764 nonlocal informant
1765 if informant == source:
1766 informant, connector['val'] = None, None
1767 if name is not None:
1768 print(name, 'is forgotten')
1769 inform_all_except(source, 'forget', constraints)
1770 connector = {'val': None,
1771 'set_val': set_value,
1772 'forget': forget_value,
1773 'has_val': lambda: connector['val'] is not None,
1774 'connect': lambda source: constraints.append(source)}
1775 return connector
1776A connector is again a dispatch dictionary for the five messages used by constraints to communicate with connectors. Four responses are functions, and the final response is the value itself.
1777
1778The local function set_value is called when there is a request to set the connector's value. If the connector does not currently have a value, it will set its value and remember as informant the source constraint that requested the value to be set. Then the connector will notify all of its participating constraints except the constraint that requested the value to be set. This is accomplished using the following iterative function.
1779
1780>>> def inform_all_except(source, message, constraints):
1781 """Inform all constraints of the message, except source."""
1782 for c in constraints:
1783 if c != source:
1784 c[message]()
1785If a connector is asked to forget its value, it calls the local function forget-value, which first checks to make sure that the request is coming from the same constraint that set the value originally. If so, the connector informs its associated constraints about the loss of the value.
1786
1787The response to the message has_val indicates whether the connector has a value. The response to the message connect adds the source constraint to the list of constraints.
1788
1789The constraint program we have designed introduces many ideas that will appear again in object-oriented programming. Constraints and connectors are both abstractions that are manipulated through messages. When the value of a connector is changed, it is changed via a message that not only changes the value, but validates it (checking the source) and propagates its effects (informing other constraints). In fact, we will use a similar architecture of dictionaries with string-valued keys and functional values to implement an object-oriented system later in this chapter.
1790
17912.5 Object-Oriented Programming
1792
1793Object-oriented programming (OOP) is a method for organizing programs that brings together many of the ideas introduced in this chapter. Like abstract data types, objects create an abstraction barrier between the use and implementation of data. Like dispatch dictionaries in message passing, objects respond to behavioral requests. Like mutable data structures, objects have local state that is not directly accessible from the global environment. The Python object system provides new syntax to promote the use of these techniques for organizing programs.
1794
1795But the object system offers more than just convenience; it enables a new metaphor for designing programs in which several independent agents interact within the computer. Each object bundles together local state and behavior in a way that hides the complexity of both behind a data abstraction. Objects communicate with each other, and useful results are computed as a consequence of their interaction. Not only do objects pass messages, they also share behavior among other objects of the same type and inherit characteristics from related types.
1796
1797The paradigm of object-oriented programming has its own vocabulary that reinforces the object metaphor. We have seen that an object is a data value that has methods and attributes, accessible via dot notation. Every object also has a type, called its class. To implement new types of data, we create new classes.
1798
17992.5.1 Objects and Classes
1800
1801A class serves as a template for all objects whose type is that class. Every object is an instance of some particular class. The objects we have used so far all have built-in classes, but new classes can be defined similarly to how new functions can be defined. A class definition specifies the attributes and methods shared among objects of that class. We will introduce the class statement by revisiting the example of a bank account.
1802
1803When introducing local state, we saw that bank accounts are naturally modeled as mutable values that have a balance. A bank account object should have a withdraw method that updates the account balance and returns the requested amount, if it is available. We would like additional behavior to complete the account abstraction: a bank account should be able to return its current balance, return the name of the account holder, and accept deposits.
1804
1805An Account class allows us to create multiple instances of bank accounts. The act of creating a new object instance is known as instantiating the class. The syntax in Python for instantiating a class is identical to the syntax of calling a function. In this case, we call Account with the argument 'Jim', the account holder's name.
1806
1807>>> a = Account('Jim')
1808An attribute of an object is a name-value pair associated with the object, which is accessible via dot notation. The attributes specific to a particular object, as opposed to all objects of a class, are called instance attributes. Each Account has its own balance and account holder name, which are examples of instance attributes. In the broader programming community, instance attributes may also be called fields, properties, or instance variables.
1809
1810>>> a.holder
1811'Jim'
1812>>> a.balance
18130
1814Functions that operate on the object or perform object-specific computations are called methods. The side effects and return value of a method can depend upon, and change, other attributes of the object. For example, deposit is a method of our Account object a. It takes one argument, the amount to deposit, changes the balance attribute of the object, and returns the resulting balance.
1815
1816>>> a.deposit(15)
181715
1818In OOP, we say that methods are invoked on a particular object. As a result of invoking the withdraw method, either the withdrawal is approved and the amount is deducted and returned, or the request is declined and the account prints an error message.
1819
1820>>> a.withdraw(10) # The withdraw method returns the balance after withdrawal
18215
1822>>> a.balance # The balance attribute has changed
18235
1824>>> a.withdraw(10)
1825'Insufficient funds'
1826As illustrated above, the behavior of a method can depend upon the changing attributes of the object. Two calls to withdraw with the same argument return different results.
1827
18282.5.2 Defining Classes
1829
1830User-defined classes are created by class statements, which consist of a single clause. A class statement defines the class name and a base class (discussed in the section on Inheritance), then includes a suite of statements to define the attributes of the class:
1831
1832class <name>(<base class>):
1833 <suite>
1834When a class statement is executed, a new class is created and bound to <name> in the first frame of the current environment. The suite is then executed. Any names bound within the <suite> of a class statement, through def or assignment statements, create or modify attributes of the class.
1835
1836Classes are typically organized around manipulating instance attributes, which are the name-value pairs associated not with the class itself, but with each object of that class. The class specifies the instance attributes of its objects by defining a method for initializing new objects. For instance, part of initializing an object of the Account class is to assign it a starting balance of 0.
1837
1838The <suite> of a class statement contains def statements that define new methods for objects of that class. The method that initializes objects has a special name in Python, __init__ (two underscores on each side of "init"), and is called the constructor for the class.
1839
1840>>> class Account(object):
1841 def __init__(self, account_holder):
1842 self.balance = 0
1843 self.holder = account_holder
1844The __init__ method for Account has two formal parameters. The first one, self, is bound to the newly created Account object. The second parameter, account_holder, is bound to the argument passed to the class when it is called to be instantiated.
1845
1846The constructor binds the instance attribute name balance to 0. It also binds the attribute name holder to the value of the name account_holder. The formal parameter account_holder is a local name to the __init__ method. On the other hand, the name holder that is bound via the final assignment statement persists, because it is stored as an attribute of self using dot notation.
1847
1848Having defined the Account class, we can instantiate it.
1849
1850>>> a = Account('Jim')
1851This "call" to the Account class creates a new object that is an instance of Account, then calls the constructor function __init__ with two arguments: the newly created object and the string 'Jim'. By convention, we use the parameter name self for the first argument of a constructor, because it is bound to the object being instantiated. This convention is adopted in virtually all Python code.
1852
1853Now, we can access the object's balance and holder using dot notation.
1854
1855>>> a.balance
18560
1857>>> a.holder
1858'Jim'
1859Identity. Each new account instance has its own balance attribute, the value of which is independent of other objects of the same class.
1860
1861>>> b = Account('Jack')
1862>>> b.balance = 200
1863>>> [acc.balance for acc in (a, b)]
1864[0, 200]
1865To enforce this separation, every object that is an instance of a user-defined class has a unique identity. Object identity is compared using the is and is not operators.
1866
1867>>> a is a
1868True
1869>>> a is not b
1870True
1871Despite being constructed from identical calls, the objects bound to a and b are not the same. As usual, binding an object to a new name using assignment does not create a new object.
1872
1873>>> c = a
1874>>> c is a
1875True
1876New objects that have user-defined classes are only created when a class (such as Account) is instantiated with call expression syntax.
1877
1878Methods. Object methods are also defined by a def statement in the suite of a class statement. Below, deposit and withdraw are both defined as methods on objects of the Account class.
1879
1880>>> class Account(object):
1881 def __init__(self, account_holder):
1882 self.balance = 0
1883 self.holder = account_holder
1884 def deposit(self, amount):
1885 self.balance = self.balance + amount
1886 return self.balance
1887 def withdraw(self, amount):
1888 if amount > self.balance:
1889 return 'Insufficient funds'
1890 self.balance = self.balance - amount
1891 return self.balance
1892While method definitions do not differ from function definitions in how they are declared, method definitions do have a different effect. The function value that is created by a def statement within a class statement is bound to the declared name, but bound locally within the class as an attribute. That value is invoked as a method using dot notation from an instance of the class.
1893
1894Each method definition again includes a special first parameter self, which is bound to the object on which the method is invoked. For example, let us say that deposit is invoked on a particular Account object and passed a single argument value: the amount deposited. The object itself is bound to self, while the argument is bound to amount. All invoked methods have access to the object via the self parameter, and so they can all access and manipulate the object's state.
1895
1896To invoke these methods, we again use dot notation, as illustrated below.
1897
1898>>> tom_account = Account('Tom')
1899>>> tom_account.deposit(100)
1900100
1901>>> tom_account.withdraw(90)
190210
1903>>> tom_account.withdraw(90)
1904'Insufficient funds'
1905>>> tom_account.holder
1906'Tom'
1907When a method is invoked via dot notation, the object itself (bound to tom_account, in this case) plays a dual role. First, it determines what the name withdraw means; withdraw is not a name in the environment, but instead a name that is local to the Account class. Second, it is bound to the first parameter self when the withdraw method is invoked. The details of the procedure for evaluating dot notation follow in the next section.
1908
19092.5.3 Message Passing and Dot Expressions
1910
1911Methods, which are defined in classes, and instance attributes, which are typically assigned in constructors, are the fundamental elements of object-oriented programming. These two concepts replicate much of the behavior of a dispatch dictionary in a message passing implementation of a data value. Objects take messages using dot notation, but instead of those messages being arbitrary string-valued keys, they are names local to a class. Objects also have named local state values (the instance attributes), but that state can be accessed and manipulated using dot notation, without having to employ nonlocal statements in the implementation.
1912
1913The central idea in message passing was that data values should have behavior by responding to messages that are relevant to the abstract type they represent. Dot notation is a syntactic feature of Python that formalizes the message passing metaphor. The advantage of using a language with a built-in object system is that message passing can interact seamlessly with other language features, such as assignment statements. We do not require different messages to "get" or "set" the value associated with a local attribute name; the language syntax allows us to use the message name directly.
1914
1915Dot expressions. The code fragment tom_account.deposit is called a dot expression. A dot expression consists of an expression, a dot, and a name:
1916
1917<expression> . <name>
1918The <expression> can be any valid Python expression, but the <name> must be a simple name (not an expression that evaluates to a name). A dot expression evaluates to the value of the attribute with the given <name>, for the object that is the value of the <expression>.
1919
1920The built-in function getattr also returns an attribute for an object by name. It is the function equivalent of dot notation. Using getattr, we can look up an attribute using a string, just as we did with a dispatch dictionary.
1921
1922>>> getattr(tom_account, 'balance')
192310
1924We can also test whether an object has a named attribute with hasattr.
1925
1926>>> hasattr(tom_account, 'deposit')
1927True
1928The attributes of an object include all of its instance attributes, along with all of the attributes (including methods) defined in its class. Methods are attributes of the class that require special handling.
1929
1930Method and functions. When a method is invoked on an object, that object is implicitly passed as the first argument to the method. That is, the object that is the value of the <expression> to the left of the dot is passed automatically as the first argument to the method named on the right side of the dot expression. As a result, the object is bound to the parameter self.
1931
1932To achieve automatic self binding, Python distinguishes between functions, which we have been creating since the beginning of the text, and bound methods, which couple together a function and the object on which that method will be invoked. A bound method value is already associated with its first argument, the instance on which it was invoked, which will be named self when the method is called.
1933
1934We can see the difference in the interactive interpreter by calling type on the returned values of dot expressions. As an attribute of a class, a method is just a function, but as an attribute of an instance, it is a bound method:
1935
1936>>> type(Account.deposit)
1937<class 'function'>
1938>>> type(tom_account.deposit)
1939<class 'method'>
1940These two results differ only in the fact that the first is a standard two-argument function with parameters self and amount. The second is a one-argument method, where the name self will be bound to the object named tom_account automatically when the method is called, while the parameter amount will be bound to the argument passed to the method. Both of these values, whether function values or bound method values, are associated with the same deposit function body.
1941
1942We can call deposit in two ways: as a function and as a bound method. In the former case, we must supply an argument for the self parameter explicitly. In the latter case, the self parameter is bound automatically.
1943
1944>>> Account.deposit(tom_account, 1001) # The deposit function requires 2 arguments
19451011
1946>>> tom_account.deposit(1000) # The deposit method takes 1 argument
19472011
1948The function getattr behaves exactly like dot notation: if its first argument is an object but the name is a method defined in the class, then getattr returns a bound method value. On the other hand, if the first argument is a class, then getattr returns the attribute value directly, which is a plain function.
1949
1950Practical guidance: naming conventions. Class names are conventionally written using the CapWords convention (also called CamelCase because the capital letters in the middle of a name are like humps). Method names follow the standard convention of naming functions using lowercased words separated by underscores.
1951
1952In some cases, there are instance variables and methods that are related to the maintenance and consistency of an object that we don't want users of the object to see or use. They are not part of the abstraction defined by a class, but instead part of the implementation. Python's convention dictates that if an attribute name starts with an underscore, it should only be accessed within methods of the class itself, rather than by users of the class.
1953
19542.5.4 Class Attributes
1955
1956Some attribute values are shared across all objects of a given class. Such attributes are associated with the class itself, rather than any individual instance of the class. For instance, let us say that a bank pays interest on the balance of accounts at a fixed interest rate. That interest rate may change, but it is a single value shared across all accounts.
1957
1958Class attributes are created by assignment statements in the suite of a class statement, outside of any method definition. In the broader developer community, class attributes may also be called class variables or static variables. The following class statement creates a class attribute for Account with the name interest.
1959
1960>>> class Account(object):
1961 interest = 0.02 # A class attribute
1962 def __init__(self, account_holder):
1963 self.balance = 0
1964 self.holder = account_holder
1965 # Additional methods would be defined here
1966This attribute can still be accessed from any instance of the class.
1967
1968>>> tom_account = Account('Tom')
1969>>> jim_account = Account('Jim')
1970>>> tom_account.interest
19710.02
1972>>> jim_account.interest
19730.02
1974However, a single assignment statement to a class attribute changes the value of the attribute for all instances of the class.
1975
1976>>> Account.interest = 0.04
1977>>> tom_account.interest
19780.04
1979>>> jim_account.interest
19800.04
1981Attribute names. We have introduced enough complexity into our object system that we have to specify how names are resolved to particular attributes. After all, we could easily have a class attribute and an instance attribute with the same name.
1982
1983As we have seen, a dot expressions consist of an expression, a dot, and a name:
1984
1985<expression> . <name>
1986To evaluate a dot expression:
1987
1988Evaluate the <expression> to the left of the dot, which yields the object of the dot expression.
1989<name> is matched against the instance attributes of that object; if an attribute with that name exists, its value is returned.
1990If <name> does not appear among instance attributes, then <name> is looked up in the class, which yields a class attribute value.
1991That value is returned unless it is a function, in which case a bound method is returned instead.
1992In this evaluation procedure, instance attributes are found before class attributes, just as local names have priority over global in an environment. Methods defined within the class are bound to the object of the dot expression during the third step of this evaluation procedure. The procedure for looking up a name in a class has additional nuances that will arise shortly, once we introduce class inheritance.
1993
1994Assignment. All assignment statements that contain a dot expression on their left-hand side affect attributes for the object of that dot expression. If the object is an instance, then assignment sets an instance attribute. If the object is a class, then assignment sets a class attribute. As a consequence of this rule, assignment to an attribute of an object cannot affect the attributes of its class. The examples below illustrate this distinction.
1995
1996If we assign to the named attribute interest of an account instance, we create a new instance attribute that has the same name as the existing class attribute.
1997
1998>>> jim_account.interest = 0.08
1999and that attribute value will be returned from a dot expression.
2000
2001>>> jim_account.interest
20020.08
2003However, the class attribute interest still retains its original value, which is returned for all other accounts.
2004
2005>>> tom_account.interest
20060.04
2007Changes to the class attribute interest will affect tom_account, but the instance attribute for jim_account will be unaffected.
2008
2009>>> Account.interest = 0.05 # changing the class attribute
2010>>> tom_account.interest # changes instances without like-named instance attributes
20110.05
2012>>> jim_account.interest # but the existing instance attribute is unaffected
20130.08
20142.5.5 Inheritance
2015
2016When working in the OOP paradigm, we often find that different abstract data types are related. In particular, we find that similar classes differ in their amount of specialization. Two classes may have similar attributes, but one represents a special case of the other.
2017
2018For example, we may want to implement a checking account, which is different from a standard account. A checking account charges an extra $1 for each withdrawal and has a lower interest rate. Here, we demonstrate the desired behavior.
2019
2020>>> ch = CheckingAccount('Tom')
2021>>> ch.interest # Lower interest rate for checking accounts
20220.01
2023>>> ch.deposit(20) # Deposits are the same
202420
2025>>> ch.withdraw(5) # withdrawals decrease balance by an extra charge
202614
2027A CheckingAccount is a specialization of an Account. In OOP terminology, the generic account will serve as the base class of CheckingAccount, while CheckingAccount will be a subclass of Account. (The terms parent class and superclass are also used for the base class, while child class is also used for the subclass.)
2028
2029A subclass inherits the attributes of its base class, but may override certain attributes, including certain methods. With inheritance, we only specify what is different between the subclass and the base class. Anything that we leave unspecified in the subclass is automatically assumed to behave just as it would for the base class.
2030
2031Inheritance also has a role in our object metaphor, in addition to being a useful organizational feature. Inheritance is meant to represent is-a relationships between classes, which contrast with has-a relationships. A checking account is-a specific type of account, so having a CheckingAccount inherit from Account is an appropriate use of inheritance. On the other hand, a bank has-a list of bank accounts that it manages, so neither should inherit from the other. Instead, a list of account objects would be naturally expressed as an instance attribute of a bank object.
2032
20332.5.6 Using Inheritance
2034
2035We specify inheritance by putting the base class in parentheses after the class name. First, we give a full implementation of the Account class, which includes docstrings for the class and its methods.
2036
2037>>> class Account(object):
2038 """A bank account that has a non-negative balance."""
2039 interest = 0.02
2040 def __init__(self, account_holder):
2041 self.balance = 0
2042 self.holder = account_holder
2043 def deposit(self, amount):
2044 """Increase the account balance by amount and return the new balance."""
2045 self.balance = self.balance + amount
2046 return self.balance
2047 def withdraw(self, amount):
2048 """Decrease the account balance by amount and return the new balance."""
2049 if amount > self.balance:
2050 return 'Insufficient funds'
2051 self.balance = self.balance - amount
2052 return self.balance
2053A full implementation of CheckingAccount appears below.
2054
2055>>> class CheckingAccount(Account):
2056 """A bank account that charges for withdrawals."""
2057 withdraw_charge = 1
2058 interest = 0.01
2059 def withdraw(self, amount):
2060 return Account.withdraw(self, amount + self.withdraw_charge)
2061Here, we introduce a class attribute withdraw_charge that is specific to the CheckingAccount class. We assign a lower value to the interest attribute. We also define a new withdraw method to override the behavior defined in the Account class. With no further statements in the class suite, all other behavior is inherited from the base class Account.
2062
2063>>> checking = CheckingAccount('Sam')
2064>>> checking.deposit(10)
206510
2066>>> checking.withdraw(5)
20674
2068>>> checking.interest
20690.01
2070The expression checking.deposit evaluates to a bound method for making deposits, which was defined in the Account class. When Python resolves a name in a dot expression that is not an attribute of the instance, it looks up the name in the class. In fact, the act of "looking up" a name in a class tries to find that name in every base class in the inheritance chain for the original object's class. We can define this procedure recursively. To look up a name in a class.
2071
2072If it names an attribute in the class, return the attribute value.
2073Otherwise, look up the name in the base class, if there is one.
2074In the case of deposit, Python would have looked for the name first on the instance, and then in the CheckingAccount class. Finally, it would look in the Account class, where deposit is defined. According to our evaluation rule for dot expressions, since deposit is a function looked up in the class for the checking instance, the dot expression evaluates to a bound method value. That method is invoked with the argument 10, which calls the deposit method with self bound to the checking object and amount bound to 10.
2075
2076The class of an object stays constant throughout. Even though the deposit method was found in the Account class, deposit is called with self bound to an instance of CheckingAccount, not of Account.
2077
2078Calling ancestors. Attributes that have been overridden are still accessible via class objects. For instance, we implemented the withdraw method of CheckingAccount by calling the withdraw method of Account with an argument that included the withdraw_charge.
2079
2080Notice that we called self.withdraw_charge rather than the equivalent CheckingAccount.withdraw_charge. The benefit of the former over the latter is that a class that inherits from CheckingAccount might override the withdrawal charge. If that is the case, we would like our implementation of withdraw to find that new value instead of the old one.
2081
2082Object Abstractions. It is extremely common in object-oriented programs that different types of objects will share the same attribute names. An object abstraction is a collection of attributes and conditions on those attributes. For example, all accounts must have deposit and withdraw methods that take numerical arguments, as well as a balance attribute. The classes Account and CheckingAccount both implement this abstraction. Inheritance specifically promotes name sharing in this way.
2083
2084The parts of your program that use objects (rather than implementing them) are most robust to future changes if they do not make assumptions about object types, but instead only about their attribute names. That is, they use the object abstraction, rather than assuming anything about its implementation. For example, let us say that we run a lottery, and we wish to deposit $5 into each of a list of accounts. The following implementation does not assume anything about the types of those accounts, and therefore works equally well with any type of object that has a deposit method:
2085
2086>>> def deposit_all(winners, amount=5):
2087 for account in winners:
2088 account.deposit(amount)
2089The function deposit_all above assumes only that each account satisfies the account object abstraction, and so it will work with any other account classes that also implement this abstraction. Assuming a particular class of account would violate the abstraction barrier of the account object abstraction. For example, the following implementation will not necessarily work with new kinds of accounts:
2090
2091>>> def deposit_all(winners, amount=5):
2092 for account in winners:
2093 Account.deposit(account, amount)
20942.5.7 Multiple Inheritance
2095
2096Python supports the concept of a subclass inheriting attributes from multiple base classes, a language feature called multiple inheritance.
2097
2098Suppose that we have a SavingsAccount that inherits from Account, but charges customers a small fee every time they make a deposit.
2099
2100>>> class SavingsAccount(Account):
2101 deposit_charge = 2
2102 def deposit(self, amount):
2103 return Account.deposit(self, amount - self.deposit_charge)
2104Then, a clever executive conceives of an AsSeenOnTVAccount account with the best features of both CheckingAccount and SavingsAccount: withdrawal fees, deposit fees, and a low interest rate. It's both a checking and a savings account in one! "If we build it," the executive reasons, "someone will sign up and pay all those fees. We'll even give them a dollar."
2105
2106>>> class AsSeenOnTVAccount(CheckingAccount, SavingsAccount):
2107 def __init__(self, account_holder):
2108 self.holder = account_holder
2109 self.balance = 1 # A free dollar!
2110In fact, this implementation is complete. Both withdrawal and deposits will generate fees, using the function definitions in CheckingAccount and SavingsAccount respectively.
2111
2112>>> such_a_deal = AsSeenOnTVAccount("John")
2113>>> such_a_deal.balance
21141
2115>>> such_a_deal.deposit(20) # $2 fee from SavingsAccount.deposit
211619
2117>>> such_a_deal.withdraw(5) # $1 fee from CheckingAccount.withdraw
211813
2119Non-ambiguous references are resolved correctly as expected:
2120
2121>>> such_a_deal.deposit_charge
21222
2123>>> such_a_deal.withdraw_charge
21241
2125But what about when the reference is ambiguous, such as the reference to the withdraw method that is defined in both Account and CheckingAccount? The figure below depicts an inheritance graph for the AsSeenOnTVAccount class. Each arrow points from a subclass to a base class.
2126
2127
2128For a simple "diamond" shape like this, Python resolves names from left to right, then upwards. In this example, Python checks for an attribute name in the following classes, in order, until an attribute with that name is found:
2129
2130AsSeenOnTVAccount, CheckingAccount, SavingsAccount, Account, object
2131There is no correct solution to the inheritance ordering problem, as there are cases in which we might prefer to give precedence to certain inherited classes over others. However, any programming language that supports multiple inheritance must select some ordering in a consistent way, so that users of the language can predict the behavior of their programs.
2132
2133Further reading. Python resolves this name using a recursive algorithm called the C3 Method Resolution Ordering. The method resolution order of any class can be queried using the mro method on all classes.
2134
2135>>> [c.__name__ for c in AsSeenOnTVAccount.mro()]
2136['AsSeenOnTVAccount', 'CheckingAccount', 'SavingsAccount', 'Account', 'object']
2137The precise algorithm for finding method resolution orderings is not a topic for this text, but is described by Python's primary author with a reference to the original paper.
2138
21392.5.8 The Role of Objects
2140
2141The Python object system is designed to make data abstraction and message passing both convenient and flexible. The specialized syntax of classes, methods, inheritance, and dot expressions all enable us to formalize the object metaphor in our programs, which improves our ability to organize large programs.
2142
2143In particular, we would like our object system to promote a separation of concerns among the different aspects of the program. Each object in a program encapsulates and manages some part of the program's state, and each class statement defines the functions that implement some part of the program's overall logic. Abstraction barriers enforce the boundaries between different aspects of a large program.
2144
2145Object-oriented programming is particularly well-suited to programs that model systems that have separate but interacting parts. For instance, different users interact in a social network, different characters interact in a game, and different shapes interact in a physical simulation. When representing such systems, the objects in a program often map naturally onto objects in the system being modeled, and classes represent their types and relationships.
2146
2147On the other hand, classes may not provide the best mechanism for implementing certain abstractions. Functional abstractions provide a more natural metaphor for representing relationships between inputs and outputs. One should not feel compelled to fit every bit of logic in a program within a class, especially when defining independent functions for manipulating data is more natural. Functions can also enforce a separation of concerns.
2148
2149Multi-paradigm languages like Python allow programmers to match organizational paradigms to appropriate problems. Learning to identify when to introduce a new class, as opposed to a new function, in order to simplify or modularize a program, is an important design skill in software engineering that deserves careful attention.
2150
21512.6 Implementing Classes and Objects
2152
2153When working in the object-oriented programming paradigm, we use the object metaphor to guide the organization of our programs. Most logic about how to represent and manipulate data is expressed within class declarations. In this section, we see that classes and objects can themselves be represented using just functions and dictionaries. The purpose of implementing an object system in this way is to illustrate that using the object metaphor does not require a special programming language. Programs can be object-oriented, even in programming languages that do not have a built-in object system.
2154
2155In order to implement objects, we will abandon dot notation (which does require built-in language support), but create dispatch dictionaries that behave in much the same way as the elements of the built-in object system. We have already seen how to implement message-passing behavior through dispatch dictionaries. To implement an object system in full, we send messages between instances, classes, and base classes, all of which are dictionaries that contain attributes.
2156
2157We will not implement the entire Python object system, which includes features that we have not covered in this text (e.g., meta-classes and static methods). We will focus instead on user-defined classes without multiple inheritance and without introspective behavior (such as returning the class of an instance). Our implementation is not meant to follow the precise specification of the Python type system. Instead, it is designed to implement the core functionality that enables the object metaphor.
2158
21592.6.1 Instances
2160
2161We begin with instances. An instance has named attributes, such as the balance of an account, which can be set and retrieved. We implement an instance using a dispatch dictionary that responds to messages that "get" and "set" attribute values. Attributes themselves are stored in a local dictionary called attributes.
2162
2163As we have seen previously in this chapter, dictionaries themselves are abstract data types. We implemented dictionaries with lists, we implemented lists with pairs, and we implemented pairs with functions. As we implement an object system in terms of dictionaries, keep in mind that we could just as well be implementing objects using functions alone.
2164
2165To begin our implementation, we assume that we have a class implementation that can look up any names that are not part of the instance. We pass in a class to make_instance as the parameter cls.
2166
2167>>> def make_instance(cls):
2168 """Return a new object instance, which is a dispatch dictionary."""
2169 def get_value(name):
2170 if name in attributes:
2171 return attributes[name]
2172 else:
2173 value = cls['get'](name)
2174 return bind_method(value, instance)
2175 def set_value(name, value):
2176 attributes[name] = value
2177 attributes = {}
2178 instance = {'get': get_value, 'set': set_value}
2179 return instance
2180The instance is a dispatch dictionary that responds to the messages get and set. The set message corresponds to attribute assignment in Python's object system: all assigned attributes are stored directly within the object's local attribute dictionary. In get, if name does not appear in the local attributes dictionary, then it is looked up in the class. If the value returned by cls is a function, it must be bound to the instance.
2181
2182Bound method values. The get_value function in make_instance finds a named attribute in its class with get, then calls bind_method. Binding a method only applies to function values, and it creates a bound method value from a function value by inserting the instance as the first argument:
2183
2184>>> def bind_method(value, instance):
2185 """Return a bound method if value is callable, or value otherwise."""
2186 if callable(value):
2187 def method(*args):
2188 return value(instance, *args)
2189 return method
2190 else:
2191 return value
2192When a method is called, the first parameter self will be bound to the value of instance by this definition.
2193
21942.6.2 Classes
2195
2196A class is also an object, both in Python's object system and the system we are implementing here. For simplicity, we say that classes do not themselves have a class. (In Python, classes do have classes; almost all classes share the same class, called type.) A class can respond to get and set messages, as well as the new message:
2197
2198>>> def make_class(attributes, base_class=None):
2199 """Return a new class, which is a dispatch dictionary."""
2200 def get_value(name):
2201 if name in attributes:
2202 return attributes[name]
2203 elif base_class is not None:
2204 return base_class['get'](name)
2205 def set_value(name, value):
2206 attributes[name] = value
2207 def new(*args):
2208 return init_instance(cls, *args)
2209 cls = {'get': get_value, 'set': set_value, 'new': new}
2210 return cls
2211Unlike an instance, the get function for classes does not query its class when an attribute is not found, but instead queries its base_class. No method binding is required for classes.
2212
2213Initialization. The new function in make_class calls init_instance, which first makes a new instance, then invokes a method called __init__.
2214
2215>>> def init_instance(cls, *args):
2216 """Return a new object with type cls, initialized with args."""
2217 instance = make_instance(cls)
2218 init = cls['get']('__init__')
2219 if init:
2220 init(instance, *args)
2221 return instance
2222This final function completes our object system. We now have instances, which set locally but fall back to their classes on get. After an instance looks up a name in its class, it binds itself to function values to create methods. Finally, classes can create new instances, and they apply their __init__ constructor function immediately after instance creation.
2223
2224In this object system, the only function that should be called by the user is create_class. All other functionality is enabled through message passing. Similarly, Python's object system is invoked via the class statement, and all of its other functionality is enabled through dot expressions and calls to classes.
2225
22262.6.3 Using Implemented Objects
2227
2228We now return to use the bank account example from the previous section. Using our implemented object system, we will create an Account class, a CheckingAccount subclass, and an instance of each.
2229
2230The Account class is created through a create_account_class function, which has structure similar to a class statement in Python, but concludes with a call to make_class.
2231
2232>>> def make_account_class():
2233 """Return the Account class, which has deposit and withdraw methods."""
2234 def __init__(self, account_holder):
2235 self['set']('holder', account_holder)
2236 self['set']('balance', 0)
2237 def deposit(self, amount):
2238 """Increase the account balance by amount and return the new balance."""
2239 new_balance = self['get']('balance') + amount
2240 self['set']('balance', new_balance)
2241 return self['get']('balance')
2242 def withdraw(self, amount):
2243 """Decrease the account balance by amount and return the new balance."""
2244 balance = self['get']('balance')
2245 if amount > balance:
2246 return 'Insufficient funds'
2247 self['set']('balance', balance - amount)
2248 return self['get']('balance')
2249 return make_class(locals())
2250The final call to locals returns a dictionary with string keys that contains the name-value bindings in the current local frame.
2251
2252The Account class is finally instantiated via assignment.
2253
2254>>> Account = make_account_class()
2255Then, an account instance is created via the new message, which requires a name to go with the newly created account.
2256
2257>>> jim_acct = Account['new']('Jim')
2258Then, get messages passed to jim_acct retrieve properties and methods. Methods can be called to update the balance of the account.
2259
2260>>> jim_acct['get']('holder')
2261'Jim'
2262>>> jim_acct['get']('interest')
22630.02
2264>>> jim_acct['get']('deposit')(20)
226520
2266>>> jim_acct['get']('withdraw')(5)
226715
2268As with the Python object system, setting an attribute of an instance does not change the corresponding attribute of its class.
2269
2270>>> jim_acct['set']('interest', 0.04)
2271>>> Account['get']('interest')
22720.02
2273Inheritance. We can create a subclass CheckingAccount by overloading a subset of the class attributes. In this case, we change the withdraw method to impose a fee, and we reduce the interest rate.
2274
2275>>> def make_checking_account_class():
2276 """Return the CheckingAccount class, which imposes a $1 withdrawal fee."""
2277 interest = 0.01
2278 withdraw_fee = 1
2279 def withdraw(self, amount):
2280 fee = self['get']('withdraw_fee')
2281 return Account['get']('withdraw')(self, amount + fee)
2282 return make_class(locals(), Account)
2283In this implementation, we call the withdraw function of the base class Account from the withdraw function of the subclass, as we would in Python's built-in object system. We can create the subclass itself and an instance, as before.
2284
2285>>> CheckingAccount = make_checking_account_class()
2286>>> jack_acct = CheckingAccount['new']('Jack')
2287Deposits behave identically, as does the constructor function. withdrawals impose the $1 fee from the specialized withdraw method, and interest has the new lower value from CheckingAccount.
2288
2289>>> jack_acct['get']('interest')
22900.01
2291>>> jack_acct['get']('deposit')(20)
229220
2293>>> jack_acct['get']('withdraw')(5)
229414
2295Our object system built upon dictionaries is quite similar in implementation to the built-in object system in Python. In Python, an instance of any user-defined class has a special attribute __dict__ that stores the local instance attributes for that object in a dictionary, much like our attributes dictionary. Python differs because it distinguishes certain special methods that interact with built-in functions to ensure that those functions behave correctly for arguments of many different types. Functions that operate on different types are the subject of the next section.
2296
22972.7 Generic Operations
2298
2299In this chapter, we introduced compound data values, along with the technique of data abstraction using constructors and selectors. Using message passing, we endowed our abstract data types with behavior directly. Using the object metaphor, we bundled together the representation of data and the methods used to manipulate that data to modularize data-driven programs with local state.
2300
2301However, we have yet to show that our object system allows us to combine together different types of objects flexibly in a large program. Message passing via dot expressions is only one way of building combined expressions with multiple objects. In this section, we explore alternate methods for combining and manipulating objects of different types.
2302
23032.7.1 String Conversion
2304
2305We stated in the beginning of this chapter that an object value should behave like the kind of data it is meant to represent, including producing a string representation of itself. String representations of data values are especially important in an interactive language like Python, where the read-eval-print loop requires every value to have some sort of string representation.
2306
2307String values provide a fundamental medium for communicating information among humans. Sequences of characters can be rendered on a screen, printed to paper, read aloud, converted to braille, or broadcast as Morse code. Strings are also fundamental to programming because they can represent Python expressions. For an object, we may want to generate a string that, when interpreted as a Python expression, evaluates to an equivalent object.
2308
2309Python stipulates that all objects should produce two different string representations: one that is human-interpretable text and one that is a Python-interpretable expression. The constructor function for strings, str, returns a human-readable string. Where possible, the repr function returns a Python expression that evaluates to an equal object. The docstring for repr explains this property:
2310
2311repr(object) -> string
2312
2313Return the canonical string representation of the object.
2314For most object types, eval(repr(object)) == object.
2315The result of calling repr on the value of an expression is what Python prints in an interactive session.
2316
2317>>> 12e12
231812000000000000.0
2319>>> print(repr(12e12))
232012000000000000.0
2321In cases where no representation exists that evaluates to the original value, Python produces a proxy.
2322
2323>>> repr(min)
2324'<built-in function min>'
2325The str constructor often coincides with repr, but provides a more interpretable text representation in some cases. For instance, we see a difference between str and repr with dates.
2326
2327>>> from datetime import date
2328>>> today = date(2011, 9, 12)
2329>>> repr(today)
2330'datetime.date(2011, 9, 12)'
2331>>> str(today)
2332'2011-09-12'
2333Defining the repr function presents a new challenge: we would like it to apply correctly to all data types, even those that did not exist when repr was implemented. We would like it to be a polymorphic function, one that can be applied to many (poly) different forms (morph) of data.
2334
2335Message passing provides an elegant solution in this case: the repr function invokes a method called __repr__ on its argument.
2336
2337>>> today.__repr__()
2338'datetime.date(2011, 9, 12)'
2339By implementing this same method in user-defined classes, we can extend the applicability of repr to any class we create in the future. This example highlights another benefit of message passing in general, that it provides a mechanism for extending the domain of existing functions to new object types.
2340
2341The str constructor is implemented in a similar manner: it invokes a method called __str__ on its argument.
2342
2343>>> today.__str__()
2344'2011-09-12'
2345These polymorphic functions are examples of a more general principle: certain functions should apply to multiple data types. The message passing approach exemplified here is only one of a family of techniques for implementing polymorphic functions. The remainder of this section explores some alternatives.
2346
23472.7.2 Multiple Representations
2348
2349Data abstraction, using objects or functions, is a powerful tool for managing complexity. Abstract data types allow us to construct an abstraction barrier between the underlying representation of data and the functions or messages used to manipulate it. However, in large programs, it may not always make sense to speak of "the underlying representation" for a data type in a program. For one thing, there might be more than one useful representation for a data object, and we might like to design systems that can deal with multiple representations.
2350
2351To take a simple example, complex numbers may be represented in two almost equivalent ways: in rectangular form (real and imaginary parts) and in polar form (magnitude and angle). Sometimes the rectangular form is more appropriate and sometimes the polar form is more appropriate. Indeed, it is perfectly plausible to imagine a system in which complex numbers are represented in both ways, and in which the functions for manipulating complex numbers work with either representation.
2352
2353More importantly, large software systems are often designed by many people working over extended periods of time, subject to requirements that change over time. In such an environment, it is simply not possible for everyone to agree in advance on choices of data representation. In addition to the data-abstraction barriers that isolate representation from use, we need abstraction barriers that isolate different design choices from each other and permit different choices to coexist in a single program. Furthermore, since large programs are often created by combining pre-existing modules that were designed in isolation, we need conventions that permit programmers to incorporate modules into larger systems additively, that is, without having to redesign or re-implement these modules.
2354
2355We begin with the simple complex-number example. We will see how message passing enables us to design separate rectangular and polar representations for complex numbers while maintaining the notion of an abstract "complex-number" object. We will accomplish this by defining arithmetic functions for complex numbers (add_complex, mul_complex) in terms of generic selectors that access parts of a complex number independent of how the number is represented. The resulting complex-number system contains two different kinds of abstraction barriers. They isolate higher-level operations from lower-level representations. In addition, there is a vertical barrier that gives us the ability to separately design alternative representations.
2356
2357
2358As a side note, we are developing a system that performs arithmetic operations on complex numbers as a simple but unrealistic example of a program that uses generic operations. A complex number type is actually built into Python, but for this example we will implement our own.
2359
2360Like rational numbers, complex numbers are naturally represented as pairs. The set of complex numbers can be thought of as a two-dimensional space with two orthogonal axes, the real axis and the imaginary axis. From this point of view, the complex number z = x + y * i (where i*i = -1) can be thought of as the point in the plane whose real coordinate is x and whose imaginary coordinate is y. Adding complex numbers involves adding their respective x and y coordinates.
2361
2362When multiplying complex numbers, it is more natural to think in terms of representing a complex number in polar form, as a magnitude and an angle. The product of two complex numbers is the vector obtained by stretching one complex number by a factor of the length of the other, and then rotating it through the angle of the other.
2363
2364Thus, there are two different representations for complex numbers, which are appropriate for different operations. Yet, from the viewpoint of someone writing a program that uses complex numbers, the principle of data abstraction suggests that all the operations for manipulating complex numbers should be available regardless of which representation is used by the computer.
2365
2366Interfaces. Message passing not only provides a method for coupling behavior and data, it allows different data types to respond to the same message in different ways. A shared message that elicits similar behavior from different object classes is a powerful method of abstraction.
2367
2368As we have seen, an abstract data type is defined by constructors, selectors, and additional behavior conditions. A closely related concept is an interface, which is a set of shared messages, along with a specification of what they mean. Objects that respond to the special __repr__ and __str__ methods all implement a common interface of types that can be represented as strings.
2369
2370In the case of complex numbers, the interface needed to implement arithmetic consists of four messages: real, imag, magnitude, and angle. We can implement addition and multiplication in terms of these messages.
2371
2372We can have two different abstract data types for complex numbers that differ in their constructors.
2373
2374ComplexRI constructs a complex number from real and imaginary parts.
2375ComplexMA constructs a complex number from a magnitude and angle.
2376With these messages and constructors, we can implement complex arithmetic.
2377
2378>>> def add_complex(z1, z2):
2379 return ComplexRI(z1.real + z2.real, z1.imag + z2.imag)
2380>>> def mul_complex(z1, z2):
2381 return ComplexMA(z1.magnitude * z2.magnitude, z1.angle + z2.angle)
2382The relationship between the terms "abstract data type" (ADT) and "interface" is subtle. An ADT includes ways of building complex data types, manipulating them as units, and selecting for their components. In an object-oriented system, an ADT corresponds to a class, although we have seen that an object system is not needed to implement an ADT. An interface is a set of messages that have associated meanings, and which may or may not include selectors. Conceptually, an ADT describes a full representational abstraction of some kind of thing, whereas an interface specifies a set of behaviors that may be shared across many things.
2383
2384Properties. We would like to use both types of complex numbers interchangeably, but it would be wasteful to store redundant information about each number. We would like to store either the real-imaginary representation or the magnitude-angle representation.
2385
2386Python has a simple feature for computing attributes on the fly from zero-argument functions. The @property decorator allows functions to be called without the standard call expression syntax. An implementation of complex numbers in terms of real and imaginary parts illustrates this point.
2387
2388>>> from math import atan2
2389>>> class ComplexRI(object):
2390 def __init__(self, real, imag):
2391 self.real = real
2392 self.imag = imag
2393 @property
2394 def magnitude(self):
2395 return (self.real ** 2 + self.imag ** 2) ** 0.5
2396 @property
2397 def angle(self):
2398 return atan2(self.imag, self.real)
2399 def __repr__(self):
2400 return 'ComplexRI({0}, {1})'.format(self.real, self.imag)
2401A second implementation using magnitude and angle provides the same interface because it responds to the same set of messages.
2402
2403>>> from math import sin, cos
2404>>> class ComplexMA(object):
2405 def __init__(self, magnitude, angle):
2406 self.magnitude = magnitude
2407 self.angle = angle
2408 @property
2409 def real(self):
2410 return self.magnitude * cos(self.angle)
2411 @property
2412 def imag(self):
2413 return self.magnitude * sin(self.angle)
2414 def __repr__(self):
2415 return 'ComplexMA({0}, {1})'.format(self.magnitude, self.angle)
2416In fact, our implementations of add_complex and mul_complex are now complete; either class of complex number can be used for either argument in either complex arithmetic function. It is worth noting that the object system does not explicitly connect the two complex types in any way (e.g., through inheritance). We have implemented the complex number abstraction by sharing a common set of messages, an interface, across the two classes.
2417
2418>>> from math import pi
2419>>> add_complex(ComplexRI(1, 2), ComplexMA(2, pi/2))
2420ComplexRI(1.0000000000000002, 4.0)
2421>>> mul_complex(ComplexRI(0, 1), ComplexRI(0, 1))
2422ComplexMA(1.0, 3.141592653589793)
2423The interface approach to encoding multiple representations has appealing properties. The class for each representation can be developed separately; they must only agree on the names of the attributes they share. The interface is also additive. If another programmer wanted to add a third representation of complex numbers to the same program, they would only have to create another class with the same attributes.
2424
2425Special methods. The built-in mathematical operators can be extended in much the same way as repr; there are special method names corresponding to Python operators for arithmetic, logical, and sequence operations.
2426
2427To make our code more legible, we would perhaps like to use the + and * operators directly when adding and multiplying complex numbers. Adding the following methods to both of our complex number classes will enable these operators to be used, as well as the add and mul functions in the operator module:
2428
2429>>> ComplexRI.__add__ = lambda self, other: add_complex(self, other)
2430>>> ComplexMA.__add__ = lambda self, other: add_complex(self, other)
2431>>> ComplexRI.__mul__ = lambda self, other: mul_complex(self, other)
2432>>> ComplexMA.__mul__ = lambda self, other: mul_complex(self, other)
2433Now, we can use infix notation with our user-defined classes.
2434
2435>>> ComplexRI(1, 2) + ComplexMA(2, 0)
2436ComplexRI(3.0, 2.0)
2437>>> ComplexRI(0, 1) * ComplexRI(0, 1)
2438ComplexMA(1.0, 3.141592653589793)
2439Further reading. To evaluate expressions that contain the + operator, Python checks for special methods on both the left and right operands of the expression. First, Python checks for an __add__ method on the value of the left operand, then checks for an __radd__ method on the value of the right operand. If either is found, that method is invoked with the value of the other operand as its argument.
2440
2441Similar protocols exist for evaluating expressions that contain any kind of operator in Python, including slice notation and Boolean operators. The Python docs list the exhaustive set of method names for operators. Dive into Python 3 has a chapter on special method names that describes many details of their use in the Python interpreter.
2442
24432.7.3 Generic Functions
2444
2445Our implementation of complex numbers has made two data types interchangeable as arguments to the add_complex and mul_complex functions. Now we will see how to use this same idea not only to define operations that are generic over different representations but also to define operations that are generic over different kinds of arguments that do not share a common interface.
2446
2447The operations we have defined so far treat the different data types as being completely independent. Thus, there are separate packages for adding, say, two rational numbers, or two complex numbers. What we have not yet considered is the fact that it is meaningful to define operations that cross the type boundaries, such as the addition of a complex number to a rational number. We have gone to great pains to introduce barriers between parts of our programs so that they can be developed and understood separately.
2448
2449We would like to introduce the cross-type operations in some carefully controlled way, so that we can support them without seriously violating our abstraction boundaries. There is a tension between the outcomes we desire: we would like to be able to add a complex number to a rational number, and we would like to do so using a generic add function that does the right thing with all numeric types. At the same time, we would like to separate the concerns of complex numbers and rational numbers whenever possible, in order to maintain a modular program.
2450
2451Let us revise our implementation of rational numbers to use Python's built-in object system. As before, we will store a rational number as a numerator and denominator in lowest terms.
2452
2453>>> from fractions import gcd
2454>>> class Rational(object):
2455 def __init__(self, numer, denom):
2456 g = gcd(numer, denom)
2457 self.numer = numer // g
2458 self.denom = denom // g
2459 def __repr__(self):
2460 return 'Rational({0}, {1})'.format(self.numer, self.denom)
2461Adding and multiplying rational numbers in this new implementation is similar to before.
2462
2463>>> def add_rationals(x, y):
2464 nx, dx = x.numer, x.denom
2465 ny, dy = y.numer, y.denom
2466 return Rational(nx * dy + ny * dx, dx * dy)
2467>>> def mul_rationals(x, y):
2468 return Rational(x.numer * y.numer, x.denom * y.denom)
2469Type dispatching. One way to handle cross-type operations is to design a different function for each possible combination of types for which the operation is valid. For example, we could extend our complex number implementation so that it provides a function for adding complex numbers to rational numbers. We can provide this functionality generically using a technique called dispatching on type.
2470
2471The idea of type dispatching is to write functions that first inspect the type of argument they have received, and then execute code that is appropriate for the type. In Python, the type of an object can be inspected with the built-in type function.
2472
2473>>> def iscomplex(z):
2474 return type(z) in (ComplexRI, ComplexMA)
2475>>> def isrational(z):
2476 return type(z) == Rational
2477In this case, we are relying on the fact that each object knows its type, and we can look up that type using the Python type function. Even if the type function were not available, we could imagine implementing iscomplex and isrational in terms of a shared class attribute for Rational, ComplexRI, and ComplexMA.
2478
2479Now consider the following implementation of add, which explicitly checks the type of both arguments. We will not use Python's special methods (i.e., __add__) in this example.
2480
2481>>> def add_complex_and_rational(z, r):
2482 return ComplexRI(z.real + r.numer/r.denom, z.imag)
2483>>> def add(z1, z2):
2484 """Add z1 and z2, which may be complex or rational."""
2485 if iscomplex(z1) and iscomplex(z2):
2486 return add_complex(z1, z2)
2487 elif iscomplex(z1) and isrational(z2):
2488 return add_complex_and_rational(z1, z2)
2489 elif isrational(z1) and iscomplex(z2):
2490 return add_complex_and_rational(z2, z1)
2491 else:
2492 return add_rationals(z1, z2)
2493This simplistic approach to type dispatching, which uses a large conditional statement, is not additive. If another numeric type were included in the program, we would have to re-implement add with new clauses.
2494
2495We can create a more flexible implementation of add by implementing type dispatch through a dictionary. The first step in extending the flexibility of add will be to create a tag set for our classes that abstracts away from the two implementations of complex numbers.
2496
2497>>> def type_tag(x):
2498 return type_tag.tags[type(x)]
2499>>> type_tag.tags = {ComplexRI: 'com', ComplexMA: 'com', Rational: 'rat'}
2500Next, we use these type tags to index a dictionary that stores the different ways of adding numbers. The keys of the dictionary are tuples of type tags, and the values are type-specific addition functions.
2501
2502>>> def add(z1, z2):
2503 types = (type_tag(z1), type_tag(z2))
2504 return add.implementations[types](z1, z2)
2505This definition of add does not have any functionality itself; it relies entirely on a dictionary called add.implementations to implement addition. We can populate that dictionary as follows.
2506
2507>>> add.implementations = {}
2508>>> add.implementations[('com', 'com')] = add_complex
2509>>> add.implementations[('com', 'rat')] = add_complex_and_rational
2510>>> add.implementations[('rat', 'com')] = lambda x, y: add_complex_and_rational(y, x)
2511>>> add.implementations[('rat', 'rat')] = add_rationals
2512This dictionary-based approach to dispatching is additive, because add.implementations and type_tag.tags can always be extended. Any new numeric type can "install" itself into the existing system by adding new entries to these dictionaries.
2513
2514While we have introduced some complexity to the system, we now have a generic, extensible add function that handles mixed types.
2515
2516>>> add(ComplexRI(1.5, 0), Rational(3, 2))
2517ComplexRI(3.0, 0)
2518>>> add(Rational(5, 3), Rational(1, 2))
2519Rational(13, 6)
2520Data-directed programming. Our dictionary-based implementation of add is not addition-specific at all; it does not contain any direct addition logic. It only implements addition because we happen to have populated its implementations dictionary with functions that perform addition.
2521
2522A more general version of generic arithmetic would apply arbitrary operators to arbitrary types and use a dictionary to store implementations of various combinations. This fully generic approach to implementing methods is called data-directed programming. In our case, we can implement both generic addition and multiplication without redundant logic.
2523
2524>>> def apply(operator_name, x, y):
2525 tags = (type_tag(x), type_tag(y))
2526 key = (operator_name, tags)
2527 return apply.implementations[key](x, y)
2528In this generic apply function, a key is constructed from the operator name (e.g., 'add') and a tuple of type tags for the arguments. Implementations are also populated using these tags. We enable support for multiplication on complex and rational numbers below.
2529
2530>>> def mul_complex_and_rational(z, r):
2531 return ComplexMA(z.magnitude * r.numer / r.denom, z.angle)
2532>>> mul_rationals_and_complex = lambda r, z: mul_complex_and_rational(z, r)
2533>>> apply.implementations = {('mul', ('com', 'com')): mul_complex,
2534 ('mul', ('com', 'rat')): mul_complex_and_rational,
2535 ('mul', ('rat', 'com')): mul_rationals_and_complex,
2536 ('mul', ('rat', 'rat')): mul_rationals}
2537We can also include the addition implementations from add to apply, using the dictionary update method.
2538
2539>>> adders = add.implementations.items()
2540>>> apply.implementations.update({('add', tags):fn for (tags, fn) in adders})
2541Now that apply supports 8 different implementations in a single table, we can use it to manipulate rational and complex numbers quite generically.
2542
2543>>> apply('add', ComplexRI(1.5, 0), Rational(3, 2))
2544ComplexRI(3.0, 0)
2545>>> apply('mul', Rational(1, 2), ComplexMA(10, 1))
2546ComplexMA(5.0, 1)
2547This data-directed approach does manage the complexity of cross-type operators, but it is cumbersome. With such a system, the cost of introducing a new type is not just writing methods for that type, but also the construction and installation of the functions that implement the cross-type operations. This burden can easily require much more code than is needed to define the operations on the type itself.
2548
2549While the techniques of dispatching on type and data-directed programming do create additive implementations of generic functions, they do not effectively separate implementation concerns; implementors of the individual numeric types need to take account of other types when writing cross-type operations. Combining rational numbers and complex numbers isn't strictly the domain of either type. Formulating coherent policies on the division of responsibility among types can be an overwhelming task in designing systems with many types and cross-type operations.
2550
2551Coercion. In the general situation of completely unrelated operations acting on completely unrelated types, implementing explicit cross-type operations, cumbersome though it may be, is the best that one can hope for. Fortunately, we can sometimes do better by taking advantage of additional structure that may be latent in our type system. Often the different data types are not completely independent, and there may be ways by which objects of one type may be viewed as being of another type. This process is called coercion. For example, if we are asked to arithmetically combine a rational number with a complex number, we can view the rational number as a complex number whose imaginary part is zero. By doing so, we transform the problem to that of combining two complex numbers, which can be handled in the ordinary way by add_complex and mul_complex.
2552
2553In general, we can implement this idea by designing coercion functions that transform an object of one type into an equivalent object of another type. Here is a typical coercion function, which transforms a rational number to a complex number with zero imaginary part:
2554
2555>>> def rational_to_complex(x):
2556 return ComplexRI(x.numer/x.denom, 0)
2557Now, we can define a dictionary of coercion functions. This dictionary could be extended as more numeric types are introduced.
2558
2559>>> coercions = {('rat', 'com'): rational_to_complex}
2560It is not generally possible to coerce an arbitrary data object of each type into all other types. For example, there is no way to coerce an arbitrary complex number to a rational number, so there will be no such conversion implementation in the coercions dictionary.
2561
2562Using the coercions dictionary, we can write a function called coerce_apply, which attempts to coerce arguments into values of the same type, and only then applies an operator. The implementations dictionary of coerce_apply does not include any cross-type operator implementations.
2563
2564>>> def coerce_apply(operator_name, x, y):
2565 tx, ty = type_tag(x), type_tag(y)
2566 if tx != ty:
2567 if (tx, ty) in coercions:
2568 tx, x = ty, coercions[(tx, ty)](x)
2569 elif (ty, tx) in coercions:
2570 ty, y = tx, coercions[(ty, tx)](y)
2571 else:
2572 return 'No coercion possible.'
2573 key = (operator_name, tx)
2574 return coerce_apply.implementations[key](x, y)
2575The implementations of coerce_apply require only one type tag, because they assume that both values share the same type tag. Hence, we require only four implementations to support generic arithmetic over complex and rational numbers.
2576
2577>>> coerce_apply.implementations = {('mul', 'com'): mul_complex,
2578 ('mul', 'rat'): mul_rationals,
2579 ('add', 'com'): add_complex,
2580 ('add', 'rat'): add_rationals}
2581With these implementations in place, coerce_apply can replace apply.
2582
2583>>> coerce_apply('add', ComplexRI(1.5, 0), Rational(3, 2))
2584ComplexRI(3.0, 0)
2585>>> coerce_apply('mul', Rational(1, 2), ComplexMA(10, 1))
2586ComplexMA(5.0, 1.0)
2587This coercion scheme has some advantages over the method of defining explicit cross-type operations. Although we still need to write coercion functions to relate the types, we need to write only one function for each pair of types rather than a different functions for each collection of types and each generic operation. What we are counting on here is the fact that the appropriate transformation between types depends only on the types themselves, not on the particular operation to be applied.
2588
2589Further advantages come from extending coercion. Some more sophisticated coercion schemes do not just try to coerce one type into another, but instead may try to coerce two different types each into a third common type. Consider a rhombus and a rectangle: neither is a special case of the other, but both can be viewed as quadrilaterals. Another extension to coercion is iterative coercion, in which one data type is coerced into another via intermediate types. Consider that an integer can be converted into a real number by first converting it into a rational number, then converting that rational number into a real number. Chaining coercion in this way can reduce the total number of coercion functions that are required by a program.
2590
2591Despite its advantages, coercion does have potential drawbacks. For one, coercion functions can lose information when they are applied. In our example, rational numbers are exact representations, but become approximations when they are converted to complex numbers.
2592
2593Some programming languages have automatic coercion systems built in. In fact, early versions of Python had a __coerce__ special method on objects. In the end, the complexity of the built-in coercion system did not justify its use, and so it was removed. Instead, particular operators apply coercion to their arguments as needed. Operators are implemented as method calls on user defined types using special methods like __add__ and __mul__. It is left up to you, the user, to decide whether to employ type dispatching, data-directed programming, message passing, or coercion in order to implement generic functions in your programs.