11.2 PITFALLS, RULES & TRICKS


INTRODUCTION


As in any programming language, Python has some pitfalls that occasionally trip up even experienced programmers. You can avoid them if you know them as potential sources of danger.


THE VARIABLE MEMORY MODEL OF PYTHON


In chapter 2.6, the concept of variables was introduced with a visual representation, the so-called box metaphor. A variable definition such as a = 100 was regarded as reserving space in computer memory to hold the number 100, similar to creating a drawer or box and putting the number 100 inside. As in mathematics, the letter a is the variable's identifier (its name or placeholder). However, this simple picture is not entirely correct in Python, because all data, including numbers, are objects, and an object not only has a value but also defines "behavior" through functions (methods). For example, number objects "know" how to add values.
The following lines demonstrate this concept:

a = 100
a: 100
a.__add__
<built-in method __add__ of int object at 0x2>

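The number 2 in this output (0x2 in hexadecimal) can be regarded as the address of the object. In Python it is called the object's id (identity), and it is returned by the built-in function id():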

id(a)
2
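The claim that a number object "knows" how to add can be checked directly. This minimal sketch (Python 3) shows that calling the __add__ method of an int gives the same result as the + operator, which is, roughly speaking, translated into such a method call. The value 5 is an arbitrary choice for illustration:

```python
a = 100

# Calling the method directly gives the same result as using the operator
result_method = a.__add__(5)
result_operator = a + 5

print(result_method, result_operator)  # 105 105

# The int object referred to by a is unchanged; 105 is a new int object
print(a)  # 100
```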

In Python, assigning a value to a variable creates a name a that points to (refers to) an int object stored at memory address 0x2. The name a is therefore also called an "alias" (in other programming languages, a reference or pointer). Symbolically: a -> 100 (int object at 0x2).

The difference from the box metaphor becomes clearly visible in the following assignment:

b = a
b: 100

This statement does not actually create a second box with the number 100 inside; it simply creates a second alias b that refers to the same object holding 100, as id() shows:

id(b)
2

The situation now looks like this: a -> 100 <- b, that is, both names refer to one and the same object.
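In standard Python 3 you can confirm the aliasing yourself with id() and the identity operator is. This is a minimal sketch; the actual id values printed on your system will differ from the illustrative value 2 used above:

```python
a = 100
b = a  # no copy is made: b is a second name for the same object

# Both names report the same identity, so they refer to a single object
print(id(a) == id(b))  # True
print(a is b)          # True
```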

Awareness of this concept becomes most important when the data is not a simple number but a structured data type (such as a list), as shown in the following example.
First you define a list a with simple content. Then you assign a to a second variable b:

a = [1, 2, 3]
a: [1, 2, 3]
b = a
b: [1, 2, 3]

As you might suspect, the situation is the same as before: a -> [1, 2, 3] <- b

If you now modify the list through the alias b

b.append(4)
b
[1, 2, 3, 4]

the situation becomes: a -> [1, 2, 3, 4] <- b

You have therefore also changed the list referenced by a, since it is the same list!

a
a: [1, 2, 3, 4]

This mutual dependency of the two variables a and b is the cause of many subtle programming errors. There is no danger, however, if you assign a completely new list to b: this creates a new object, and a and b then refer to two completely independent objects.

a = [1, 2, 3]
a: [1, 2, 3]
b = a
b = [1, 2, 3]
b: [1, 2, 3]

or symbolically: a -> [1, 2, 3] and b -> [1, 2, 3], two separate list objects.

Now if you modify the list b:

b.append(4)

the list a remains unchanged, as you can easily check:

b
b: [1, 2, 3, 4]
a
a: [1, 2, 3]
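The difference between an alias and an independent list with the same content is exactly the difference between the operators is (same object) and == (equal content). The following minimal sketch in Python 3 uses a hypothetical third name c for the independent list:

```python
a = [1, 2, 3]
b = a            # alias: the same object as a
c = [1, 2, 3]    # a new, independent object with equal content

print(b is a)    # True
print(c == a)    # True  (same content)
print(c is a)    # False (different objects)

c.append(4)
print(a)         # [1, 2, 3]  (a is not affected)
```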

For simple data types like numbers, this mutual dependency does no harm, because a number cannot be modified in place: "changing" a number always creates a new number object. In the following example, after the statement b = a the alias b refers to the object 100; after b = 200 it refers to a new object 200, while a still refers to 100.

a = 100
a: 100
b = a
b: 100
b = 200
b: 200
a
100
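The same distinction shows up with augmented assignment (+=), which the text above does not cover, so treat this as an illustrative extension. In standard Python 3, += on an int binds the name to a new object, whereas += on a list extends the existing list in place, so every alias sees the change. The names x, y, u and v are arbitrary:

```python
# Integers are immutable: += binds x to a new int object
x = 100
y = x
x += 1
print(x, y)    # 101 100

# Lists are mutable: += extends the existing list in place
u = [1, 2]
v = u
u += [3]
print(u, v)    # [1, 2, 3] [1, 2, 3]
print(u is v)  # True
```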

The following example demonstrates once again that assigning a list variable is dangerous, because the content of the list is not copied into a new list.

myGarden = ["Rose", "Lotus"]
yourGarden = myGarden
yourGarden[0] = "Hibiskus"
myGarden
["Hibiskus", "Lotus"]
yourGarden
["Hibiskus", "Lotus"]

Rule 1a:
The copy operation using the equal sign is unproblematic for numbers, strings, bytes, and tuples. For other data types, it is usually wrong.

If you want an independent copy (also called a clone) of a mutable data type such as a list, you must either copy the elements into a new list with your own code or use the function deepcopy() from the module copy:

import copy
myGarden = ["Rose", "Lotus"]
yourGarden = copy.deepcopy(myGarden)
yourGarden[0] = "Hibiskus"
myGarden
["Rose", "Lotus"]
yourGarden
["Hibiskus", "Lotus"]

As a consequence we formulate the following rule:

Rule 1b:
The copy operation using the equal sign is usually wrong, except when the data is immutable (see below).
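For a flat list of strings such as myGarden, a shallow copy (copy.copy(), list() or slicing with [:]) would already be enough. The difference to deepcopy() matters for nested lists: a shallow copy creates a new outer list but shares the inner lists. A minimal sketch with a hypothetical nested garden list:

```python
import copy

garden = [["Rose", "Lotus"], ["Tulip"]]

shallow = copy.copy(garden)    # new outer list, inner lists are shared
deep = copy.deepcopy(garden)   # inner lists are copied as well

garden[0][0] = "Hibiskus"

print(shallow[0])         # ['Hibiskus', 'Lotus']  (shares the inner list)
print(deep[0])            # ['Rose', 'Lotus']      (fully independent)
print(shallow is garden)  # False: the outer list itself is new
```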

Rule 1 is often overlooked when passing parameters. Whenever you pass a mutable object to a function, the function can change its values without any restriction, and the caller sees the change.

def show(garden):
    print("garden:", garden)
    garden[0] = "Hibiskus"
myGarden = ["Rose", "Lotus"]
show(myGarden)
myGarden
["Hibiskus", "Lotus"]

After calling show(), the passed list has changed! As in medicine, such a side effect is usually unexpected and unwanted.

Rule 2:
In good programming style, you should not modify the parameters passed into a function, so as to avoid side effects.
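One way to follow Rule 2 is to let the function work on its own copy and return the result. This is a minimal sketch with a hypothetical function show_modified(); list() makes a shallow copy, which suffices here because the elements are immutable strings (for nested lists you would use copy.deepcopy()):

```python
def show_modified(garden):
    # Work on a local copy so that the caller's list stays untouched
    garden = list(garden)
    garden[0] = "Hibiskus"
    return garden

myGarden = ["Rose", "Lotus"]
newGarden = show_modified(myGarden)
print(myGarden)   # ['Rose', 'Lotus']
print(newGarden)  # ['Hibiskus', 'Lotus']
```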


PACKING & UNPACKING


At first glance, tuples do not seem to differ significantly from lists. In fact, tuples are essentially immutable lists, so all list operations that do not change the list also apply to tuples. However, there are special notation techniques for tuples that use commas.

The round parentheses can be omitted when creating tuples:

t = 1, 2, 3
t
(1, 2, 3)

In this case, the comma serves as the syntax character that separates the elements. This is called automatic packing. You can make clever use of it to return multiple values from a function as a tuple:

import math
def sqrt(x):
    y = math.sqrt(x)
    return y, -y
sqrt(4)
(2.0, -2.0)
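Because it is the comma, not the parentheses, that creates a tuple, a tuple with a single element needs a trailing comma. A minimal sketch in Python 3 with arbitrary names:

```python
t = 1, 2, 3
print(isinstance(t, tuple))  # True

single = 1,          # a tuple with one element
not_a_tuple = (1)    # just the int 1 in parentheses
print(single)        # (1,)
print(not_a_tuple)   # 1
```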

You can also use commas on the left side of an assignment when a tuple appears to the right of the equal sign. This is called automatic unpacking:

import math
def sqrt(x):
    y = math.sqrt(x)
    return y, -y
y1, y2 = sqrt(2)
y1
1.4142135623730951
y2
-1.4142135623730951

Packing and unpacking are convenient for defining multiple variables simultaneously:

a, b, c = 1, 2, 3
a
1
b
2
c
3

In the following, the three numbers are packed on the right side and unpacked again on the left side:

t = 1, 2, 3
a, b, c = t
a
1
b
2
c
3

By the way, unpacking also works with lists:

li = [1, 2, 3]
a, b, c = li
a
1
b
2
c
3
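Unpacking only works if the number of names on the left matches the number of elements on the right; otherwise Python raises a ValueError. A minimal sketch (the flag failed exists only to make the outcome checkable):

```python
li = [1, 2, 3]
failed = False

try:
    a, b = li            # two names, but three values
except ValueError as err:
    failed = True
    print("ValueError:", err)
```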

Using this syntax, the values of two variables can be swapped elegantly, without an auxiliary variable:

a, b = 1, 2
a, b = b, a
a
2
b
1
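The right-hand side is packed into a tuple before anything is assigned, so the same trick also swaps list elements. A minimal sketch with an arbitrary list:

```python
li = [1, 2, 3]

# Right side is evaluated (packed) first, then unpacked into the targets
li[0], li[2] = li[2], li[0]
print(li)  # [3, 2, 1]
```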


MUTABLE AND IMMUTABLE DATA TYPES


To protect data from undesirable side effects, Python relies on the concept of alterable (mutable) and non-alterable (immutable) data types. The immutable types include numbers, strings (str), bytes, and tuples. For example, changing a letter in a string causes an error message:

s = "abject"
s: "abject"
s[0] = "o"
TypeError: 'str' object does not support item assignment

To modify s, you need to redefine the whole string:

s = "abject"
s: "abject"
s = "object"
s: "object"

A new string object is created here, and the reference to the old one is lost (its memory is freed by Python's internal garbage collector).
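Instead of typing the whole string again, you can build the new string from a slice of the old one; the original object stays unchanged. Tuples reject item assignment in the same way as strings. A minimal sketch in Python 3; the names old and tuple_failed are hypothetical helpers for checking the result:

```python
s = "abject"
old = s              # keep a second name for the original object

s = "o" + s[1:]      # build a new string from a letter and a slice
print(s)             # object
print(old)           # abject (the original string is unchanged)
print(s is old)      # False

t = (1, 2, 3)
tuple_failed = False
try:
    t[0] = 10        # tuples are immutable too
except TypeError as err:
    tuple_failed = True
    print("TypeError:", err)
```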


TWO-DIMENSIONAL LISTS, MATRICES


In many programming languages, matrices are constructed as arrays: the rows of the matrix are arrays, and the matrix itself is an array of these row arrays. In Python it is natural to use lists instead of arrays. However, you have to pay close attention, because lists do not behave like elementary (immutable) data types but like reference types. You can fall into a well-known pitfall as soon as you create the matrix. Without suspecting anything, you generate a 3x3 matrix of zeros in the Python console:

A = [[0] * 3] * 3
A
[[0, 0, 0],
 [0, 0, 0],
 [0, 0, 0]]

Now change the last value of the first row with an assignment. You will probably be surprised to see that all the other rows change too.

A[0][2] = 1
A
[[0, 0, 1],
 [0, 0, 1],
 [0, 0, 1]]

What happened there? A little thought puts you on the right track: A could equally have been generated in two steps:

z = [0] * 3
A = [z] * 3
A
[[0, 0, 0],
 [0, 0, 0],
 [0, 0, 0]]

First, a list z with three zeros is created. Then A is built from three copies of the same reference to z, so all three rows refer to the same list! If you change one of them, the others change as well.
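You can verify with the identity operator is that all rows are one and the same list object. A minimal sketch in Python 3:

```python
z = [0] * 3
A = [z] * 3

# All three rows are the very same list object
print(A[0] is A[1] is A[2])  # True
print(A[0] is z)             # True
```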

Rule 3:
Never use list multiplication (the * operator) to create nested lists.

To get around this pitfall, you can use a list comprehension, which creates a separate list for each row. As you can test, the matrix now behaves properly:

A = [[0 for x in range(3)] for y in range(3)]
A
[[0, 0, 0],
 [0, 0, 0],
 [0, 0, 0]]
A[0][2] = 1
A
[[0, 0, 1],
 [0, 0, 0],
 [0, 0, 0]]
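A common shorter variant multiplies only the innermost, flat list of immutable zeros, which is harmless, and relies on the comprehension to evaluate [0] * 3 anew for every row. This does not conflict with Rule 3, since no nested list is multiplied. A minimal sketch in Python 3:

```python
# [0] * 3 is evaluated once per row, so each row is a separate list
A = [[0] * 3 for _ in range(3)]
A[0][2] = 1

print(A)             # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
print(A[0] is A[1])  # False
```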