Digest of Dive into Python 3
Chapter 1. Your First Python Program
# humansize.py
SUFFIXES = {1000: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'],
            1024: ['KiB', 'MiB', 'GiB', 'TiB', 'PiB', 'EiB', 'ZiB', 'YiB']}

def approximate_size(size, a_kilobyte_is_1024_bytes=True):
    '''Convert a file size to human-readable form.

    Keyword arguments:
    size -- file size in bytes
    a_kilobyte_is_1024_bytes -- if True (default), use multiples of 1024
                                if False, use multiples of 1000

    Returns: string
    '''
    if size < 0:
        raise ValueError('number must be non-negative')

    multiple = 1024 if a_kilobyte_is_1024_bytes else 1000
    for suffix in SUFFIXES[multiple]:
        size /= multiple
        if size < multiple:
            return '{0:.1f} {1}'.format(size, suffix)

    raise ValueError('number too large')

if __name__ == '__main__':
    print(approximate_size(1000000000000, False))
    print(approximate_size(1000000000000))
- Python functions have no explicit begin or end, and no curly braces to mark where the function code starts and stops. The only delimiter is a colon (:) and the indentation of the code itself.
- Use spaces for indentation, not tabs.
- It doesn't need to be 4 spaces, it just needs to be consistent.
- The first line that is not indented marks the end of the function.
- The print() function is built-in; you'll never see an explicit declaration of it. You can just use it, anytime, anywhere.
- The keyword def starts the function declaration.
- Python functions do not specify the datatype of their return value; they don't even specify whether or not they return a value.
  - In fact, every Python function returns a value; if the function ever executes a return statement, it will return that value, otherwise it will return None, the Python null value.
- In Python, variables are never explicitly typed. Python figures out what type a variable is and keeps track of it internally.
- Python allows function arguments to have default values; if the function is called without the argument, the argument gets its default value.
- Furthermore, arguments can be specified in any order by using named arguments (see the example after this list).
- Once you have a single named argument, all arguments to the right of that need to be named arguments too.
- You can import this function into the interactive shell with from humansize import approximate_size.
- A docstring (the ''' part), if it exists, must be the first thing defined in a function (that is, on the next line after the function declaration).
  - docstrings are for people who are going to be using your code without needing or wanting to know how it works.
  - docstrings can be turned into actual documentation.
  - A docstring is similar to JavaDoc; it is not the same as an ordinary comment.
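For example, assuming humansize.py is on the import search path, the same call can be written several ways with named arguments:

>>> from humansize import approximate_size
>>> approximate_size(4000, a_kilobyte_is_1024_bytes=False)
'4.0 KB'
>>> approximate_size(size=4000, a_kilobyte_is_1024_bytes=False)
'4.0 KB'
>>> approximate_size(a_kilobyte_is_1024_bytes=False, size=4000)
'4.0 KB'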
1.4. The Import Search Path
Python looks in all the directories defined in sys.path, in order, when you try to import a module.
>>> import sys
# show sys.path
>>> sys.path
# add an entry in sys.path
>>> sys.path.insert(0, '/home/mark/diveintopython3/examples') # The effect lasts as long as Python is running.
Not all modules are stored as .py files. Some are built-in modules; they are actually baked right into Python itself. Built-in modules behave just like regular modules, but their Python source code is not available, because they are not written in Python! (Like Python itself, these built-in modules are written in C.)
1.5. Everything Is An Object
>>> import humansize
>>> print(humansize.approximate_size(4096, True))
4.0 KiB
# __doc__ is a built-in attribute for functions.
# its value is the docstring
>>> print(humansize.approximate_size.__doc__)
Convert a file size to human-readable form.
......
When you want to use functions defined in imported modules, you need to include the module name. So you can't just say approximate_size; it must be humansize.approximate_size.
In Python, the definition of an object is looser.
- Some objects have neither attributes nor methods, but they could.
- Not all objects are subclassable.
- But everything is an object in the sense that it can be assigned to a variable or passed as an argument to a function.
You may have heard the term “first-class object” in other programming contexts.
- In Python, functions are first-class objects. You can pass a function as an argument to another function.
- Modules are first-class objects. You can pass an entire module as an argument to a function.
- Classes are first-class objects, and individual instances of a class are also first-class objects.
1.7. Exceptions
Unlike Java, Python functions don’t declare which exceptions they might raise. It’s up to you to determine what possible exceptions you need to catch.
If you know a line of code may raise an exception, you should handle the exception using a try-except block.

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

By the end of this try-except block, you have imported some module and named it etree. Since both modules implement a common API, the rest of your code doesn't need to keep checking which module got imported. And since the module that did get imported is always called etree, the rest of your code doesn't need to be littered with if statements to call differently-named modules.
1.10. Running Scripts
if __name__ == '__main__':
......
- Like C, Python uses == for comparison and = for assignment. Unlike C, Python does not support in-line assignment, so there's no chance of accidentally assigning the value you thought you were comparing.

So what makes this if statement special? Well, modules are objects, and all modules have a built-in attribute __name__. A module's __name__ depends on how you're using the module. If you import the module, then __name__ is the module's filename, without a directory path or file extension.

But you can also run the module directly as a standalone program, in which case __name__ will be a special default value, __main__. Python will evaluate this if statement, find a true expression, and execute the if code block.
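You can check this in the interactive shell (assuming humansize.py is importable):

>>> import humansize
>>> humansize.__name__
'humansize'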
Chapter 2. Native Datatypes
2.1. Diving In
Python has many native datatypes:
- Bytes and byte arrays, e.g. a JPEG image file.
- Lists are ordered sequences of values.
- Tuples are ordered, immutable sequences of values.
- Sets are unordered bags of values.
- Dictionaries are unordered bags of key-value pairs.
2.2. Booleans
In certain places (like if statements), Python expects an expression to evaluate to a boolean value. These places are called boolean contexts. You can use virtually any expression in a boolean context, and Python will try to determine its truth value. Different datatypes have different rules about which values are true or false in a boolean context.
Due to some legacy issues left over from Python 2, booleans can be treated as numbers. True is 1; False is 0.
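A quick demonstration in the interactive shell:

>>> True + True
2
>>> True == 1
True
>>> False * 10
0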
2.3. Numbers
Python supports both integers and floating point numbers. There’s no type declaration to distinguish them; Python tells them apart by the presence or absence of a decimal point.
>>> type(1)
<class 'int'>
>>> isinstance(1, int)
True
>>> 1 + 1
2
>>> 1 + 1.0
2.0
>>> type(2.0)
<class 'float'>
>>> float(2)
2.0
>>> int(2.0)
2
>>> int(2.5)
2
>>> int(-2.5)
-2
>>> 11 / 2
5.5
>>> 11 // 2
5
>>> -11 // 2
-6
>>> 11.0 // 2
5.0
>>> 11 ** 2
121
>>> 11 % 2
1
>>> import fractions
>>> x = fractions.Fraction(1, 3)
>>> x
Fraction(1, 3)
>>> x * 2
Fraction(2, 3)
>>> fractions.Fraction(6, 4)
Fraction(3, 2)
>>> import math
>>> math.pi
3.1415926535897931
>>> math.sin(math.pi / 2)
1.0
>>> math.tan(math.pi / 4)
0.99999999999999989
- Python 2 had separate types for int and long.
2.4. Lists
A better analogy than a Java array would be the ArrayList class, which can hold arbitrary objects and can expand dynamically as new items are added.
>>> a_list = ['a', 'b', 'mpilgrim', 'z', 'example']
>>> a_list
['a', 'b', 'mpilgrim', 'z', 'example']
>>> a_list[0]
'a'
>>> a_list[-1]
'example'
>>> a_list[-3]
'mpilgrim'
>>> a_list[1:3]
['b', 'mpilgrim']
>>> a_list[1:-1]
['b', 'mpilgrim', 'z']
>>> a_list[0:3]
['a', 'b', 'mpilgrim']
>>> a_list[:3]
['a', 'b', 'mpilgrim']
>>> a_list[3:]
['z', 'example']
>>> a_list[:]
['a', 'b', 'mpilgrim', 'z', 'example']
>>> a_list = ['a']
>>> a_list = a_list + [2.0, 3]
>>> a_list
['a', 2.0, 3]
>>> a_list.append(True)
>>> a_list
['a', 2.0, 3, True]
>>> a_list.extend(['four', 'Ω'])
>>> a_list
['a', 2.0, 3, True, 'four', 'Ω']
>>> a_list.insert(0, 'Ω')
>>> a_list
['Ω', 'a', 2.0, 3, True, 'four', 'Ω']
>>> a_list = ['a', 'b', 'c']
>>> a_list.extend(['d', 'e', 'f'])
>>> a_list
['a', 'b', 'c', 'd', 'e', 'f']
>>> a_list.append(['g', 'h', 'i'])
>>> a_list
['a', 'b', 'c', 'd', 'e', 'f', ['g', 'h', 'i']]
>>> a_list = ['a', 'b', 'new', 'mpilgrim', 'new']
>>> a_list.count('new')
2
>>> 'new' in a_list
True
>>> 'c' in a_list
False
>>> a_list.index('mpilgrim')
3
>>> a_list.index('new')
2
>>> a_list = ['a', 'b', 'new', 'mpilgrim', 'new']
>>> a_list[1]
'b'
>>> del a_list[1]
>>> a_list
['a', 'new', 'mpilgrim', 'new']
>>> a_list.remove('new')
>>> a_list
['a', 'mpilgrim', 'new']
>>> a_list.remove('new')
>>> a_list
['a', 'mpilgrim']
>>> a_list = ['a', 'b', 'new', 'mpilgrim']
>>> a_list.pop()
'mpilgrim'
>>> a_list
['a', 'b', 'new']
>>> a_list.pop(1)
'b'
>>> a_list
['a', 'new']
if []: # false
......
if ['a']: # true
......
- If the negative index is confusing to you, think of it this way: a_list[-n] == a_list[len(a_list) - n]. So in this list, a_list[-3] == a_list[5 - 3] == a_list[2].
- You can get a part of a list, called a "slice", by specifying two indices. The return value is a new list containing the items of the list, in order, from the first slice index up to but not including the second.
- a_list[:] is shorthand for making a complete copy of a list.
- The + operator concatenates lists to create a new list.
  - However, if memory is a concern, you should be aware that list concatenation creates a second list in memory.
  - In this case, that new list is immediately assigned to the existing variable a_list. So this line of code is really a two-step process (concatenation, then assignment) which can temporarily consume a lot of memory when you're dealing with large lists.
- List items do not need to be unique.
- The extend() method takes a single argument, which is always a list, and adds each of the items of that list to a_list.
- On the other hand, the append() method takes a single argument, which can be any datatype.
- The index() method finds the first occurrence of a value in the list.
- The remove() method takes a value and removes the first occurrence of that value from the list.
- When called without arguments, the pop() list method removes the last item in the list and returns the value it removed.
  - You can pop arbitrary items from a list. Just pass a positional index to the pop() method.
- Empty lists are false; all other lists are true.
2.5. Tuples
A tuple is an immutable list. A tuple can not be changed in any way once it is created.
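A quick way to see that (the exact error message may vary between Python versions):

>>> a_tuple = ('a', 'b', 'mpilgrim', 'z', 'example')
>>> a_tuple[0] = 'changed'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment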
>>> a_tuple = ("a", "b", "mpilgrim", "z", "example")
>>> type((False))
<class 'bool'>
>>> type((False,))
<class 'tuple'>
>>> v = ('a', 2, True)
>>> (x, y, z) = v
>>> x
'a'
>>> y
2
>>> z
True
>>> (MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY) = range(7)
>>> MONDAY
0
>>> TUESDAY
1
- A tuple is defined in the same way as a list, except that the whole set of elements is enclosed in parentheses instead of square brackets.
- Tuples are faster than lists. If you're defining a constant set of values and all you're ever going to do with it is iterate through it, use a tuple instead of a list.
- Some tuples can be used as dictionary keys (specifically, tuples that contain immutable values like strings, numbers, and other tuples). Lists can never be used as dictionary keys, because lists are not immutable.
- Tuples can be converted into lists, and vice-versa. The built-in tuple() function takes a list and returns a tuple with the same elements, and the list() function takes a tuple and returns a list.
  - In effect, tuple() freezes a list, and list() thaws a tuple.
- An empty tuple is false. Any tuple with at least one item is true.
- To create a tuple of one item, you need a comma after the value. Without the comma, Python just assumes you have an extra pair of parentheses, which is harmless, but it doesn't create a tuple.
- Here's a cool programming shortcut: in Python, you can use a tuple to assign multiple values at once.
  - Technically, the range() function returns a range object (an iterable), not a list or a tuple.
2.6. Sets
A set is an unordered “bag” of unique values. A single set can contain values of any immutable datatype. Once you have two sets, you can do standard set operations like union, intersection, and set difference.
>>> a_set = {1}
>>> a_list = ['a', 'b', 'mpilgrim', True, False, 42]
>>> a_set = set(a_list)
>>> a_set
{'a', False, 'b', True, 'mpilgrim', 42}
>>> a_set = set()
>>> a_set
set()
>>> not_sure = {}
>>> type(not_sure)
<class 'dict'>
>>> a_set = {1, 2}
>>> a_set.add(4)
>>> a_set
{1, 2, 4}
>>> a_set.add(1)
>>> a_set
{1, 2, 4}
>>> a_set = {1, 2, 3}
>>> a_set
{1, 2, 3}
>>> a_set.update({2, 4, 6})
>>> a_set
{1, 2, 3, 4, 6}
>>> a_set.update({3, 6, 9}, {1, 2, 3, 5, 8, 13})
>>> a_set
{1, 2, 3, 4, 5, 6, 8, 9, 13}
>>> a_set.update([10, 20, 30])
>>> a_set
{1, 2, 3, 4, 5, 6, 8, 9, 10, 13, 20, 30}
>>> a_set = {1, 3, 6, 10, 15, 21, 28, 36, 45}
>>> a_set
{1, 3, 36, 6, 10, 45, 15, 21, 28}
>>> a_set.discard(10)
>>> a_set
{1, 3, 36, 6, 45, 15, 21, 28}
>>> a_set.discard(10)
>>> a_set
{1, 3, 36, 6, 45, 15, 21, 28}
>>> a_set.remove(21)
>>> a_set
{1, 3, 36, 6, 45, 15, 28}
>>> a_set.remove(21)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 21
- Sets are actually implemented as classes.
- You can actually call the update() method with any number of arguments.
- There are three ways to remove individual values from a set:
  - If you call the discard() method with a value that doesn't exist in the set, it does nothing. No error; it's just a no-op.
  - If the value doesn't exist in the set, the remove() method raises a KeyError exception.
  - Like lists, sets have a pop() method. However, since sets are unordered, there is no "last" value in a set, so there is no way to control which value gets removed. It is completely arbitrary.
    - Attempting to pop a value from an empty set will raise a KeyError exception.
- The clear() method removes all values from a set, leaving you with an empty set. This is equivalent to a_set = set().
- Any two sets that contain all the same values (even in different order) are considered equal (see the example after this list).
- An empty set is false. Any set with at least one item is true.
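For example:

>>> {1, 2, 3} == {3, 2, 1}
True
>>> bool(set())
False
>>> bool({1})
True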
2.7. Dictionaries
A dictionary is an unordered set of key-value pairs.
# Use {} just like Sets
>>> a_dict = {'server': 'db.diveintopython3.org', 'database': 'mysql'}
>>> a_dict
{'server': 'db.diveintopython3.org', 'database': 'mysql'}
>>> a_dict['server']
'db.diveintopython3.org'
>>> a_dict['database']
'mysql'
# modify the value by an existing entry
>>> a_dict['database'] = 'blog'
# add a new entry
>>> a_dict['user'] = 'mark'
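Continuing that session: len() counts the key-value pairs, and looking up something that is not a key raises a KeyError (the printed order of a dictionary may vary):

>>> a_dict
{'server': 'db.diveintopython3.org', 'database': 'blog', 'user': 'mark'}
>>> len(a_dict)
3
>>> a_dict['mark']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'mark'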
- An empty dictionary is false. Any dictionary with at least one key-value pair is true.
2.8. None
None is a special constant in Python.
- It is a null value.
- None is not the same as False.
  - In a boolean context, None is evaluated as false and not None as true.
- None is not 0.
- None is not an empty string.
- Comparing None to anything other than None will always return False (see the example after this list).
- It has its own datatype (NoneType).
- You can assign None to any variable, but you can not create other NoneType objects.
- All variables whose value is None are equal to each other.
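For example:

>>> None == None
True
>>> None == False
False
>>> None == 0
False
>>> x = None
>>> x is None
True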
Chapter 3. Comprehensions
3.2. Working With Files And Directories
Python 3 comes with a module called os, which stands for "operating system." The os module contains a plethora of functions to get information on (and in some cases, to manipulate) local directories, files, processes, and environment variables. Python does its best to offer a unified API across all supported operating systems so your programs can run on any computer with as little platform-specific code as possible.
>>> import os
# like pwd (or a bare cd on Windows, which prints the current directory)
>>> print(os.getcwd())
C:\Python31
# like cd-ing into another directory
>>> os.chdir('/Users/pilgrim/diveintopython3/examples')
>>> print(os.getcwd())
C:\Users\pilgrim\diveintopython3\examples
# get the absolute path
>>> print(os.path.realpath('feed.xml'))
c:\Users\pilgrim\diveintopython3\examples\feed.xml
# joining path components
>>> print(os.path.join('/Users/pilgrim/diveintopython3/examples/', 'humansize.py'))
/Users/pilgrim/diveintopython3/examples/humansize.py
>>> print(os.path.join('/Users/pilgrim/diveintopython3/examples', 'humansize.py'))
/Users/pilgrim/diveintopython3/examples\humansize.py
>>> print(os.path.expanduser('~'))
c:\Users\pilgrim
>>> print(os.path.join(os.path.expanduser('~'), 'diveintopython3', 'examples', 'humansize.py'))
c:\Users\pilgrim\diveintopython3\examples\humansize.py
# splitting a path
>>> pathname = '/Users/pilgrim/diveintopython3/examples/humansize.py'
>>> os.path.split(pathname)
('/Users/pilgrim/diveintopython3/examples', 'humansize.py')
>>> (dirname, filename) = os.path.split(pathname)
>>> dirname
'/Users/pilgrim/diveintopython3/examples'
>>> filename
'humansize.py'
# splitting the filename from its extension
>>> (shortname, extension) = os.path.splitext(filename)
>>> shortname
'humansize'
>>> extension
'.py'
- The os.path.expanduser() function will expand a pathname that uses ~ to represent the current user's home directory.

The glob module is another tool in the Python standard library. It's an easy way to get the contents of a directory programmatically, and it uses the sort of wildcards that you may already be familiar with from working on the command line.
- glob: (programming) A limited pattern matching technique using wildcards, less powerful than a regular expression.
>>> import glob
>>> os.chdir('/Users/pilgrim/diveintopython3/')
>>> glob.glob('examples/*.xml')
['examples\\feed-broken.xml',
'examples\\feed-ns0.xml',
'examples\\feed.xml']
>>> os.chdir('examples/')
>>> glob.glob('*test*.py')
['alphameticstest.py',
'pluraltest1.py',
'pluraltest2.py',
'pluraltest3.py',
'pluraltest4.py',
'pluraltest5.py',
'pluraltest6.py',
'romantest1.py',
'romantest10.py',
'romantest2.py',
'romantest3.py',
'romantest4.py',
'romantest5.py',
'romantest6.py',
'romantest7.py',
'romantest8.py',
'romantest9.py']
Every modern file system stores metadata about each file: creation date, last-modified date, file size, and so on. Python provides a single API to access this metadata.
>>> import os
>>> print(os.getcwd())
c:\Users\pilgrim\diveintopython3\examples
>>> metadata = os.stat('feed.xml')
>>> metadata.st_mtime
1247520344.9537716
>>> import time
>>> time.localtime(metadata.st_mtime)
time.struct_time(tm_year=2009, tm_mon=7, tm_mday=13, tm_hour=17,
tm_min=25, tm_sec=44, tm_wday=0, tm_yday=194, tm_isdst=1)
- Calling the os.stat() function returns an object that contains several different types of metadata about the file.
3.3. List Comprehensions
A list comprehension provides a compact way of mapping a list into another list by applying a function to each of the elements of the list.
>>> a_list = [1, 9, 8, 4]
>>> [elem * 2 for elem in a_list]
[2, 18, 16, 8]
# a_list itself is unchanged
>>> a_list
[1, 9, 8, 4]
>>> a_list = [elem * 2 for elem in a_list]
>>> a_list
[2, 18, 16, 8]
- A list comprehension creates a new list; it does not change the original list.
>>> import os, glob
>>> glob.glob('*.xml')
['feed-broken.xml', 'feed-ns0.xml', 'feed.xml']
>>> [os.path.realpath(f) for f in glob.glob('*.xml')]
['c:\\Users\\pilgrim\\diveintopython3\\examples\\feed-broken.xml',
'c:\\Users\\pilgrim\\diveintopython3\\examples\\feed-ns0.xml',
'c:\\Users\\pilgrim\\diveintopython3\\examples\\feed.xml']
List comprehensions can also filter items, producing a result that can be smaller than the original list.
>>> import os, glob
>>> [f for f in glob.glob('*.py') if os.stat(f).st_size > 6000]
['pluraltest6.py',
'romantest10.py',
'romantest6.py',
'romantest7.py',
'romantest8.py',
'romantest9.py']
There’s no limit to how complex a list comprehension can be.
>>> import os, glob
>>> [(os.stat(f).st_size, os.path.realpath(f)) for f in glob.glob('*.xml')]
[(3074, 'c:\\Users\\pilgrim\\diveintopython3\\examples\\feed-broken.xml'),
(3386, 'c:\\Users\\pilgrim\\diveintopython3\\examples\\feed-ns0.xml'),
(3070, 'c:\\Users\\pilgrim\\diveintopython3\\examples\\feed.xml')]
>>> import humansize
>>> [(humansize.approximate_size(os.stat(f).st_size), f) for f in glob.glob('*.xml')]
[('3.0 KiB', 'feed-broken.xml'),
('3.3 KiB', 'feed-ns0.xml'),
('3.0 KiB', 'feed.xml')]
3.4. Dictionary Comprehensions
>>> import os, glob
# This is a list comprehension
>>> metadata = [(f, os.stat(f)) for f in glob.glob('*test*.py')]
>>> metadata[0]
('alphameticstest.py', nt.stat_result(st_mode=33206, st_ino=0, st_dev=0,
st_nlink=0, st_uid=0, st_gid=0, st_size=2509, st_atime=1247520344,
st_mtime=1247520344, st_ctime=1247520344))
# This is a dictionary comprehension
>>> metadata_dict = {f:os.stat(f) for f in glob.glob('*test*.py')}
>>> type(metadata_dict)
<class 'dict'>
>>> list(metadata_dict.keys())
['romantest8.py', 'pluraltest1.py', 'pluraltest2.py', 'pluraltest5.py',
'pluraltest6.py', 'romantest7.py', 'romantest10.py', 'romantest4.py',
'romantest9.py', 'pluraltest3.py', 'romantest1.py', 'romantest2.py',
'romantest3.py', 'romantest5.py', 'romantest6.py', 'alphameticstest.py',
'pluraltest4.py']
>>> metadata_dict['alphameticstest.py'].st_size
2509
- The syntax is similar to a list comprehension, with two differences.
  - First, it is enclosed in curly braces {} instead of square brackets [].
  - Second, instead of a single expression for each item, it contains two expressions separated by a colon :.
    - The expression before the colon (f in this example) is the dictionary key;
    - the expression after the colon (os.stat(f) in this example) is the value.
Here’s a trick with dictionary comprehensions that might be useful someday: swapping the keys and values of a dictionary.
>>> a_dict = {'a': 1, 'b': 2, 'c': 3}
>>> {value:key for key, value in a_dict.items()}
{1: 'a', 2: 'b', 3: 'c'}
Of course, this only works if the values of the dictionary are immutable, like strings or tuples. If you try this with a dictionary that contains lists, it will fail most spectacularly.
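For instance (a hypothetical session):

>>> a_dict = {'a': [1, 2], 'b': [3, 4]}
>>> {value:key for key, value in a_dict.items()}
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'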
3.5. Set Comprehensions
>>> a_set = set(range(10))
>>> a_set
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> {x ** 2 for x in a_set}
{0, 1, 4, 81, 64, 9, 16, 49, 25, 36}
>>> {x for x in a_set if x % 2 == 0}
{0, 8, 2, 4, 6}
>>> {2**x for x in range(10)}
{32, 1, 2, 4, 8, 64, 128, 256, 16, 512}
Chapter 4. Strings
4.1. Some Boring Stuff You Need To Understand Before You Can Dive In
Background material; well written.
4.2. Unicode
Background material; well written.
4.3. Diving In
In Python 3, all strings are sequences of Unicode characters. There is no such thing as a Python string encoded in UTF-8, or a Python string encoded as CP-1252. “Is this string UTF-8 ?” is an invalid question. UTF-8 is a way of encoding characters as a sequence of bytes. Bytes are not characters; bytes are bytes. Characters are an abstraction. A string is a sequence of those abstractions.
To create a string, enclose it in quotes. Python strings can be defined with either single quotes (') or double quotes (").
>>> s = '深入 Python'
>>> len(s)
9
>>> s[0]
'深'
>>> s + ' 3'
'深入 Python 3'
4.4. Formatting Strings
Let’s take another look at humansize.py:
......
    if size < multiple:
        return '{0:.1f} {1}'.format(size, suffix)
......
Python 3 supports formatting values into strings. Although this can include very complicated expressions, the most basic usage is to insert a value into a string with a single placeholder.
>>> username = 'mark'
>>> password = 'PapayaWhip'
>>> "{0}'s password is {1}".format(username, password)
"mark's password is PapayaWhip"
- First, that’s a method call on a string literal. Strings are objects, and objects have methods.
- Second, the whole expression evaluates to a string.
- Third, {0} and {1} are replacement fields, which are replaced by the arguments passed to the format() method.
4.4.1. Compound Field Names
The previous example shows the simplest case, where the replacement fields are simply integers. Integer replacement fields are treated as positional indices into the argument list of the format() method. That means that {0} is replaced by the first argument (username in this case), {1} is replaced by the second argument (password), &c. But replacement fields are much more powerful than that.
>>> import humansize
>>> si_suffixes = humansize.SUFFIXES[1000]
>>> si_suffixes
['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']
>>> '1000{0[0]} = 1{0[1]}'.format(si_suffixes)
'1000KB = 1MB'
- {0} would refer to the first argument passed to the format() method, si_suffixes.
- But si_suffixes is a list. So {0[0]} refers to the first item of the list which is the first argument passed to the format() method: 'KB'.
- Meanwhile, {0[1]} refers to the second item of the same list: 'MB'.
What this example shows is that format specifiers can access items and properties of data structures using (almost) Python syntax. This is called compound field names. The following compound field names “just work”:
- Passing a list, and accessing an item of the list by index (as in the previous example)
- Passing a dictionary, and accessing a value of the dictionary by key
- Passing a module, and accessing its variables and functions by name
- Passing a class instance, and accessing its properties and methods by name
- Any combination of the above
Just to blow your mind, here’s an example that combines all of the above:
>>> import humansize
>>> import sys
>>> '1MB = 1000{0.modules[humansize].SUFFIXES[1000][0]}'.format(sys)
'1MB = 1000KB'
- The sys module holds information about the currently running Python instance. Since you just imported it, you can pass the sys module itself as an argument to the format() method. So the replacement field {0} refers to the sys module.
- sys.modules is a dictionary of all the modules that have been imported in this Python instance. The keys are the module names as strings; the values are the module objects themselves. So the replacement field {0.modules} refers to the dictionary of imported modules.
- sys.modules['humansize'] is the humansize module which you just imported. The replacement field {0.modules[humansize]} refers to the humansize module. Note the slight difference in syntax here. In real Python code, the keys of the sys.modules dictionary are strings; to refer to them, you need to put quotes around the module name (e.g. 'humansize'). But within a replacement field, you skip the quotes around the dictionary key name (e.g. humansize).
  - To quote PEP 3101: Advanced String Formatting, "The rules for parsing an item key are very simple. If it starts with a digit, then it is treated as a number, otherwise it is used as a string."
- sys.modules['humansize'].SUFFIXES is the dictionary defined at the top of the humansize module. The replacement field {0.modules[humansize].SUFFIXES} refers to that dictionary.
- sys.modules['humansize'].SUFFIXES[1000] is a list of SI suffixes: ['KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB']. So the replacement field {0.modules[humansize].SUFFIXES[1000]} refers to that list.
- sys.modules['humansize'].SUFFIXES[1000][0] is the first item of the list of SI suffixes: 'KB'. Therefore, the complete replacement field {0.modules[humansize].SUFFIXES[1000][0]} is replaced by the two-character string 'KB'.
4.4.2. Format Specifiers
But what is {0:.1f}? It's two things: {0}, which you recognize, and :.1f, which defines the format specifier.

Within a replacement field, a colon (:) marks the start of the format specifier. The format specifier .1 means "round to the nearest tenth" (i.e. display only one digit after the decimal point). The format specifier f means "fixed-point number" (as opposed to exponential notation or some other decimal representation).
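A few quick examples (not from the book):

>>> '{0:.1f}'.format(698.24)
'698.2'
>>> '{0:.3f}'.format(698.24)
'698.240'
>>> '{0:>10.1f}'.format(698.24)
'     698.2'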
4.5. Other Common String Methods
Let’s say you have a list of key-value pairs and you want to split them up and make a dictionary:
>>> query = 'user=pilgrim&database=master&password=PapayaWhip'
>>> a_list = query.split('&')
>>> a_list
['user=pilgrim', 'database=master', 'password=PapayaWhip']
>>> a_list_of_lists = [v.split('=', 1) for v in a_list if '=' in v]
>>> a_list_of_lists
[['user', 'pilgrim'], ['database', 'master'], ['password', 'PapayaWhip']]
>>> a_dict = dict(a_list_of_lists)
>>> a_dict
{'password': 'PapayaWhip', 'user': 'pilgrim', 'database': 'master'}
4.6. Strings vs. Bytes
Bytes are bytes; characters are an abstraction. An immutable sequence of Unicode characters is called a string. An immutable sequence of numbers-between-0-and-255 is called a bytes object.
>>> by = b'abcd\x65'
>>> by
b'abcde'
>>> type(by)
<class 'bytes'>
>>> by += b'\xff'
>>> by # \xff does not map to a printable ASCII character, so it is still displayed as \xff
b'abcde\xff'
>>> by[5]
255
A bytes object is immutable; you can not assign individual bytes. If you need to change individual bytes, you can convert the bytes object into a bytearray object. The assigned value must be an integer between 0 and 255.
>>> by = b'abcd\x65'
>>> barr = bytearray(by)
>>> barr
bytearray(b'abcde')
>>> barr[0] = 102
>>> barr
bytearray(b'fbcde')
The one thing you can never do is mix bytes and strings.
>>> by = b'd'
>>> s = 'abcde'
>>> by + s # ERROR
>>> s.count(by) # ERROR
>>> s.count(by.decode('ascii')) # OK
1
And here is the link between strings and bytes: bytes objects have a decode() method that takes a character encoding and returns a string, and strings have an encode() method that takes a character encoding and returns a bytes object.
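For example, round-tripping the string from section 4.3 (assuming a UTF-8-capable console):

>>> a_string = '深入 Python'
>>> by = a_string.encode('utf-8')
>>> by
b'\xe6\xb7\xb1\xe5\x85\xa5 Python'
>>> len(by)
13
>>> by.decode('utf-8') == a_string
True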
4.7. Postscript: Character Encoding Of Python Source Code
In Python 2, the default encoding for .py files was ASCII. In Python 3, the default encoding is UTF-8.
If you would like to use a different encoding within your Python code, you can put an encoding declaration on the first line of each file. This declaration below defines a .py file to be windows-1252:
# -*- coding: windows-1252 -*-
Technically, the character encoding override can also be on the second line, if the first line is a UNIX-like hash-bang command.
#!/usr/bin/python3
# -*- coding: windows-1252 -*-
Chapter 5. Regular Expressions (略)
Chapter 6. Closures & Generators
6.1. Diving In
First, let's talk about how to make plural nouns. If you grew up in an English-speaking country or learned English in a formal school setting, you're probably familiar with the basic rules:
- If a word ends in S, X, or Z, add ES. Bass becomes basses, fax becomes faxes, and waltz becomes waltzes.
- If a word ends in a noisy H, add ES; if it ends in a silent H, just add S.
- What’s a noisy H? One that gets combined with other letters to make a sound that you can hear. So coach becomes coaches and rash becomes rashes, because you can hear the CH and SH sounds when you say them.
- But cheetah becomes cheetahs, because the H is silent.
- If a word ends in Y that sounds like I, change the Y to IES; if the Y is combined with a vowel to sound like something else, just add S.
  - So vacancy becomes vacancies, but day becomes days.
- If all else fails, just add S and hope for the best.
Let’s design a Python library that automatically pluralizes English nouns. We’ll start with just these four rules.
6.2. I Know, Let’s Use Regular Expressions!
import re

def plural(noun):
    if re.search('[sxz]$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeiou]y$', noun):
        return re.sub('y$', 'ies', noun)
    else:
        return noun + 's'
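Used from the interactive shell:

>>> plural('fax')
'faxes'
>>> plural('coach')
'coaches'
>>> plural('vacancy')
'vacancies'
>>> plural('day')
'days'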
6.3. A List Of Functions
import re

def match_sxz(noun):
    return re.search('[sxz]$', noun)

def apply_sxz(noun):
    return re.sub('$', 'es', noun)

def match_h(noun):
    return re.search('[^aeioudgkprt]h$', noun)

def apply_h(noun):
    return re.sub('$', 'es', noun)

def match_y(noun):
    return re.search('[^aeiou]y$', noun)

def apply_y(noun):
    return re.sub('y$', 'ies', noun)

def match_default(noun):
    return True

def apply_default(noun):
    return noun + 's'

rules = ((match_sxz, apply_sxz),
         (match_h, apply_h),
         (match_y, apply_y),
         (match_default, apply_default)
         )

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)
The reason this technique works is that everything in Python is an object, including functions. The rules data structure contains functions (not names of functions, but actual function objects). When they get assigned in the for loop, then matches_rule and apply_rule are actual functions that you can call.
6.4. A List Of Patterns
Defining separate named functions for each match and apply rule isn't really necessary. You never call them directly; you add them to the rules sequence and call them through there. Furthermore, each function follows one of two patterns. All the match functions call re.search(), and all the apply functions call re.sub(). Let's factor out the patterns so that defining new rules can be easier.
import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)
- This technique of using the values of outside parameters within a dynamic function is called closures. You're essentially defining constants within the apply_rule function you're building: it takes one parameter (word), but it then acts on that plus two other values (search and replace) which were set when you defined the apply function.
- Finally, the build_match_and_apply_functions() function returns a tuple of two values: the two functions you just created. The constants you defined within those functions (pattern within the matches_rule() function, and search and replace within the apply_rule() function) stay with those functions, even after you return from build_match_and_apply_functions().
If this is incredibly confusing (and it should be, this is weird stuff), it may become clearer when you see how to use it.
patterns = \
    (
        ('[sxz]$',           '$',  'es'),
        ('[^aeioudgkprt]h$', '$',  'es'),
        ('(qu|[^aeiou])y$',  'y$', 'ies'),
        ('$',                '$',  's')
    )
rules = [build_match_and_apply_functions(pattern, search, replace)
         for (pattern, search, replace) in patterns]
- There's a slight change here, in the fallback rule. In the previous example, the match_default() function simply returned True, meaning that if none of the more specific rules matched, the code would simply add an s to the end of the given word. This example does something functionally equivalent. The final regular expression asks whether the word has an end ($ matches the end of a string). Of course, every string has an end, even an empty string, so this expression always matches. Thus, it serves the same purpose as the match_default() function that always returned True.

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)
6.5. A File Of Patterns
First, let's create a text file that contains the rules you want. No fancy data structures, just whitespace-delimited strings in three columns. Let's call it plural4-rules.txt.
[sxz]$ $ es
[^aeioudgkprt]h$ $ es
[^aeiou]y$ y$ ies
$ $ s
Now let’s see how you can use this rules file.
import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

rules = []  # initialized as empty
with open('plural4-rules.txt', encoding='utf-8') as pattern_file:
    for line in pattern_file:
        pattern, search, replace = line.split(None, 3)
        rules.append(build_match_and_apply_functions(pattern, search, replace))
- The with statement creates what's called a context: when the with block ends, Python will automatically close the file, even if an exception is raised inside the with block.
- The for line in <fileobject> idiom reads data from the open file, one line at a time.
- The first argument to the split() method is None, which means "split on any whitespace (tabs or spaces, it makes no difference)." The second argument is 3, which means "split on whitespace 3 times, then leave the rest of the line alone." (See the example at the end of this section.)
The improvement here is that you’ve completely separated the pluralization rules into an external file, so it can be maintained separately from the code that uses it. Code is code, data is data, and life is good.
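A quick illustration of split() with None as the separator versus an explicit space (values are hypothetical):

>>> 'a b  c   d'.split()
['a', 'b', 'c', 'd']
>>> 'a b  c   d'.split(None, 2)
['a', 'b', 'c   d']
>>> 'a b  c   d'.split(' ', 2)
['a', 'b', ' c   d']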
6.6. Generators
Let’s look at an interactive example first.
>>> def make_counter(x):
...     print('entering make_counter')
...     while True:
...         yield x
...         print('incrementing x')
...         x = x + 1
...
>>> counter = make_counter(2)
>>> counter
<generator object at 0x001C9C10>
>>> next(counter)
entering make_counter
2
>>> next(counter)
incrementing x
3
>>> next(counter)
incrementing x
4
- The presence of the yield keyword in make_counter means that this is not a normal function. It is a special kind of function which generates values one at a time. Calling it will return a generator object that can be used to generate successive values of x. (Note that this does not actually execute the function code.)
- The next() function takes a generator object and returns its next value.
  - The first time you call next() with the counter generator, it executes the code in make_counter() up to the first yield statement, then returns the value that was yielded.
  - Repeatedly calling next() with the same generator object resumes exactly where it left off and continues the loop until it hits the next yield statement. All variables, local state, &c. are saved on yield and restored on next().
  - In short: yield pauses a function; next() resumes where it left off.

Since make_counter sets up an infinite loop, you could theoretically do this forever, and it would just keep incrementing x and spitting out values. But let's look at more productive uses of generators instead.
6.6.1. A Fibonacci Generator

def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b
>>> from fibonacci import fib
>>> for n in fib(1000):
...     print(n, end=' ')
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
>>> list(fib(1000))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
6.6.2. A Plural Rule Generator
def rules(rules_filename):
    with open(rules_filename, encoding='utf-8') as pattern_file:
        for line in pattern_file:
            pattern, search, replace = line.split(None, 3)
            yield build_match_and_apply_functions(pattern, search, replace)

def plural(noun, rules_filename='plural5-rules.txt'):
    for matches_rule, apply_rule in rules(rules_filename):
        if matches_rule(noun):
            return apply_rule(noun)
    raise ValueError('no matching rule for {0}'.format(noun))
What have you gained over stage 4? Startup time. In stage 4, when you imported the plural4 module, it read the entire patterns file and built a list of all the possible rules, before you could even think about calling the plural() function. With generators, you can do everything lazily: you read the first rule and create functions and try them, and if that works you don't ever read the rest of the file or create any other functions.

What have you lost? Performance! Every time you call the plural() function, the rules() generator starts over from the beginning, which means re-opening the patterns file and reading from the beginning, one line at a time.

What if you could have the best of both worlds: minimal startup cost (don't execute any code on import, i.e. don't parse the rules as soon as the module is imported) and maximum performance (don't build the same functions over and over again)? Oh, and you still want to keep the rules in a separate file (because code is code and data is data), just as long as you never have to read the same line twice. To do that, you'll need to build your own iterator.
Chapter 7. Classes & Iterators
7.1. Diving In
Comprehensions are just a simple form of iterators. Generators are just a simple form of iterators. A function that yields values is a nice, compact way of building an iterator without building an iterator. Let me show you what I mean by that.
Remember the Fibonacci generator? Here it is as a built-from-scratch iterator:
class Fib:
    '''iterator that yields numbers in the Fibonacci sequence'''

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib
Let's take that one line at a time, starting with the first line, class Fib:. So what's a class?
7.2. Defining Classes
Python is fully object-oriented: you can define your own classes, inherit from your own or built-in classes, and instantiate the classes you’ve defined.
class PapayaWhip:
    pass

This PapayaWhip class doesn't define any methods or attributes, but syntactically (of or relating to syntax), there needs to be something in the definition, thus the pass statement. This is a Python reserved word that just means "move along, nothing to see here". It's a statement that does nothing, and it's a good placeholder when you're stubbing out functions or classes.
- The pass statement in Python is like an empty set of curly braces ({}) in Java or C.

The __init__() method is called immediately after an instance of the class is created. It would be tempting, but technically incorrect, to call this the "constructor" of the class. It's tempting, because it looks like a C++ constructor (by convention, the __init__() method is the first method defined for the class), acts like one (it's the first piece of code executed in a newly created instance of the class), and even sounds like one. Incorrect, because the object has already been constructed by the time the __init__() method is called, and you already have a valid reference to the new instance of the class.

The first argument of every class method, including the __init__() method, is always a reference to the current instance of the class. By convention, this argument is named self. This argument fills the role of the reserved word this in C++ or Java, but self is NOT a reserved word in Python, merely a naming convention. Nonetheless, please don't call it anything but self; this is a very strong convention.

In the __init__() method, self refers to the newly created object; in other class methods, it refers to the instance whose method was called. Although you need to specify self explicitly when defining the method, you do not specify it when calling the method; Python will add it for you automatically.
7.3. Instantiating Classes
Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the __init__() method requires. The return value will be the newly created object.

Every class instance has a built-in attribute, __class__, which is the object's class. Java programmers may be familiar with the Class class, which contains methods like getName() and getSuperclass() to get metadata information about an object. In Python, this kind of metadata is available through attributes, but the idea is the same.

You can access the instance's docstring just as with a function or a module, by the attribute __doc__. All instances of a class share the same docstring.
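For example, with the Fib class from fibonacci2.py (the memory address will differ on your machine):

>>> from fibonacci2 import Fib
>>> fib = Fib(100)
>>> fib
<fibonacci2.Fib object at 0x00DB8810>
>>> fib.__class__
<class 'fibonacci2.Fib'>
>>> fib.__doc__
'iterator that yields numbers in the Fibonacci sequence'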
7.4. Instance Variables
I.e. self.max, analogous to this.foo in Java.
7.5. A Fibonacci Iterator
All three of these class methods, __init__(), __iter__(), and __next__(), begin and end with a pair of underscore (_) characters. Why is that? There's nothing magical about it, but it usually indicates that these are "special methods." The only thing "special" about special methods is that they aren't called directly; Python calls them when you use some other syntax on the class or an instance of the class.
"Calling" Fib(max) is really creating an instance of this class and calling its __init__() method with max. The __init__() method saves the maximum value as an instance variable so other methods can refer to it later.

The __iter__() method is called whenever someone calls iter(fib). (As you'll see in a minute, a for loop will call this automatically, but you can also call it yourself manually.) After performing beginning-of-iteration initialization (in this case, resetting self.a and self.b, our two counters), the __iter__() method can return any object that implements a __next__() method. In this case (and in most cases), __iter__() simply returns self, which signals that this class has a __next__() method.

The __next__() method is called whenever someone calls next() on an iterator of an instance of a class. When the __next__() method raises a StopIteration exception, this signals to the caller that the iteration is exhausted. Unlike most exceptions, this is not an error; it's a normal condition that just means that the iterator has no more values to generate. If the caller is a for loop, it will notice this StopIteration exception and gracefully exit the loop. (In other words, it will swallow the exception.) This little bit of magic is actually the key to using iterators in for loops.

To spit out the next value, an iterator's __next__() method simply returns the value. Do not use yield here; that's a bit of syntactic sugar that only applies when you're using generators. Here you're creating your own iterator from scratch; use return instead.
Let’s see how to call this iterator:
>>> from fibonacci2 import Fib
>>> for n in Fib(1000):
...     print(n, end=' ')
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
Here’s what happens:
- The for loop calls Fib(1000), as shown. This returns an instance of the Fib class. Call this fib_inst.
- Secretly, and quite cleverly, the for loop calls iter(fib_inst), which returns an iterator object. Call this fib_iter. In this case, fib_iter == fib_inst, because the __iter__() method returns self, but the for loop doesn't know (or care) about that.
- To "loop through" the iterator, the for loop calls next(fib_iter), which calls the __next__() method on the fib_iter object, which does the next-Fibonacci-number calculations and returns a value. The for loop takes this value and assigns it to n, then executes the body of the for loop for that value of n.
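You can do the same steps by hand in the interactive shell:

>>> fib_inst = Fib(20)
>>> fib_iter = iter(fib_inst)
>>> fib_iter is fib_inst
True
>>> next(fib_iter)
0
>>> next(fib_iter)
1
>>> next(fib_iter)
1
>>> next(fib_iter)
2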
7.6. A Plural Rule Iterator
class LazyRules:
    rules_filename = 'plural6-rules.txt'

    def __init__(self):
        self.pattern_file = open(self.rules_filename, encoding='utf-8')
        self.cache = []

    def __iter__(self):
        self.cache_index = 0
        return self

    def __next__(self):
        self.cache_index += 1
        if len(self.cache) >= self.cache_index:
            return self.cache[self.cache_index - 1]

        if self.pattern_file.closed:
            raise StopIteration

        line = self.pattern_file.readline()
        if not line:
            self.pattern_file.close()
            raise StopIteration

        pattern, search, replace = line.split(None, 3)
        funcs = build_match_and_apply_functions(pattern, search, replace)
        self.cache.append(funcs)
        return funcs

rules = LazyRules()

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)
    raise ValueError('no matching rule for {0}'.format(noun))
So this is a class that implements __iter__() and __next__(), so it can be used as an iterator. Then, you instantiate the class and assign it to rules. This happens just once, on import.
Putting it all together, here's what happens:
- When the module is imported, it creates a single instance of the LazyRules class, called rules, which opens the pattern file but does not read from it.
- When asked for the first match and apply function, it checks its cache but finds the cache is empty. So it reads a single line from the pattern file, builds the match and apply functions from those patterns, and caches them.
- Let's say, for the sake of argument, that the very first rule matched. If so, no further match and apply functions are built, and no further lines are read from the pattern file.
- Furthermore, for the sake of argument, suppose that the caller calls the plural() function again to pluralize a different word. The for loop in the plural() function will call iter(rules), which will reset the cache index but will not reset the open file object.
- The first time through, the for loop will ask for a value from rules, which will invoke its __next__() method. This time, however, the cache is primed with a single pair of match and apply functions, corresponding to the patterns in the first line of the pattern file. Since they were built and cached in the course of pluralizing the previous word, they're retrieved from the cache. The cache index increments, and the open file is never touched.
- Let's say, for the sake of argument, that the first rule does not match this time around. So the for loop comes around again and asks for another value from rules. This invokes the __next__() method a second time. This time, the cache is exhausted (it only contained one item, and we're asking for a second), so the __next__() method continues. It reads another line from the open file, builds match and apply functions out of the patterns, and caches them.
- This read-build-and-cache process will continue as long as the rules being read from the pattern file don't match the word we're trying to pluralize. If we do find a matching rule before the end of the file, we simply use it and stop, with the file still open. The file pointer will stay wherever we stopped reading, waiting for the next readline() command. In the meantime, the cache now has more items in it, and if we start all over again trying to pluralize a new word, each of those items in the cache will be tried before reading the next line from the pattern file.
The book also discusses the advantages of this approach (minimal startup cost, maximum performance, separation of code and data) and its drawbacks (mainly that the opened file is left hanging around); worth a read.
Chapter 8. Advanced Iterators
8.3. Finding The Unique Items In A Sequence
>>> a_list = ['The', 'sixth', 'sick', "sheik's", 'sixth', "sheep's", 'sick']
>>> set(a_list)
{'sixth', 'The', "sheep's", 'sick', "sheik's"}
>>> a_string = 'EAST IS EAST'
>>> set(a_string)
{'A', ' ', 'E', 'I', 'S', 'T'}
>>> words = ['SEND', 'MORE', 'MONEY']
>>> ''.join(words)
'SENDMOREMONEY'
>>> set(''.join(words))
{'E', 'D', 'M', 'O', 'N', 'S', 'R', 'Y'}
8.4. Making Assertions
>>> assert 1 + 1 == 2
>>> assert 1 + 1 == 3
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
>>> assert 2 + 2 == 5, "Only for very large values of 2"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError: Only for very large values of 2
Therefore, this line of code:
assert len(unique_characters) <= 10, 'Too many letters'
is equivalent to this:
if len(unique_characters) > 10:
    raise AssertionError('Too many letters')
8.5. Generator Expressions
# Generator expression
(x*2 for x in range(256))
# List comprehension
[x*2 for x in range(256)]
# Dictionary comprehension
{x:x*2 for x in range(256)}
# Set comprehension
{x*2 for x in range(256)}
# There is no tuple comprehension,
# but you can build a tuple from a generator expression
tuple(x*2 for x in range(256))
(x*2 for x in range(256)) is roughly equivalent to:

def times2(a_sequence):
    for x in a_sequence:
        yield x*2

times2(range(256))
8.6. Calculating Permutations… The Lazy way!
# permutations: choosing 2 items out of [1, 2, 3], where order matters
>>> import itertools
>>> perms = itertools.permutations([1, 2, 3], 2)
>>> next(perms)
(1, 2)
>>> next(perms)
(1, 3)
>>> next(perms)
(2, 1)
>>> next(perms)
(2, 3)
>>> next(perms)
(3, 1)
>>> next(perms)
(3, 2)
>>> next(perms)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
The permutations() function doesn't have to take a list. It can take any sequence, even a string.
>>> import itertools
>>> perms = itertools.permutations('ABC', 3)
>>> next(perms)
('A', 'B', 'C')
8.7. Other Fun Stuff In The itertools Module
>>> import itertools
>>> list(itertools.product('ABC', '123'))
[('A', '1'), ('A', '2'), ('A', '3'),
('B', '1'), ('B', '2'), ('B', '3'),
('C', '1'), ('C', '2'), ('C', '3')]
>>> list(itertools.combinations('ABC', 2))
[('A', 'B'), ('A', 'C'), ('B', 'C')]
>>> names = ['Alex', 'Anne', 'Dora', 'John', 'Mike', 'Chris', 'Ethan', 'Sarah', 'Lizzie', 'Wesley']
>>> groups = itertools.groupby(names, len)
>>> list(groups)
[(4, <itertools._grouper object at 0x00BA8BF0>),
(5, <itertools._grouper object at 0x00BB4050>),
(6, <itertools._grouper object at 0x00BB4030>)]
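To see the actual names in each group, iterate over each grouper object while the groupby iterator is still fresh; note that groupby() only groups consecutive items, so the input should already be sorted by the grouping key (as names is here, by length):

>>> groups = itertools.groupby(names, len)
>>> [(length, list(group)) for length, group in groups]
[(4, ['Alex', 'Anne', 'Dora', 'John', 'Mike']),
 (5, ['Chris', 'Ethan', 'Sarah']),
 (6, ['Lizzie', 'Wesley'])]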
>>> list(itertools.chain(range(0, 3), range(10, 13)))
[0, 1, 2, 10, 11, 12]
>>> list(zip(range(0, 3), range(10, 13)))
[(0, 10), (1, 11), (2, 12)]
>>> list(zip(range(0, 3), range(10, 14)))
[(0, 10), (1, 11), (2, 12)]
>>> list(itertools.zip_longest(range(0, 3), range(10, 14)))
[(0, 10), (1, 11), (2, 12), (None, 13)]
8.8. A New Kind Of String Manipulation
>>> translation_table = {ord('A'): ord('O')}
>>> translation_table
{65: 79}
>>> 'MARK'.translate(translation_table)
'MORK'
See the book for more.
8.9. Evaluating Arbitrary Strings As Python Expressions
>>> eval('1 + 1 == 2')
True
>>> eval('1 + 1 == 3')
False
>>> eval('9567 + 1085 == 10652')
True
The eval() function isn't limited to boolean expressions. It can handle any Python expression and returns any datatype.
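For example:

>>> eval('1 + 1')
2
>>> eval('"A" + "B"')
'AB'
>>> eval('[x for x in range(3)]')
[0, 1, 2]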
See the book for more.
8.10. Putting It All Together (略)
Chapter 9. Unit Testing
Python has a framework for unit testing, the appropriately-named unittest module.
Tests run in isolation, separate from any other test cases (even if they test the same functions). Each test case is an island.
9.2. A Single Question
import roman1
import unittest
class KnownValues(unittest.TestCase):
    def test_to_roman_known_values(self):
        ......
        self.assertEqual(numeral, result)

if __name__ == '__main__':
    unittest.main()

A test method takes no parameters, returns no value, and must have a name beginning with the four letters test.
- If a test method exits normally without raising an exception, the test is considered passed;
- if the method raises an exception, the test is considered failed.
TDD: Write a test that fails, then code until it passes. (Still remember pass?)

Running the script runs unittest.main(), which runs each test case. There is no required organization of these test classes; they can each contain a single test method, or you can have one class that contains multiple test methods. The only requirement is that each test class must inherit from unittest.TestCase.

For each test case, the unittest module will print out the docstring of the method and whether that test passed or failed.
9.3. “Halt And Catch Fire”
class OutOfRangeError(ValueError):
    pass

class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        '''to_roman should fail with large input'''
        self.assertRaises(roman2.OutOfRangeError, roman2.to_roman, 4000)
A unit test actually has three return values: pass, fail, and error.
- Pass, of course, means that the test passed; the code did what you expected.
- Fail means it executed the code but the result was not what you expected.
- Error means that the code didn't even execute properly.
9.4. More Halting, More Fire
if not (0 < n < 4000):
    raise OutOfRangeError('number out of range (must be 1..3999)')

This is a nice Pythonic shortcut: multiple comparisons at once. This is equivalent to if not ((0 < n) and (n < 4000)), but it's much easier to read.
9.5. And One More Thing…
if not isinstance(n, int):
    raise NotIntegerError('non-integers can not be converted')

The built-in isinstance() function tests whether a variable is a particular type (or, technically, any descendant type).
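A few quick checks in the shell (note that bool is a subclass of int, which is why booleans can be treated as numbers, as mentioned in section 2.2):

>>> isinstance(5, int)
True
>>> isinstance(5.0, int)
False
>>> isinstance(True, int)
True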
Chapter 10. Refactoring
10.1. Diving In
Like it or not, bugs happen. Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by “bug”? A bug is a test case you haven’t written yet.
if not s:
    raise InvalidRomanNumeralError('Input can not be blank')
if not re.search(romanNumeralPattern, s):
    raise InvalidRomanNumeralError('Invalid Roman numeral: {}'.format(s))
- Use if not s to test for a blank string.
- Starting in Python 3.1, you can skip the numbers when using positional indexes in a format specifier. That is, instead of using the format specifier {0} to refer to the first parameter to the format() method, you can simply use {} and Python will fill in the proper positional index for you. This works for any number of arguments; the first {} is {0}, the second {} is {1}, and so forth.
Coding this way (i.e. TDD, even when fixing bugs) does not make fixing bugs any easier. Simple bugs require simple test cases; complex bugs will require complex test cases. In a testing-centric environment, it may seem like it takes longer to fix a bug, since you need to articulate in code exactly what the bug is (to write the test case), then fix the bug itself. Then if the test case doesn’t pass right away, you need to figure out whether the fix was wrong, or whether the test case itself has a bug in it. However, in the long run, this back-and-forth between test code and code tested pays for itself, because it makes it more likely that bugs are fixed correctly the first time. Also, since you can easily re-run all the test cases along with your new one, you are much less likely to break old code when fixing new code. Today’s unit test is tomorrow’s regression test.
10.2. Handling Changing Requirements
Comprehensive unit testing means never having to rely on a programmer who says “Trust me.”
10.3. Refactoring
# XXX.py  (sketch)
def build_lookup_tables():
    ...

# ... the module's other function definitions ...

build_lookup_tables()    # module-level call
build_lookup_tables() gets called when the module is imported. It is important to understand that modules are ONLY imported once, then cached. If you import an already-imported module, it does nothing. So this code will only get called the first time you import this module.
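A quick interactive check of the caching behavior, assuming the humansize module from Chapter 1 is on sys.path:
>>> import humansize        # module-level code runs on the first import
>>> import humansize        # already cached; nothing is re-executed
>>> import sys
>>> 'humansize' in sys.modules
True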
Chapter 11. Files
11.2. Reading From Text Files
a_file = open('examples/chinese.txt', encoding='utf-8')
- The directory path uses a forward slash. It just works, no matter what the OS is.
- The above path is a relative one.
11.2.1. Character Encoding Rears Its Ugly Head
Bytes are bytes; characters are an abstraction. A string is a sequence of Unicode characters. But a file on disk is not a sequence of Unicode characters; a file on disk is a sequence of bytes. So if you read a “text file” from disk, how does Python convert that sequence of bytes into a sequence of characters? It decodes the bytes according to a specific character encoding algorithm and returns a sequence of Unicode characters (otherwise known as a string).
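The same decoding step can be done by hand with the decode() method of a bytes object; a small sketch:
>>> raw = b'Dive Into Python'          # a sequence of bytes
>>> raw.decode('utf-8')                # decode them with a specific character encoding
'Dive Into Python'
>>> '中文'.encode('utf-8')             # and the reverse: characters back to bytes
b'\xe4\xb8\xad\xe6\x96\x87'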
11.2.2. Stream Objects
The open() function returns a stream object.
>>> a_file = open('examples/chinese.txt', encoding='utf-8')
>>> a_file.name
'examples/chinese.txt'
>>> a_file.encoding
'utf-8'
>>> a_file.mode
'r'
11.2.3. Reading Data From A Text File
>>> a_file = open('examples/chinese.txt', encoding='utf-8')
>>> a_file.read()
'Dive Into Python 是为有经验的程序员编写的一本 Python 书。\n'
>>> a_file.read()
''
Python does not consider reading past end-of-file to be an error; it simply returns an empty string.
11.2.4. Closing Files
>>> a_file.close()
>>> a_file.closed
True
11.2.5. Closing Files Automatically
with open('examples/chinese.txt', encoding='utf-8') as a_file:
    a_file.seek(17)
    a_character = a_file.read(1)
    print(a_character)
This code calls open(), but it never calls a_file.close(). The with statement starts a code block, like an if statement or a for loop. Inside this code block, you can use the variable a_file as the stream object returned from the call to open(). All the regular stream object methods are available — seek(), read(), whatever you need. When the with block ends, even if you “exit” it via an unhandled exception, Python calls a_file.close() automatically.
In technical terms, the with statement creates a runtime context. In these examples, the stream object acts as a context manager. Python creates the stream object a_file and tells it that it is entering a runtime context. When the with code block is completed, Python tells the stream object that it is exiting the runtime context, and the stream object calls its own close() method.

There’s nothing file-specific about the with statement; it’s just a generic framework for creating runtime contexts and telling objects that they’re entering and exiting a runtime context. The behavior is defined in the stream object, not in the with statement.
11.2.6. Reading Data One Line At A Time
line_number = 0
with open('examples/favorite-people.txt', encoding='utf-8') as a_file:
    for a_line in a_file:
        line_number += 1
        print('{:>4} {}'.format(line_number, a_line.rstrip()))
To read a file one line at a time, use a for loop. That’s it. Besides having explicit methods like read(), the stream object is also an iterator which spits out a single line every time you ask for a value.
- The format specifier {:>4} means “print this argument right-justified within 4 spaces.”
- The rstrip() string method removes the trailing whitespace, including the carriage return characters.
11.3. Writing To Text Files
There are two file modes for writing:
- “Write” mode will overwrite the file. Pass mode='w' to the open() function.
- “Append” mode will add data to the end of the file. Pass mode='a' to the open() function. (A short sketch of both modes follows this list.)
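A small sketch of both modes; the file name test.log is just an example:
# 'w' truncates and overwrites, 'a' appends to the end
with open('test.log', mode='w', encoding='utf-8') as a_file:
    a_file.write('test succeeded')
with open('test.log', mode='a', encoding='utf-8') as a_file:
    a_file.write('and again')
with open('test.log', encoding='utf-8') as a_file:
    print(a_file.read())          # prints: test succeededand again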
11.4. Binary Files
Not all files contain text. Some of them contain pictures of my dog.
>>> an_image = open('examples/beauregard.jpg', mode='rb')
>>> an_image.mode
'rb'
>>> an_image.name
'examples/beauregard.jpg'
>>> an_image.encoding
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: '_io.BufferedReader' object has no attribute 'encoding'
- Opening a file in binary mode is simple but subtle. The only difference from opening it in text mode is that the mode parameter contains a ‘b’ character.
- A binary stream object has no encoding attribute.
>>> data = an_image.read(3)
>>> data
b'\xff\xd8\xff'
- You’re reading bytes, not strings. Since you opened the file in binary mode, the read() method takes the number of bytes to read, not the number of characters.
11.5. Stream Objects From Non-File Sources
In the simplest case, a stream object is anything with a read() method which takes an optional size parameter and returns a string.
- When called with no size parameter, the read() method should read everything there is to read from the input source and return all the data as a single value.
- When called with a size parameter, it reads that much from the input source and returns that much data. When called again, it picks up where it left off and returns the next chunk of data.
You’re not limiting yourself to real files. The input source that’s being “read” could be anything: a web page, a string in memory, even the output of another program.
>>> a_string = 'PapayaWhip is the new black.'
>>> import io
>>> a_file = io.StringIO(a_string)
>>> a_file.read()
'PapayaWhip is the new black.'
- io.StringIO lets you treat a string as a text file.
- There’s also an io.BytesIO class, which lets you treat a byte array as a binary file (a matching sketch follows this list).
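A matching sketch for io.BytesIO:
>>> import io
>>> a_bytes = io.BytesIO(b'\xff\xd8\xff')
>>> a_bytes.read(2)
b'\xff\xd8'
>>> a_bytes.read()
b'\xff'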
11.5.1. Handling Compressed Files (skipped)
11.6. Standard Input, Output, And Error
By default, both of these pipes, stdout and stderr, are just connected to the terminal window where you are working.
In the graphical Python Shell, the stdout and stderr pipes default to your “Interactive Window”.
>>> import sys
>>> for i in range(3):
...     sys.stdout.write('is the')
is theis theis the
stdout is defined in the sys module, and it is a stream object. Calling its write() function will print out whatever string you give it.
- In fact, this is what the print function really does; it adds a carriage return to the end of the string you’re printing, and calls sys.stdout.write (a rough sketch of this equivalence follows).
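A rough sketch of that equivalence (print() also handles multiple arguments, sep, end, and so on, which this ignores):
import sys

sys.stdout.write('PapayaWhip' + '\n')    # roughly what print('PapayaWhip') does
print('PapayaWhip')                      # same visible output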
sys.stdout and sys.stderr are stream objects, but they are write-only. Attempting to call their read() method will always raise an IOError.
11.6.1. Redirecting Standard Output
sys.stdout and sys.stderr are not constants; they’re variables. That means you can assign them a new value — any other stream object — to redirect their output.
Any class can be a context manager by defining two special methods: __enter__() and __exit__(). (The book illustrates this with a stdout-redirect class used in a with statement; a hedged sketch follows.)
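A sketch along the lines of the book's example; the class name and file name here are illustrative:
import sys

class RedirectStdoutTo:
    '''Context manager that swaps sys.stdout for another stream object.'''
    def __init__(self, out_new):
        self.out_new = out_new

    def __enter__(self):
        self.out_old = sys.stdout      # remember the current stdout
        sys.stdout = self.out_new      # redirect

    def __exit__(self, *args):
        sys.stdout = self.out_old      # restore on the way out, even after an exception

with open('out.log', mode='w', encoding='utf-8') as a_file, RedirectStdoutTo(a_file):
    print('this line ends up in out.log, not on the screen')
print('back to normal')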
Chapter 12. XML (skipped)
Chapter 13. Serializing Python Objects
13.1. Diving In
On the surface, the concept of serialization is simple. You have a data structure in memory that you want to save, reuse, or send to someone else. How would you do that? Well, that depends on how you want to save it, how you want to reuse it, and to whom you want to send it. Many games allow you to save your progress when you quit the game and pick up where you left off when you relaunch the game. In this case, a data structure that captures “your progress so far” needs to be stored on disk when you quit, then loaded from disk when you relaunch. The data is only meant to be used by the same program that created it, never sent over a network, and never read by anything other than the program that created it. Therefore, the interoperability issues are limited to ensuring that later versions of the program can read data written by earlier versions.
For cases like this, the pickle module is ideal. It’s part of the Python standard library, so it’s always available. It’s fast; the bulk of it is written in C, like the Python interpreter itself. It can store arbitrarily complex Python data structures.
13.2. Saving Data To A Pickle File
>>> import pickle
>>> with open('entry.pickle', 'wb') as f:
... pickle.dump(entry, f)
- To do this, it serializes the data structure using a data format called “the pickle protocol.”
- Not every Python data structure can be serialized by the pickle module. The pickle protocol has changed several times as new data types have been added to the Python language, but there are still limitations.
- As a result of these changes, there is no guarantee of compatibility between different versions of Python itself. Newer versions of Python support the older serialization formats, but older versions of Python do not support newer formats (since they don’t support the newer data types).
- The latest version of the pickle protocol is a binary format. Be sure to open your pickle files in binary mode, or the data will get corrupted during writing.
13.3. Loading Data From A Pickle File
>>> import pickle
>>> with open('entry.pickle', 'rb') as f:
... entry = pickle.load(f)
13.4. Pickling Without A File
You can also serialize to a bytes object in memory.
>>> b = pickle.dumps(entry)
>>> type(b)
<class 'bytes'>
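The matching pickle.loads() completes the in-memory round trip; a quick sketch with a throwaway entry dictionary:
>>> import pickle
>>> entry = {'title': 'Dive into history, 2009 edition'}   # any picklable structure
>>> b = pickle.dumps(entry)
>>> entry2 = pickle.loads(b)
>>> entry2 == entry
True
>>> entry2 is entry
False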
13.6. Debugging Pickle Files (skipped)
13.7. Serializing Python Objects To Be Read By Other Languages
This chapter mainly uses JSON. The JSON data format is text-based, not binary.
See the book for the rest.
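A minimal sketch of the json round trip (json is in the standard library; the entry dictionary here is a throwaway example):
>>> import json
>>> entry = {'title': 'Dive into history, 2009 edition'}
>>> s = json.dumps(entry)          # serialize to a plain text string
>>> s
'{"title": "Dive into history, 2009 edition"}'
>>> json.loads(s) == entry
True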