Python Language Basics, IPython, and Jupyter Notebooks
The Python Interpreter
Python is an interpreted language. The Python interpreter runs a program by executing one statement at a time. The standard interactive Python interpreter can be invoked on the command line with the python command. Running Python programs is as simple as calling python with a .py file as its first argument. Suppose we had created hello_world.py with these contents :
print('Hello world')
You can run it by executing the following command (the hello_world.py file must be in your current working terminal directory) :
$ python hello_world.py
Hello world
IPython Basics
Running the IPython Shell
You can launch the IPython shell on the command line just like launching the regular Python interpreter except with the ipython command:
$ ipython
You can execute arbitrary Python statements by typing them in and pressing Return (or Enter). When you type just a variable into IPython, it renders a string representation of the object :
In [5]: import numpy as np
IPython also provides facilities to execute arbitrary blocks of code (via a somewhat glorified copy-and-paste approach) and whole Python scripts. You can also use the Jupyter notebook to work with larger blocks of code.
Running the Jupyter Notebook
One of the major components of the Jupyter project is the notebook, a type of interactive document for code, text (with or without markup), data visualizations, and other output. On many platforms, Jupyter will automatically open up in your default web browser (unless you start it with –no-browser). Otherwise, you can navigate to the HTTP address printed when you started the notebook, here http://localhost:8888/. To create a new notebook, click the New button and select the “Python 3” or “conda [default]” option. When you save the notebook (see “Save and Checkpoint” under the notebook File menu), it creates a file with the extension .ipynb. This is a self-contained file format that contains all of the content (including any evaluated code output) currently in the notebook. These can be loaded and edited by other Jupyter users.
Tab Completion
One of the major improvements over the standard Python shell is tab completion, found in many IDEs or other interactive computing analysis environments. While entering expressions in the shell, pressing the Tab key will search the namespace for any variables (objects, functions, etc.) matching the characters you have typed so far :
In [1]: an_apple = 27
In [2]: an_example = 42
In [3]: an<Tab>
an_apple and an_example any
In this example, note that IPython displayed both the two variables I defined as well as the Python keyword and and built-in function any. Naturally, you can also complete methods and attributes on any object after typing a period.
In [3]: b = [1, 2, 3]
In [4]: b.<Tab>
b.append b.count b.insert b.reverse b.clear b.extend b.pop b.sort b.copy b.index b.remove
Introspection
Using a question mark (?) before or after a variable will display some general information about the object :
In [8]: b = [1, 2, 3]
In [9]: b?
Type: list String Form:[1, 2, 3] Length: 3 Docstring: list() -> new empty list list(iterable) -> new list initialized from iterable’s items
This is referred to as object introspection. If the object is a function or instance method, the docstring, if defined, will also be shown. Using ?? will also show the function’s source code if possible :
The %run Command
You can run any file as a Python program inside the environment of your IPython session using the %run command. Suppose you had the following simple script stored in ipython_script_test.py :
def f(x, y, z):
return (x + y) / z
a = 5
b = 6
c = 7.5
result = f(a, b, c)
You can execute this by passing the filename to %run:
In [14]: %run ipython_script_test.py
The script is run in an empty namespace (with no imports or other variables defined) so that the behavior should be identical to running the program on the command line using python script.py. All of the variables (imports, functions, and globals) defined in the file (up until an exception, if any, is raised) will then be accessible in the IPython shell. In the Jupyter notebook, you may also use the related %load magic function, which imports a script into a code cell :
>>> %load ipython_script_test.py
Pressing Ctrl-C while any code is running, whether a script through %run or a long-running command, will cause a KeyboardInterrupt to be raised. This will cause nearly all Python programs to stop immediately except in certain unusual cases.
Executing Code from the Clipboard
The most foolproof methods are the %paste and %cpaste magic functions. %paste takes whatever text is in the clipboard and executes it as a single block in the shell :
In [17]: %paste
x = 5
y = 7
if x > 5:
x += 1
y = 8
## -- End pasted text --
%cpaste is similar, except that it gives you a special prompt for pasting code into :
In [18]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:x = 5
:y = 7
:if x > 5:
: x += 1
:
: y = 8
:--
With the %cpaste block, you have the freedom to paste as much code as you like before executing it. You might decide to use %cpaste in order to look at the pasted code before executing it. If you accidentally paste the wrong code, you can break out of the %cpaste prompt by pressing Ctrl-C.
Terminal Keyboard Shortcuts
IPython has many keyboard shortcuts for navigating the prompt (which will be familiar to users of the Emacs text editor or the Unix bash shell) and interacting with the shell’s command history.
Keyboard Shortcut | Description |
---|---|
Ctrl-P or up-arrow | Search backward in command history for commands starting with currently entered text |
Ctrl-N or down-arrow | Search forward in command history for commands starting with currently entered text |
Ctrl-R | Readline-style reverse history search (partial matching) |
Ctrl-Shift-V | Paste text from clipboard |
Ctrl-C | Interrupt currently executing code |
Ctrl-A | Move cursor to beginning of line |
Ctrl-E | Move cursor to end of line |
Ctrl-K | Delete text from cursor until end of line |
Ctrl-U | Discard all text on current line |
Ctrl-F | Move cursor forward one character |
Ctrl-B | Move cursor back one character |
Ctrl-L | Clear screen |
About Magic Commands
IPython’s special commands (which are not built into Python itself) are known as “magic” commands. These are designed to facilitate common tasks and enable you to easily control the behavior of the IPython system. A magic command is any command prefixed by the percent symbol %. For example, you can check the execution time of any Python statement, such as a matrix multiplication, using the %timeit magic function (which will be discussed in more detail later) :
In [20]: a = np.random.randn(100, 100)
In [20]: %timeit np.dot(a, a)
10000 loops, best of 3: 20.9 µs per loop
Magic functions can be used by default without the percent sign, as long as no variable is defined with the same name as the magic function in question. This feature is called automagic and can be enabled or disabled with %automagic. Some frequently used IPython magic commands :
Command | Description |
---|---|
%quickref | Display the IPython Quick Reference Card |
%magic | Display detailed documentation for all of the available magic commands |
%debug | Enter the interactive debugger at the bottom of the last exception traceback |
%hist | Print command input (and optionally output) history |
%pdb | Automatically enter debugger after any exception |
%paste | Execute preformatted Python code from clipboard |
%cpaste | Open a special prompt for manually pasting Python code to be executed |
%reset | Delete all variables/names defined in interactive namespace |
%page OBJECT | Pretty-print the object and display it through a pager |
%run script.py | Run a Python script inside IPython |
%prun statement | Execute statement with cProfile and report the profiler output |
%time statement | Report the execution time of a single statement |
%timeit statement | Run a statement multiple times to compute an ensemble average execution time; useful for timing code with very short execution time |
%who, %who_ls, %whos | Display variables defined in interactive namespace, with varying levels of information/verbosity |
%xdel variable | Delete a variable and attempt to clear any references to the object in the IPython internals |
Python Language Basics
Language Semantics
The Python language design is distinguished by its emphasis on readability, simplicity, and explicitness. Some people go so far as to liken it to “executable pseudocode.” Python uses whitespace (tabs or spaces) to structure code instead of using braces as in many other languages like R, C++, Java, and Perl. Consider a for loop from a sorting algorithm :
for x in array:
if x < pivot:
less.append(x)
else:
greater.append(x)
A colon denotes the start of an indented code block after which all of the code must be indented by the same amount until the end of the block. As you can see by now, Python statements also do not need to be terminated by semicolons. Semicolons can be used, however, to separate multiple statements on a single line :
a = 5; b = 6; c = 7
An important characteristic of the Python language is the consistency of its object model. Every number, string, data structure, function, class, module, and so on exists in the Python interpreter in its own “box,” which is referred to as a Python object. Each object has an associated type (e.g., string or function) and internal data. Any text preceded by the hash mark (pound sign) # is ignored by the Python interpreter. This is often used to add comments to code.
results = []
for line in file_handle:
# keep the empty lines for now
# if len(line) == 0:
# continue
results.append(line.replace('foo', 'bar'))
You call functions using parentheses and passing zero or more arguments, optionally assigning the returned value to a variable :
result = f(x, y, z)
g()
Almost every object in Python has attached functions, known as methods, that have access to the object’s internal contents. You can call them using the following syntax :
obj.some_method(x, y, z)
When assigning a variable (or name) in Python, you are creating a reference to the object on the righthand side of the equals sign. When you pass objects as arguments to a function, new local variables are created referencing the original objects without any copying. If you bind a new object to a variable inside a function, that change will not be reflected in the parent scope. In contrast with many compiled languages, such as Java and C++, object references in Python have no type associated with them. There is no problem with the following :
In [12]: a = 5
In [13]: type(a)
int
Variables are names for objects within a particular namespace; the type information is stored in the object itself. Knowing the type of an object is important, and it’s useful to be able to write functions that can handle many different kinds of input. You can check that an object is an instance of a particular type using the isinstance function :
In [21]: a = 5
In [22]: isinstance(a, int)
True
Often you may not care about the type of an object but rather only whether it has certain methods or behavior. This is sometimes called “duck typing,” after the saying “If it walks like a duck and quacks like a duck, then it’s a duck.” For example, you can verify that an object is iterable if it implemented the iterator protocol. For many objects, this means it has a iter “magic method,” though an alternative and better way to check is to try using the iter function. This function would return True for strings as well as most Python collection types :
In [29]: isiterable('a string')
True
If we wanted to access the variables and functions defined in some_module.py, from another file in the same directory we could do :
import some_module
result = some_module.f(5)
pi = some_module.PI
To check if two references refer to the same object, use the is keyword. is not is also perfectly valid if you want to check that two objects are not the same :
In [35]: a = [1, 2, 3]
In [36]: b = a
In [37]: c = list(a)
In [38]: a is b
True
In [39]: a is not c
True
Most objects in Python, such as lists, dicts, NumPy arrays, and most user-defined types (classes), are mutable. This means that the object or values that they contain can be modified :
In [43]: a_list = ['foo', 2, [4, 5]]
In [44]: a_list[2] = (3, 4)
In [45]: a_list
[‘foo’, 2, (3, 4)]
Others, like strings and tuples, are immutable.
Scalar Types
Python along with its standard library has a small set of built-in types for handling numerical data, strings, boolean (True or False) values, and dates and time. These “single value” types are sometimes called scalar types.
Type | Description |
---|---|
None | The Python “null” value (only one instance of the None object exists) |
str | String type; holds Unicode (UTF-8 encoded) strings |
bytes | Raw ASCII bytes (or Unicode encoded as bytes) |
float | Double-precision (64-bit) floating-point number (note there is no separate double type) |
bool | A True or False value |
int | Arbitrary precision signed integer |
The primary Python types for numbers are int and float. An int can store arbitrarily large numbers :
In [48]: ival = 17239871
In [49]: ival ** 6
26254519291092456596965462913230729701102721
Integer division not resulting in a whole number will always yield a floating-point number :
In [52]: 3 / 2
1.5
Many people use Python for its powerful and flexible built-in string processing capabilities. You can write string literals using either single quotes ‘ or double quotes “ :
a = 'one way of writing a string'
b = "another way"
Python strings are immutable; you cannot modify a string. Strings are a sequence of Unicode characters and therefore can be treated like other sequences, such as lists and tuples. The backslash character \ is an escape character, meaning that it is used to specify special characters like newline \n or Unicode characters. Adding two strings together concatenates them and produces a new string. The two boolean values in Python are written as True and False. Comparisons and other conditional expressions evaluate to either True or False. Boolean values are combined with the and and or keywords :
In [89]: True and True
True
In [90]: False or True
True
None is the Python null value type. If a function does not explicitly return a value, it implicitly returns None :
In [97]: a = None
In [98]: a is None
True
The built-in Python datetime module provides datetime, date, and time types. The datetime type, as you may imagine, combines the information stored in date and time and is the most commonly used :
In [102]: from datetime import datetime, date, time
In [103]: dt = datetime(2011, 10, 29, 20, 30, 21)
In [104]: dt.day
29
Control Flow
Python has several built-in keywords for conditional logic, loops, and other standard control flow concepts found in other programming languages. The if statement is one of the most well-known control flow statement types. It checks a condition that, if True, evaluates the code in the block that follows :
if x < 0:
print('It's negative')
An if statement can be optionally followed by one or more elif blocks and a catch-all else block if all of the conditions are False :
if x < 0:
print('It's negative')
elif x == 0:
print('Equal to zero')
elif 0 < x < 5:
print('Positive but smaller than 5')
else:
print('Positive and larger than or equal to 5')
for loops are for iterating over a collection (like a list or tuple) or an iterater. The standard syntax for a for loop is :
for value in collection:
# do something with value
You can advance a for loop to the next iteration, skipping the remainder of the block, using the continue keyword. Consider this code, which sums up integers in a list and skips None values :
sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
if value is None:
continue
total += value
A while loop specifies a condition and a block of code that is to be executed until the condition evaluates to False or the loop is explicitly ended with break :
x = 256
total = 0
while x > 0:
if total > 500:
break
total += x
x = x // 2
pass is the “no-op” statement in Python. It can be used in blocks where no action is to be taken (or as a placeholder for code not yet implemented); it is only required because Python uses whitespace to delimit blocks :
if x < 0:
print('negative!')
elif x == 0:
# TODO: put something smart here
pass
else:
print('positive!')
The range function returns an iterator that yields a sequence of evenly spaced integers :
In [122]: range(10)
range(0, 10)
A ternary expression in Python allows you to combine an if-else block that produces a value into a single line or expression. The syntax for this in Python is :
value = true-expr if condition else false-expr
As with if-else blocks, only one of the expressions will be executed. Thus, the “if ” and “else” sides of the ternary expression could contain costly computations, but only the true branch is ever evaluated.