Python's Data Model
Home
Table of Contents
- Python documentation and reference on which these notes are based
- → Data model
- Python version
- Since this note concerns a programming language, it is important to keep track of when this note was last updated. These notes use v3.9.2 and were last updated in March 2021.
1. Objects
Objects are python's abstraction of data. All data is stored as objects. All code is also represented by objects.
Object = (Identity, Type, Value)
- The identity can be though of as an object's location in memory.
id()
gives the identity (or memory address) of an object as an integer, whileis
operator compares identities of two objects. - The type determines the operations supported by an object, as well possible values for that object.
- The value is the data stored-in/referenced-by an object.
- an object is immutable if it value cannot be changed. E.g. numbers, strings, tuples.
- an object is mutable if it can be. E.g. dicts, lists.
1.1. Containers
A container is an object that contains references to another object.
Note: an immutable container of mutable objects is still considered
immutable! E.g. frozenset[list[int]]
The following lines are quoted directly from the documentation and are useful, but somewhat hard to understand.
An object of an immutable sequence type cannot change once it is created. (If the object contains references to other objects, these other objects may be mutable and may be changed; however, the collection of objects directly referenced by an immutable object cannot change.)
2. The standard type hierarchy
In the following table, we use (⊐ A) to denote that the type A is a subtype of the closest type listed in boldface above it. For example, the table says that Sequences ⊐ list, i.e. that lists are a subtype of Sequences.
Python's standard type | Values | Isomorphic (or approximate) Lean type | Comments |
---|---|---|---|
None | {None} | unit | truthiness of None value is False. |
NotImplemented | {NotImplemented} | unit | value has no truthiness defined |
Ellipsis | {... } or {Ellipsis} |
unit | the value of truthiness of True. |
numbers.Number | depends on subtype | typeclass | always immutable |
⊐ numbers.Integral | depends on subtype | ℤ | -- |
⊐⊐ int | {…, -1, 0, 1, 2, …} | ℤ (restricted by virtual memory) | -- |
⊐⊐ bool | {False, True} | bool (but subtype of ℤ) | -- |
⊐ numbers.Real (float) | floating points | restricted version of ℝ | depends on computer architecture |
⊐ numbers.Complex (complex) | complex floating points | restricted version of ℂ | depends on computer architecture |
Sequences | finite collection of ordered values indexed by ℕ | array | finite ordered sets indexed by ℕ |
⊐ str | seq. of Unicode code points in range U+0000 to U+10FFFF | string | immutable |
⊐ tuple | seq. of any Python objects | list (but nonhomogeneous) | immutable version of list |
⊐ bytes | seq. of bytes formed from ASCII ints (0 to 255) | -- | immutable version of bytearray |
⊐ list | seq. of any Python objects | Lean does not have mutable containers | mutable version of tuple |
⊐ bytearray | seq. of bytes formed from ASCII ints (0 to 255) | -- | mutable version of bytes |
Set types | finite collection of unordered, unique values | -- | -- |
⊐ set | finite set of any immutable Python objects | Lean does not have mutable containers | mutable version of frozenset |
⊐ frozenset | finite set of any immutable Python objects | finset (but nonhomogeneous) | immutable version of set |
Mappings | finite collections of objects indexed by arbitrary index sets | -- | -- |
⊐ dict | mappings with immutable keys | alist (but nonhomogeneous and mutable) | mutable |
Callable types | things that can be "called" using () | -- | -- |
⊐ user-defined functions | function with zero or more arguments | function types (but possible non-exhaustive, non-terminating) | |
⊐ instance methods | (class, class-instance, callable-object) | field of a structure's instance | the class instance is __self__ , the function instance is __func__ |
⊐ generator functions | a function that uses the yield statement |
(all Lean functions are actually generators ∵ Lean is lazy) | the generator returns an iterator. Calling iterator.__next__() executes function code. |
⊐ coroutine functions | function defined using async def |
-- | calling coroutine function returns a coroutine object. |
⊐ asynchronous generator functions | function defined using async def and that uses yield |
-- | calling it returns an asynchronous iterator. Calling aiterator.__anext__() returns an awaitable object. |
⊐ built-in functions | a wrapper around a C function | not distinct from function types | -- |
⊐ built-in methods | (object, built-in-function) implemented in C | not distinct from field of a structure's instance | e.g. [1,2,3].append(4) |
⊐ class | factories for new class instances | (I think cast/coe can be thought of as a factory. Not sure.) | new instances are created by calling __new__() which in turn (usually) calls __init__() . |
⊐ class instances | instances of classes (is callable sometimes) | -- | to make an instance callable, define a __call__() method in its class. |
Modules | basic organizational unit of Python code | -- | usually created by the import system using the import statement |
Custom classes | user-defined class definitions | structure | -- |
Class instances | instance of a user-defined class | term of type structure | created by calling a class object |
I/O objects (file objects) | an open file | (I think this is io.mode ) |
usually created by the open() function |
Internal types | -- | -- | this includes code objects, frame objects, traceback objects, slice objects, static method objects, and class method objects |