Python's Data Model

Table of Contents

Python documentation and reference on which these notes are based
Data model
Python version
Since this note concerns a programming language, it is important to keep track of when this note was last updated. These notes use v3.9.2 and were last updated in March 2021.

1 Objects

Objects are python's abstraction of data. All data is stored as objects. All code is also represented by objects.

Object = (Identity, Type, Value)

  1. The identity can be though of as an object's location in memory. id() gives the identity (or memory address) of an object as an integer, while is operator compares identities of two objects.
  2. The type determines the operations supported by an object, as well possible values for that object.
  3. The value is the data stored-in/referenced-by an object.
    • an object is immutable if it value cannot be changed. E.g. numbers, strings, tuples.
    • an object is mutable if it can be. E.g. dicts, lists.

1.1 Containers

A container is an object that contains references to another object.

Note: an immutable container of mutable objects is still considered immutable! E.g. frozenset[list[int]]

The following lines are quoted directly from the documentation and are useful, but somewhat hard to understand.

An object of an immutable sequence type cannot change once it is created. (If the object contains references to other objects, these other objects may be mutable and may be changed; however, the collection of objects directly referenced by an immutable object cannot change.)

2 The standard type hierarchy

In the following table, we use (⊐ A) to denote that the type A is a subtype of the closest type listed in boldface above it. For example, the table says that Sequences ⊐ list, i.e. that lists are a subtype of Sequences.

Python's standard type Values Isomorphic (or approximate) Lean type Comments
None {None} unit truthiness of None value is False.
NotImplemented {NotImplemented} unit value has no truthiness defined
Ellipsis {...} or {Ellipsis} unit the value of truthiness of True.
numbers.Number depends on subtype typeclass always immutable
⊐ numbers.Integral depends on subtype --
⊐⊐ int {…, -1, 0, 1, 2, …} ℤ (restricted by virtual memory) --
⊐⊐ bool {False, True} bool (but subtype of ℤ) --
⊐ numbers.Real (float) floating points restricted version of ℝ depends on computer architecture
⊐ numbers.Complex (complex) complex floating points restricted version of ℂ depends on computer architecture
Sequences finite collection of ordered values indexed by ℕ array finite ordered sets indexed by ℕ
⊐ str seq. of Unicode code points in range U+0000 to U+10FFFF string immutable
⊐ tuple seq. of any Python objects list (but nonhomogeneous) immutable version of list
⊐ bytes seq. of bytes formed from ASCII ints (0 to 255) -- immutable version of bytearray
⊐ list seq. of any Python objects Lean does not have mutable containers mutable version of tuple
⊐ bytearray seq. of bytes formed from ASCII ints (0 to 255) -- mutable version of bytes
Set types finite collection of unordered, unique values -- --
⊐ set finite set of any immutable Python objects Lean does not have mutable containers mutable version of frozenset
⊐ frozenset finite set of any immutable Python objects finset (but nonhomogeneous) immutable version of set
Mappings finite collections of objects indexed by arbitrary index sets -- --
⊐ dict mappings with immutable keys alist (but nonhomogeneous and mutable) mutable
Callable types things that can be "called" using () -- --
⊐ user-defined functions function with zero or more arguments function types (but possible non-exhaustive, non-terminating)  
⊐ instance methods (class, class-instance, callable-object) field of a structure's instance the class instance is __self__, the function instance is __func__
⊐ generator functions a function that uses the yield statement (all Lean functions are actually generators ∵ Lean is lazy) the generator returns an iterator. Calling iterator.__next__() executes function code.
⊐ coroutine functions function defined using async def -- calling coroutine function returns a coroutine object.
⊐ asynchronous generator functions function defined using async def and that uses yield -- calling it returns an asynchronous iterator. Calling aiterator.__anext__() returns an awaitable object.
⊐ built-in functions a wrapper around a C function not distinct from function types --
⊐ built-in methods (object, built-in-function) implemented in C not distinct from field of a structure's instance e.g. [1,2,3].append(4)
⊐ class factories for new class instances (I think cast/coe can be thought of as a factory. Not sure.) new instances are created by calling __new__() which in turn (usually) calls __init__().
⊐ class instances instances of classes (is callable sometimes) -- to make an instance callable, define a __call__() method in its class.
Modules basic organizational unit of Python code -- usually created by the import system using the import statement
Custom classes user-defined class definitions structure --
Class instances instance of a user-defined class term of type structure created by calling a class object
I/O objects (file objects) an open file (I think this is io.mode) usually created by the open() function
Internal types -- -- this includes code objects, frame objects, traceback objects, slice objects, static method objects, and class method objects

Author: Vaibhav Karve

Created: 2021-04-08 Thu 10:49