Python Built-in Methods

  • This section covers some useful Python built-in methods and libraries.

Strings

Using isinstance() vs. type() for type-checking

  • isinstance() caters for inheritance (an instance of a derived class is an instance of a base class, too), while checking for equality of type does not (it demands identity of types and rejects instances of subtypes, a.k.a. subclasses).

  • If your code needs to support inheritance, isinstance() is less bad than checking identity of types because it handles subclasses seamlessly. It’s not that isinstance() is good, mind you; it’s just less bad than checking equality of types. The normal, Pythonic, preferred solution is almost invariably “duck typing”: try using the argument as if it were of a certain desired type, inside a try/except statement that catches the exceptions that could arise if the argument was not in fact of that type (or of any other type nicely duck-mimicking it), and in the except clause, try something else (using the argument “as if” it were of some other type).

  • In Python 2, basestring is, however, quite a special case: a builtin type that exists only to let you use isinstance() (both str and unicode subclass basestring). Python 3 removed both basestring and unicode; all text strings there are str. Strings are sequences (you could loop over them, index them, slice them, …), but you generally want to treat them as “scalar” types. It’s somewhat inconvenient (but a reasonably frequent use case) to treat all kinds of strings (and maybe other scalar types, i.e., ones you can’t loop on) one way and all containers (lists, sets, dicts, …) another way, and basestring plus isinstance() helps you do that. In Python 2, the idiom looks something like:

## Python 2 only: unicode and basestring do not exist in Python 3
s1 = unicode("test")
s2 = "test"
isinstance(s1, basestring) ## Returns True
isinstance(s2, basestring) ## Returns True
  • A gotcha with isinstance() is that the bool datatype is a subclass of the int datatype:
issubclass(bool, int) ## Returns True
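  • A minimal sketch of the Python 3 side of this, where text is always str and binary data is bytes, plus one way to work around the bool gotcha. The helper is_strict_int is hypothetical, written only for illustration:

```python
## Python 3: text is always str; pass a tuple of types to accept any of them
print(isinstance("test", str))              ## Prints True
print(isinstance(b"test", (str, bytes)))    ## Prints True

## The bool gotcha: True is an instance of int, but its exact type is bool
print(isinstance(True, int))   ## Prints True
print(type(True) is int)       ## Prints False

def is_strict_int(value) -> bool:
    """Hypothetical helper: accept ints but reject bools."""
    return isinstance(value, int) and not isinstance(value, bool)

print(is_strict_int(3), is_strict_int(True))  ## Prints True False
```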

Index of a Substring using str.find() or str.index()

  • To find the index of a substring in a string, use the str.find() method, which returns the index of the first occurrence of the substring if found and -1 otherwise.
sentence = "Today is Saturday"
  • Find the index of the first occurrence of the substring:
sentence.find("day") ## Returns 2
sentence.find("nice") ## Returns -1
  • You can also provide the start (and, optionally, the end) position of the search:
## Start searching for the substring at index 3
sentence.find("day", 3) ## Returns 14
  • Note that you can also use str.index() to accomplish the same result, except that it raises a ValueError instead of returning -1 when the substring is not found.
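  • The sketch below shows how find() and index() differ on a miss, using the same example sentence, and also uses rfind(), which searches from the right:

```python
sentence = "Today is Saturday"

print(sentence.index("day"))   ## Prints 2
print(sentence.rfind("day"))   ## Prints 14: index of the last occurrence

try:
    sentence.index("nice")     ## index() raises instead of returning -1
except ValueError as err:
    print("Not found:", err)   ## Prints Not found: substring not found
```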

Replace a String with Another String Using Regular Expressions

  • To either replace one string with another string or to change the order of characters in a string, use re.sub().

  • re.sub() allows you to use a regular expression to specify the pattern of the string you want to swap.

  • In the code below, we replace 3/7/2021 with Sunday, and then reorder it into 2021-3-7.

import re

text = "Today is 3/7/2021"
match_pattern = r"(\d+)/(\d+)/(\d+)"

re.sub(match_pattern, "Sunday", text) ## Returns 'Today is Sunday'
re.sub(match_pattern, r"\3-\1-\2", text) ## Returns 'Today is 2021-3-7'
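  • re.sub() also accepts a function as the replacement: it receives each match object and returns the replacement text. A sketch, assuming you want zero-padded year-month-day dates; the helper name to_iso is hypothetical:

```python
import re

text = "Today is 3/7/2021"
match_pattern = r"(\d+)/(\d+)/(\d+)"

def to_iso(match) -> str:
    """Hypothetical helper: reorder month/day/year into year-month-day."""
    month, day, year = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"

print(re.sub(match_pattern, to_iso, text))  ## Prints Today is 2021-03-07
```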

Lists

Create a copy of a list using = vs. <list>.copy()

  • Assigning a list with the = operator does not copy it: both names refer to the same list object, so a change made through the second name also shows up in the first.
l1 = [1, 2, 3]
l2 = l1 
l2.append(4)
l2 ## Returns [1, 2, 3, 4]
l1 ## Returns [1, 2, 3, 4]

l1 is l2 ## Returns True since they are the same object
  • Instead of using the = operator, use the copy() method. Now changes to the second list are not reflected in the first list.
l1 = [1, 2, 3]
l2 = l1.copy()
l2.append(4)
l2 ## Returns [1, 2, 3, 4]
l1 ## Returns [1, 2, 3]
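  • Note that list.copy() is a shallow copy: nested lists are still shared. A short sketch comparing it with copy.deepcopy():

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()        ## New outer list, same inner lists
deep = copy.deepcopy(nested)   ## New outer list and new inner lists

shallow[0].append(99)
print(nested)   ## Prints [[1, 2, 99], [3, 4]]: the inner list is shared
print(deep)     ## Prints [[1, 2], [3, 4]]: fully independent
```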

Get counter and value while looping using enumerate()

  • Rather than using for i in range(len(array)) to access both the index and the value of the array, use enumerate() instead. It produces the same result but it is much cleaner.
arr = ['a', 'b', 'c', 'd', 'e']

## Instead of this
for i in range(len(arr)):
    print(i, arr[i])
## Prints 
## 0 a
## 1 b
## 2 c
## 3 d
## 4 e

## Use this
for i, val in enumerate(arr):
    print(i, val)
## Prints 
## 0 a
## 1 b
## 2 c
## 3 d
## 4 e

list.append() vs. list.extend() vs. +=

  • To add a list as a single element of another list, use the list.append() method. To add the elements of a list to another list, use the list.extend() method or +=.
a = [1, 2, 3]
a.append([4, 5])
a ## Returns [1, 2, 3, [4, 5]]

a = [1, 2, 3]
a.extend([4, 5])
a ## Returns [1, 2, 3, 4, 5]

a = [1, 2, 3]
a += [4, 5]
a ## Returns [1, 2, 3, 4, 5]
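  • One subtlety worth a sketch: += extends the list in place, so every name bound to that list sees the change, while a = a + [...] builds a new list and rebinds only that name:

```python
a = [1, 2, 3]
alias = a
a += [4, 5]          ## In place: alias sees the change
print(alias)         ## Prints [1, 2, 3, 4, 5]

b = [1, 2, 3]
alias_b = b
b = b + [4, 5]       ## New list: alias_b still refers to the old one
print(alias_b)       ## Prints [1, 2, 3]

c = [1, 2, 3]
c += "ab"            ## Like extend(), += accepts any iterable
print(c)             ## Prints [1, 2, 3, 'a', 'b']
```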

Get Elements

random.choice(): Get a Randomly Selected Element from a List
  • Besides getting a random number, you can also get a random element from a Python list using random.choice(). In the code below, “attend party” was picked at random from a list of options (your result may differ).
import random 

to_do_tonight = ['stay at home', 'attend party', 'do exercise']
random.choice(to_do_tonight) ## Returns 'attend party'
random.sample(): Get Multiple Random Elements from a List
  • To get n random elements from a list without replacement, use random.sample().
import random

random.seed(1)
nums = [1, 2, 3, 4, 5]
random_nums = random.sample(nums, 2)
random_nums ## Returns [2, 1]
heapq: Find n Max Values of a List
  • To extract n max values from a large Python list, using heapq will speed up the code.

  • In the code below, using heapq is more than 2x faster than sorting and slicing. Both methods find the 100 largest values in a list of 10,000 items.

import heapq
import random
from timeit import timeit

random.seed(0)
l = random.sample(range(0, 10000), 10000)

def get_n_max_sorting(l: list, n: int):
    l = sorted(l, reverse=True)
    return l[:n]

def get_n_max_heapq(l: list, n: int):
    return heapq.nlargest(n, l)

expSize = 1000
n = 100

time_sorting = timeit("get_n_max_sorting(l, n)", number=expSize,
                        globals=globals())
time_heapq = timeit('get_n_max_heapq(l, n)', number=expSize,
                    globals=globals())

ratio = round(time_sorting/time_heapq, 3)
print(f'Run {expSize} experiments. Using heapq is {ratio} times'
' faster than using sorting')
## Prints Run 1000 experiments. Using heapq is 2.827 times faster than using sorting
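  • heapq.nlargest() and heapq.nsmallest() also take a key argument, like sorted(). A sketch with hypothetical score data:

```python
import heapq

scores = [("Ann", 88), ("Bob", 95), ("Cid", 72), ("Dee", 91)]

top_two = heapq.nlargest(2, scores, key=lambda item: item[1])
bottom_one = heapq.nsmallest(1, scores, key=lambda item: item[1])
print(top_two)     ## Prints [('Bob', 95), ('Dee', 91)]
print(bottom_one)  ## Prints [('Cid', 72)]
```

The speedup depends on n: the heapq documentation notes these functions work best for small n, that sorted() is more efficient for larger n, and that min()/max() are preferable when n == 1.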

Unpacking

How to Unpack Iterables
  • To assign the items of a Python iterable (such as a list, tuple, or string) to different variables, you can unpack the iterable like below.
nested_arr = [[1, 2, 3], ["a", "b"], 4]
num_arr, char_arr, num = nested_arr
num_arr
## Prints [1, 2, 3]

char_arr
## Prints ['a', 'b']
Extended Iterable Unpacking: Ignore Multiple Values when Unpacking
  • To ignore multiple values when unpacking a Python iterable, add * to _ as shown below.
  • This is called “Extended Iterable Unpacking” and is available in Python 3.x.
a, *_, b = [1, 2, 3, 4]
print(a)
## Prints 1

b
## Prints 4

_
## Prints [2, 3]
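  • The starred name can appear at any position, and it always receives a list (possibly empty). A short sketch:

```python
first, *rest = [1, 2, 3, 4]
print(first, rest)     ## Prints 1 [2, 3, 4]

*init, last = "abc"
print(init, last)      ## Prints ['a', 'b'] c

a, *middle, b = [1, 2]
print(middle)          ## Prints []: nothing left over for the starred name
```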

Join Iterables

join(): Turn an Iterable into a String
  • To turn an iterable into a string, use join().
  • In the code below, the elements of the list fruits are joined using ', ' as the separator.
fruits = ['apples', 'oranges', 'grapes']

fruits_str = ', '.join(fruits)
print(f"Today, I need to get some {fruits_str} in the grocery store")
## Prints "Today, I need to get some apples, oranges, grapes in the grocery store"
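  • join() only accepts strings, so non-string items need to be converted first, for example with map(str, ...). A minimal sketch:

```python
nums = [1, 2, 3]

try:
    ", ".join(nums)            ## join() rejects the int items
except TypeError as err:
    print("TypeError:", err)

nums_str = ", ".join(map(str, nums))
print(nums_str)                ## Prints 1, 2, 3
```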
zip(): Create Pairs of Elements from Two Iterables
  • To create pairs of elements from two iterables, use zip(). In Python 3, zip() returns a lazy iterator of tuples, so wrap it in list() to see the pairs.
nums = [1, 2, 3, 4]
string = "abcd"
combinations = list(zip(nums, string))
combinations ## Returns [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

nums = [1, 2, 3, 4]
chars = ['a', 'b', 'c', 'd']

comb = list(zip(nums, chars))
comb ## Returns [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
  • You can also unzip the list of tuples back to its original form by using zip(*list_of_tuples):
nums_2, chars_2 = zip(*comb)
nums_2, chars_2 ## Returns ((1, 2, 3, 4), ('a', 'b', 'c', 'd'))
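  • A common use of zip() is building a dictionary from a list of keys and a list of values. The sketch below also shows that zip() silently stops at the shortest input:

```python
keys = ["name", "age"]
values = ["Pepper", 7]

record = dict(zip(keys, values))
print(record)                       ## Prints {'name': 'Pepper', 'age': 7}

## zip() stops at the shortest input, dropping the extra items
print(list(zip([1, 2, 3], "ab")))   ## Prints [(1, 'a'), (2, 'b')]
```

On Python 3.10 and later, passing strict=True to zip() raises a ValueError instead of silently truncating.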

Interaction Between Two Lists

set.intersection(): Find the Intersection Between Two Sets
  • To get the common elements between two iterables, convert them to sets, then use set.intersection() or the & operator (both work in Python 2 and Python 3). Sets are unordered, so the order of the resulting list may vary.
requirement1 = ['pandas', 'numpy', 'statsmodel']
requirement2 = ['numpy', 'statsmodel', 'sympy', 'matplotlib']

## Using the method
intersection = set.intersection(set(requirement1), set(requirement2))
list(intersection) ## Returns ['statsmodel', 'numpy']

## Using the operator
intersection = set(requirement1) & set(requirement2)
list(intersection) ## Returns ['statsmodel', 'numpy']
<set>.difference(): Find the Difference Between Two Sets
  • To find the difference between two iterables, convert them to sets, then apply <set>.difference() or the - operator (both work in Python 2 and Python 3).
a = [1, 2, 3, 4]
b = [1, 3, 4, 5, 6]

## Using the method
## Find elements in a but not in b
diff = set(a).difference(set(b))
list(diff) ## Returns [2]

## Find elements in b but not in a
diff = set(b).difference(set(a))
list(diff) ## Returns [5, 6]

## Using the operator
## Find elements in a but not in b
diff = set(a) - set(b)
list(diff) ## Returns [2]

## Find elements in b but not in a
diff = set(b) - set(a)
list(diff) ## Returns [5, 6]
set.union(): Find the Union Between Two Sets
  • To get the union of elements from two sets, use set.union() or the | operator (both work in Python 2 and Python 3).
requirement1 = ['pandas', 'numpy', 'statsmodel']
requirement2 = ['numpy', 'statsmodel', 'sympy', 'matplotlib']

## Using the method
union = set.union(set(requirement1), set(requirement2))
list(union) ## Returns ['sympy', 'statsmodel', 'numpy', 'pandas', 'matplotlib']

## Using the operator
union = set(requirement1) | set(requirement2)
list(union) ## Returns ['sympy', 'statsmodel', 'numpy', 'pandas', 'matplotlib']
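  • Two related details, sketched below: the ^ operator (or symmetric_difference()) gives the elements that appear in exactly one of the sets, and the set methods accept any iterable while the operators require sets on both sides:

```python
requirement1 = ['pandas', 'numpy', 'statsmodel']
requirement2 = ['numpy', 'statsmodel', 'sympy', 'matplotlib']

## Elements in exactly one of the two lists
only_one = set(requirement1) ^ set(requirement2)
print(sorted(only_one))   ## Prints ['matplotlib', 'pandas', 'sympy']

## The method accepts a plain list as its argument
common = set(requirement1).intersection(requirement2)
print(sorted(common))     ## Prints ['numpy', 'statsmodel']

try:
    set(requirement1) & requirement2   ## The operator does not
except TypeError:
    print("& needs a set on both sides")
```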

Apply Functions to Elements in a List

any(): Check if Any Element of an Iterable is True
  • To check if any element of an iterable is True, use any(). In the code below, any() checks whether any character in the text is uppercase.
text = "abcdE"
any(c.isupper() for c in text) ## Returns True
all(): Check if All Elements of an Iterable Are Strings
  • To check if all elements of an iterable are strings, use all() and isinstance().
l = ['a', 'b', 1, 2]
all(isinstance(item, str) for item in l) ## Returns False
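  • Two behaviors worth knowing, sketched below: on an empty iterable, any() returns False and all() returns True, and both stop as soon as the answer is known. The helper is_str is hypothetical and records which items were checked:

```python
print(any([]))   ## Prints False: no element is true
print(all([]))   ## Prints True: no element is false

checked = []

def is_str(item) -> bool:
    """Hypothetical helper that records every item it inspects."""
    checked.append(item)
    return isinstance(item, str)

all(is_str(item) for item in ['a', 1, 'b'])
print(checked)   ## Prints ['a', 1]: all() stopped at the first False
```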
filter(): Get the Elements of an Iterable that a Function Evaluates True
  • To get the elements of an iterable for which a function returns True, use filter().

  • In the code below, the filter method gets items that are fruits:

def get_fruit(val: str):
    fruits = ['apple', 'orange', 'grape']
    return val in fruits

items = ['chair', 'apple', 'water', 'table', 'orange']
fruits = filter(get_fruit, items)
print(list(fruits)) ## Prints ['apple', 'orange']
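  • filter() returns a lazy iterator that can be consumed only once. The same selection is often written as a list comprehension, and filter(None, ...) keeps only truthy elements. A sketch:

```python
items = ['chair', 'apple', 'water', 'table', 'orange']
fruit_set = {'apple', 'orange', 'grape'}   ## A set gives fast membership tests

## Equivalent list comprehension
fruits = [item for item in items if item in fruit_set]
print(fruits)   ## Prints ['apple', 'orange']

## filter(None, ...) drops falsy values such as 0, '', None and []
print(list(filter(None, [0, 1, '', 'a', None, []])))   ## Prints [1, 'a']
```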
map(): Apply a Function to Each Item of an Iterable
  • To apply the given function to each item of a given iterable, use map.
nums = [1, 2, 3]
list(map(str, nums))             ## Returns ['1', '2', '3']

multiply_by_two = lambda num: num * 2
list(map(multiply_by_two, nums)) ## Returns [2, 4, 6]
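  • map() also accepts several iterables: it then passes one item from each as separate arguments. A short sketch using operator.add:

```python
from operator import add

a = [1, 2, 3]
b = [10, 20, 30]

sums = list(map(add, a, b))             ## Calls add(1, 10), add(2, 20), ...
print(sums)                             ## Prints [11, 22, 33]

print([x + y for x, y in zip(a, b)])    ## Equivalent; Prints [11, 22, 33]
```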
sort(): Sort a List of Tuples by the First or Second Item
  • To sort a list of tuples by the first or second item in a tuple, use the sort() method. To specify which item to sort by, use the key parameter.
prices = [('apple', 3), ('orange', 1), ('grape', 3), ('banana', 2)]

## Sort by the first item
by_letter = lambda x: x[0]
prices.sort(key=by_letter)
prices ## Returns [('apple', 3), ('banana', 2), ('grape', 3), ('orange', 1)]

## Sort by the second item in reversed order
by_price = lambda x: x[1]
prices.sort(key=by_price, reverse=True)
prices ## Returns [('apple', 3), ('grape', 3), ('banana', 2), ('orange', 1)]
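  • sort() modifies the list in place, while sorted() returns a new list. A key that returns a tuple sorts by several items at once; this sketch sorts by price and breaks ties by name:

```python
from operator import itemgetter

prices = [('apple', 3), ('orange', 1), ('grape', 3), ('banana', 2)]

by_name = sorted(prices, key=itemgetter(0))   ## prices itself is unchanged
print(by_name)
## Prints [('apple', 3), ('banana', 2), ('grape', 3), ('orange', 1)]

by_price_then_name = sorted(prices, key=lambda x: (x[1], x[0]))
print(by_price_then_name)
## Prints [('orange', 1), ('banana', 2), ('apple', 3), ('grape', 3)]
```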

Tuple

slice: Make Your Indices More Readable by Naming Your Slice
  • Have you ever been confused when looking at code that contains hardcoded slice indices? Even if you understand it now, you might forget why you chose specific indices in the future.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

some_sum = sum(data[:8]) * sum(data[8:])
  • If so, name your slice. Python provides a nice built-in function for that purpose called slice. By using names, your code is much easier to understand.
JANUARY = slice(0, 8)
FEBRUARY = slice(8, len(data))
some_sum = sum(data[JANUARY]) * sum(data[FEBRUARY])
print(some_sum) ## Prints 684

Dictionaries

Merge two dictionaries

  • Starting with Python 3.5, you can use dictionary unpacking:
{'a': 1, **{'b': 2}} ## Returns {'a': 1, 'b': 2}
{'a': 1, **{'a': 2}} ## Returns {'a': 2}
  • Note that if there are overlapping keys in the input dictionaries, the value in the last dictionary for the common key will be stored.

  • You can use this idea to merge two dictionaries:

d1 = {'a': 1}
d2 = {'b': 2}
{**d1, **d2} ## Returns {'a': 1, 'b': 2}
  • However, Python 3.9 or greater provides the simplest method to merge two dictionaries:
d1 = {'a': 1}
d2 = {'b': 2}
d3 = d1 | d2 ## d3 is {'a': 1, 'b': 2}
  • To merge two dictionaries in Python 3.4 or lower, use update(), which modifies the dictionary in place and returns None:
d1 = {'a': 1}
d2 = {'b': 2}
d1.update(d2) ## d1 is now {'a': 1, 'b': 2}
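  • A sketch of what happens with overlapping keys, using hypothetical defaults and overrides: the later dictionary wins, and copying before update() keeps the original intact:

```python
defaults = {'color': 'red', 'size': 'M'}
overrides = {'size': 'L'}

merged = {**defaults, **overrides}   ## Later dictionaries win on shared keys
print(merged)                        ## Prints {'color': 'red', 'size': 'L'}

settings = dict(defaults)            ## Copy first so defaults stays unchanged
settings.update(overrides)
print(settings == merged)            ## Prints True
```

On Python 3.9 and later, settings |= overrides performs the same in-place update.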

max(dict)

  • Applying max on a Python dictionary will give you the largest key. To find the key with the largest value in a dictionary, utilize the key parameter (similar to sort) in the max method in conjunction with lambda functions or itemgetter.
from operator import itemgetter

birth_year = {"Ben": 1997, "Alex": 2000, "Oliver": 1995}
max(birth_year) ## Returns "Oliver"

max_val = max(birth_year, key=lambda k: birth_year[k])
max_val         ## Returns "Alex"

max_val = max(birth_year.items(), key=itemgetter(1))
max_val         ## Returns ('Alex', 2000)
max_val[0]      ## Returns "Alex"
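  • A shorter variant passes the dictionary's own get method as the key, which works for min() too:

```python
birth_year = {"Ben": 1997, "Alex": 2000, "Oliver": 1995}

youngest = max(birth_year, key=birth_year.get)   ## Key with the largest value
oldest = min(birth_year, key=birth_year.get)     ## Key with the smallest value
print(youngest, oldest)   ## Prints Alex Oliver
```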

dict.get(): Get the Default Value of a Dictionary if a Key Doesn’t Exist
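
  • dict.get(key, default) returns the value for key if it is present and default otherwise (None if no default is given), without raising KeyError and without inserting the key. A minimal sketch with hypothetical prices:

```python
prices = {'apple': 3, 'orange': 1}

print(prices.get('apple', 0))    ## Prints 3
print(prices.get('banana', 0))   ## Prints 0: key missing, default returned
print(prices.get('banana'))      ## Prints None: no default given
print('banana' in prices)        ## Prints False: get() does not add the key
```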

dict.fromkeys()

  • To create a dictionary from a list of keys and a single value, use dict.fromkeys(). For instance, we can use dict.fromkeys() to create a dictionary of furniture locations:
furnitures = ['bed', 'table', 'chair']
loc1 = 'IKEA'

furniture_loc = dict.fromkeys(furnitures, loc1)
furniture_loc ## Returns {'bed': 'IKEA', 'table': 'IKEA', 'chair': 'IKEA'}

  • Or create a dictionary of food locations:

food = ['apple', 'pepper', 'onion']
loc2 = 'ALDI'

food_loc = dict.fromkeys(food, loc2)
food_loc ## Returns {'apple': 'ALDI', 'pepper': 'ALDI', 'onion': 'ALDI'}
  • These results can be combined into a location dictionary like below:
locations = {**food_loc, **furniture_loc}
locations ## Returns {'apple': 'ALDI', 'pepper': 'ALDI', 'onion': 'ALDI', 'bed': 'IKEA', 'table': 'IKEA', 'chair': 'IKEA'}
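  • One gotcha: dict.fromkeys() assigns the very same value object to every key, so a mutable value such as a list is shared. A sketch, with a dict comprehension as the usual fix:

```python
shared = dict.fromkeys(['a', 'b'], [])
shared['a'].append(1)
print(shared)     ## Prints {'a': [1], 'b': [1]}: both keys hold one list

separate = {key: [] for key in ['a', 'b']}   ## A fresh list per key
separate['a'].append(1)
print(separate)   ## Prints {'a': [1], 'b': []}
```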

Function

**kwargs: Pass Multiple Arguments to a Function

  • Sometimes you might not know the arguments you will pass to a function. If so, use **kwargs.

  • **kwargs allows you to pass multiple keyword arguments to a function using a dictionary. In the example below, passing **{'a': 1, 'b': 2} to the function is equivalent to passing a=1, b=2.

  • Inside the function, kwargs is a regular Python dictionary.

parameters = {'a': 1, 'b': 2}

def example(c, **kwargs):
    print(kwargs)
    for val in kwargs.values():
        print(c + val)

example(c=3, **parameters) 
## Prints 
## {'a': 1, 'b': 2}
## 4
## 5
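  • The positional counterpart is *args, which collects extra positional arguments into a tuple. A sketch with a hypothetical report() function that uses both:

```python
def report(*args, **kwargs):
    """Hypothetical function: show how extra arguments are collected."""
    print(args)     ## Positional arguments arrive as a tuple
    print(kwargs)   ## Keyword arguments arrive as a dict
    return args, kwargs

nums = [1, 2]
options = {'sep': '-'}
result = report(*nums, 3, **options)
## Prints
## (1, 2, 3)
## {'sep': '-'}
```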

Decorator in Python

  • Do you want to add the same block of code to different functions in Python? If so, use a decorator!

  • In the code below, the decorator measures how long the function say_hello takes to run:

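import functools
import time

def time_func(func):
    @functools.wraps(func)  ## Keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):  ## Forward any arguments to func
        print("This happens before the function is called")
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print('This happens after the function is called')
        print('The duration is', end - start, 's')
        return result  ## Pass func's return value back to the caller

    return wrapper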
  • Now all I need to do is to add @time_func before the function say_hello.
@time_func
def say_hello():
    print("hello")

say_hello()
  • This outputs:
```
This happens before the function is called
hello
This happens after the function is called
The duration is 0.0002987384796142578 s
```
  • Decorators keep the code clean and cut down on repetition. If I want to track the time of another function, for example func2(), I can just use:
@time_func
def func2():
    pass
func2()
  • This outputs:
```
This happens before the function is called
This happens after the function is called
The duration is 4.38690185546875e-05 s
```
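  • A decorator can also take arguments; it then needs one more level of nesting (a decorator factory). A sketch with a hypothetical repeat() decorator that calls the function several times and collects the results:

```python
import functools

def repeat(times: int):
    """Hypothetical decorator factory: call the decorated function `times` times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator

@repeat(times=3)
def greet(name: str) -> str:
    return f"Hello, {name}"

print(greet("Pepper"))   ## Prints ['Hello, Pepper', 'Hello, Pepper', 'Hello, Pepper']
print(greet.__name__)    ## Prints greet, thanks to functools.wraps
```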

Classes

Abstract Classes: Declare Methods without Implementation

  • Sometimes you might want different classes to use the same attributes and methods. But the implementation of those methods can be slightly different in each class.

  • A good way to implement this is to use abstract classes. An abstract class contains one or more abstract methods.

  • An abstract method is a method that is declared but contains no implementation. The abstract method requires subclasses to provide implementations.

from abc import ABC, abstractmethod 

class Animal(ABC):

    def __init__(self, name: str):
        self.name = name 
        super().__init__()

    @abstractmethod 
    def make_sound(self):
        pass 

class Dog(Animal):
    def make_sound(self):
        print(f'{self.name} says: Woof')

class Cat(Animal):
    def make_sound(self):
        print(f'{self.name} says: Meows')

Dog('Pepper').make_sound()
Cat('Bella').make_sound()
## Prints
## Pepper says: Woof
## Bella says: Meows
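  • The benefit of @abstractmethod is enforcement: neither the abstract class itself nor a subclass that forgets to implement the method can be instantiated. A sketch with a hypothetical Fish subclass (the exact error message varies across Python versions):

```python
from abc import ABC, abstractmethod

class Animal(ABC):
    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def make_sound(self):
        pass

class Fish(Animal):   ## Hypothetical subclass that does not define make_sound()
    pass

for cls in (Animal, Fish):
    try:
        cls('Nemo')
    except TypeError as err:
        print(err)    ## Can't instantiate abstract class ...
```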

classmethod: What is it and When to Use it

  • When working with a Python class, use classmethod to create a method that builds and returns a new instance of that class from different inputs.

  • Classmethod doesn’t depend on the creation of a class instance. In the code below, classmethod instantiates a new object whose attribute is a list of even numbers.

class Solver:
    def __init__(self, nums: list):
        self.nums = nums
    
    @classmethod
    def get_even(cls, nums: list):
        return cls([num for num in nums if num % 2 == 0])
    
    def print_output(self):
        print("Result:", self.nums)

## Without the class method
nums = [1, 2, 3, 4, 5, 6, 7]
Solver(nums).print_output()
## Prints Result: [1, 2, 3, 4, 5, 6, 7]

## With the class method
solver2 = Solver.get_even(nums)
solver2.print_output()
## Prints Result: [2, 4, 6]
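  • A common use of classmethod is an alternative constructor. Because cls is the class the method was called on, subclasses automatically get instances of their own type. A sketch with a hypothetical from_string() method:

```python
class Solver:
    def __init__(self, nums: list):
        self.nums = nums

    @classmethod
    def from_string(cls, text: str):
        """Hypothetical alternative constructor: parse '1,2,3' into [1, 2, 3]."""
        return cls([int(part) for part in text.split(",")])

class VerboseSolver(Solver):
    pass

solver = Solver.from_string("1,2,3")
print(solver.nums)   ## Prints [1, 2, 3]

print(type(VerboseSolver.from_string("4,5")).__name__)   ## Prints VerboseSolver
```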

getattr: a Better Way to Get the Attribute of an Object

  • To get a default value when accessing an attribute that an object does not have, use the built-in getattr() function.

  • getattr(object, attribute_name, default) returns the value of the named attribute. If the attribute does not exist, it returns default instead; if no default is given, it raises AttributeError just like normal attribute access.

class Food:
    def __init__(self, name: str, color: str):
        self.name = name
        self.color = color

apple = Food("apple", "red")

print("The color of apple is", getattr(apple, "color", "yellow"))
## Prints "The color of apple is red"

print("The flavor of apple is", getattr(apple, "flavor", "sweet"))
## Prints "The flavor of apple is sweet"

print("The flavor of apple is", apple.flavor)
## Raises AttributeError: 'Food' object has no attribute 'flavor'
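  • Because getattr() takes the attribute name as a string, the name can come from data. Its siblings setattr() and hasattr() work the same way. A sketch:

```python
class Food:
    def __init__(self, name: str, color: str):
        self.name = name
        self.color = color

apple = Food("apple", "red")

for field in ["name", "color", "flavor"]:
    print(field, "=", getattr(apple, field, "unknown"))
## Prints
## name = apple
## color = red
## flavor = unknown

setattr(apple, "flavor", "sweet")   ## Same as apple.flavor = "sweet"
print(hasattr(apple, "flavor"))     ## Prints True
```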

__call__: Call your Class Instance like a Function

  • To call your class instance like a function, add the __call__() method to your class.
class DataLoader:
    def __init__(self, data_dir: str):
        self.data_dir = data_dir
        print("Instance is created")

    def __call__(self):
        print("Instance is called")

data_loader = DataLoader("my_data_dir")
## Instance is created

data_loader()
## Instance is called

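  • __call__ is handy when a function-like object needs to keep state between calls. A sketch with a hypothetical CallCounter class:

```python
class CallCounter:
    """Hypothetical callable that doubles its input and counts its calls."""
    def __init__(self):
        self.count = 0

    def __call__(self, x: int) -> int:
        self.count += 1
        return x * 2

double = CallCounter()
print(double(3), double(5))   ## Prints 6 10
print(double.count)           ## Prints 2
print(callable(double))       ## Prints True: instances with __call__ are callable
```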

@staticmethod: use the function without adding the attributes required for a new instance

  • Have you ever had a function in your class that doesn’t access any properties of a class but fits well in a class? You might find it redundant to instantiate the class to use that function. That is when you can turn your function into a static method.

  • All you need to turn your function into a static method is the @staticmethod decorator. Now you can call the function on the class itself, without creating an instance (and without supplying the arguments that __init__() requires).

import re

class ProcessText:
    def __init__(self, text_column: str):
        self.text_column = text_column

    @staticmethod
    def remove_URL(sample: str) -> str:
        """Replace url with empty space"""
        return re.sub(r"http\S+", "", sample)

text = ProcessText.remove_URL("My favorite page is https://www.google.com")
print(text) ## Prints "My favorite page is "

Property Decorator: A Pythonic Way to Use Getters and Setters

  • If you want users to use the right data type for a class attribute or prevent them from changing that attribute, use the property decorator.

  • In the code below, the first color method is used to get the attribute color and the second color method is used to set the value for the attribute color.

class Fruit:
    def __init__(self, name: str, color: str):
        self._name = name
        self._color = color

    @property
    def color(self):
        print("The color of the fruit is:")
        return self._color

    @color.setter
    def color(self, value):
        print("Setting value of color...")
        if self._color is None:
            if not isinstance(value, str):
                raise ValueError("color must be of type string")
            self._color = value  ## Assign the underlying attribute; self.color would recurse forever
        else:
            raise AttributeError("Sorry, you cannot change a fruit's color!")

fruit = Fruit("apple", "red")
fruit.color
## Prints The color of the fruit is:
## Returns 'red'

fruit.color = "yellow"
## Prints Setting value of color...
## Raises AttributeError: Sorry, you cannot change a fruit's color!
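  • If an attribute should never change, a property without a setter is enough: assigning to it raises AttributeError automatically. A sketch (the exact error message differs across Python versions):

```python
class Fruit:
    def __init__(self, name: str):
        self._name = name

    @property
    def name(self) -> str:
        """Read-only: no setter is defined."""
        return self._name

fruit = Fruit("apple")
print(fruit.name)   ## Prints apple

try:
    fruit.name = "pear"
except AttributeError as err:
    print("AttributeError:", err)
```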

__str__ and __repr__: Create a String Representation of a Python Object

  • To create a string representation of an object, add __str__ and __repr__.

  • __str__ shows readable outputs when printing the object. __repr__ shows outputs that are useful for displaying and debugging the object.

class Food:
    def __init__(self, name: str, color: str):
        self.name = name
        self.color = color

    def __str__(self):
        return f"{self.color} {self.name}"

    def __repr__(self):
        return f"Food({self.name!r}, {self.color!r})"

food = Food("apple", "red")

## Invokes __str__()
print(food) ## Prints "red apple"

## Invokes __repr__()
food ## Returns Food('apple', 'red')
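  • If a class defines only __repr__, str() and print() fall back to it, and containers always display their items with repr(). A sketch with a hypothetical Point class; the !r conversion in the f-string calls repr() on each value:

```python
class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point({self.x!r}, {self.y!r})"

p = Point(1, 2)
print(str(p))   ## Prints Point(1, 2): no __str__, so __repr__ is used
print([p])      ## Prints [Point(1, 2)]: lists show the repr of their items
```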

attrs: Bring Back the Joy of Writing Classes!

  • Do you find it annoying to write an __init__() method every time you want to create a class in Python?
class Dog:
    def __init__(self, age: int, name: str, type_: str = 'Labrador Retriever'):
        self.age = age 
        self.name = name
        self.type_ = type_
        
    def describe(self):
        print(f"{self.name} is a {self.type_}.")
  • If so, try attrs. With attrs, you can declaratively define the attributes of a class.
import attr

@attr.s(auto_attribs=True)
class Dog:
    age: int
    name: str
    type_: str = "Labrador Retriever"

    def describe(self):
        print(f"{self.name} is a {self.type_}.")

pepper = Dog(7, "Pepper", "Labrador Retriever")
  • The instance created using attrs has a nice human-readable __repr__().
pepper ## Returns Dog(age=7, name='Pepper', type_='Labrador Retriever')
pepper.describe() ## Prints Pepper is a Labrador Retriever.
  • You can also turn the attributes of that instance into a dictionary.
attr.asdict(pepper) ## Returns {'age': 7, 'name': 'Pepper', 'type_': 'Labrador Retriever'}
  • You can also compare two instances of the same class: attrs compares the attributes as tuples in the order they are defined, so age is compared first.
bim = Dog(8, 'Bim Bim', 'Dachshund')

pepper < bim ## Returns True
  • See the attrs documentation for other benefits.
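  • If you would rather avoid a third-party dependency, the standard library's dataclasses module (a swapped-in alternative, not attrs itself) covers the features shown above. A sketch; note that ordering must be requested explicitly with order=True:

```python
from dataclasses import dataclass, asdict

@dataclass(order=True)
class Dog:
    age: int
    name: str
    type_: str = "Labrador Retriever"

    def describe(self):
        print(f"{self.name} is a {self.type_}.")

pepper = Dog(7, "Pepper")
bim = Dog(8, "Bim Bim", "Dachshund")

print(pepper)          ## Prints Dog(age=7, name='Pepper', type_='Labrador Retriever')
print(asdict(pepper))  ## Prints {'age': 7, 'name': 'Pepper', 'type_': 'Labrador Retriever'}
print(pepper < bim)    ## Prints True: fields are compared as tuples, age first
```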

Datetime

datetime + timedelta: Calculate End DateTime Based on Start DateTime and Duration

  • Provided an event starts at a certain time and takes a certain number of minutes to finish, how do you determine when it ends?

  • Taking the sum of datetime and timedelta (minutes) does the trick!

from datetime import date, datetime, timedelta

beginning = '2020/01/03 23:59:00'
duration_in_minutes = 2500

## Find the beginning time
beginning = datetime.strptime(beginning, '%Y/%m/%d %H:%M:%S')

## Represent the duration as a timedelta
duration = timedelta(minutes=duration_in_minutes)

## Find end time
end = beginning + duration
end ## Returns datetime.datetime(2020, 1, 5, 17, 39)
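  • The reverse also works: subtracting two datetimes gives a timedelta, whose total_seconds() method converts it back into a plain number. A sketch reusing the times above:

```python
from datetime import datetime

beginning = datetime.strptime('2020/01/03 23:59:00', '%Y/%m/%d %H:%M:%S')
end = datetime(2020, 1, 5, 17, 39)

duration = end - beginning              ## datetime - datetime gives a timedelta
print(duration)                         ## Prints 1 day, 17:40:00
print(duration.total_seconds() / 60)    ## Prints 2500.0
```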

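Use the Number of Days in a Month as a Feature

  • Have you ever wanted to use the number of days in a month as a feature in your time series data? calendar.monthrange(year, month) returns a tuple of (weekday of the first day of the month, number of days in the month), so calendar.monthrange(year, month)[1] gives you the number of days, like below.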
import calendar 

calendar.monthrange(2020, 11)[1] ## Returns 30

Best Practices

  • This section includes some best practices to write Python code.

Use _ to Ignore Values

  • When assigning the values returned from a function, you might want to ignore some values that are not used in future code. If so, assign those values to an underscore, _.
def return_two():
    return 1, 2

_, var = return_two()
var ## Returns 2
  • If you want to repeat a loop a specific number of times but don’t care about the index, you can also use _.
for _ in range(5):
    print('Hello')
## Prints 
## Hello
## Hello
## Hello
## Hello
## Hello

Python Pass Statement

  • If you know what a piece of code should do but don’t know how to write it yet, wrap it in a function whose body is just pass.

  • Once you have finished writing the code at a high level, go back to those functions and replace pass with the actual implementation. This keeps your train of thought from being disrupted.

def say_hello():
    pass 

def ask_to_sign_in():
    pass 

def main(is_user: bool):
    if is_user:
        say_hello()
    else:
        ask_to_sign_in()

main(is_user=True)

Code Speed

  • This section will show you some ways to speed up or track the performance of your Python code.

Concurrently Execute Tasks on Separate CPUs

  • If you want to concurrently execute tasks on separate CPUs to run faster, consider using joblib.Parallel. It allows you to easily execute several tasks at once, with each task using its own processor.
from joblib import Parallel, delayed
import multiprocessing

def add_three(num: int):
    return num + 3

num_cores = multiprocessing.cpu_count()
results = Parallel(n_jobs=num_cores)(delayed(add_three)(i) for i in range(10))
results ## Returns [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

Compare The Execution Time Between Two Functions

  • If you want to compare the execution time between two functions, try timeit.timeit(). You can also specify the number of times you want to rerun your function to get a better estimation of the time.
import time 
import timeit 

def func():
    """comprehension"""
    l = [i for i in range(10_000)]

def func2():
    """list range"""
    l = list(range(10_000))

expSize = 1000
time1 = timeit.timeit(func, number=expSize)
time2 = timeit.timeit(func2, number=expSize)

print(time1/time2) ## Prints 2.6299518653018685
  • From the result, we can see that list(range(...)) was about 2.6 times faster than the list comprehension in this experiment.
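  • A single timeit.timeit() call can be skewed by other processes. timeit.repeat() runs the whole measurement several times; the timeit documentation suggests looking at the minimum, since higher values are usually caused by interference rather than by the code itself. A sketch:

```python
import timeit

def func():
    return [i for i in range(10_000)]

def func2():
    return list(range(10_000))

times1 = timeit.repeat(func, number=100, repeat=5)   ## 5 runs of 100 calls each
times2 = timeit.repeat(func2, number=100, repeat=5)
print(min(times1) / min(times2))   ## The ratio varies by machine
```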

Python Built-in Libraries

  • This section covers Python Built-in libraries such as collections, functools, and itertools.

Collections

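  • collections is a built-in Python library that provides specialized container datatypes (such as Counter, namedtuple, and defaultdict) as alternatives to the general-purpose dict, list, and tuple. This section will show you some useful classes from this module.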

collections.Counter: Count The Occurrences of Items in a List

  • Counting the occurrences of each item in a list using a for-loop is slow and inefficient.
char_list = ['a', 'b', 'c', 'a', 'd', 'b', 'b']
def custom_counter(list_: list):
    char_counter = {}
    for char in list_:
        if char not in char_counter:
            char_counter[char] = 1
        else: 
            char_counter[char] += 1

    return char_counter
custom_counter(char_list) ## Returns {'a': 2, 'b': 3, 'c': 1, 'd': 1}
  • Using collections.Counter is more efficient, and all it takes is one line of code!
from collections import Counter

Counter(char_list) ## Returns Counter({'b': 3, 'a': 2, 'c': 1, 'd': 1})
  • In my experiment, using Counter was more than 2x faster than using a custom counter.
from timeit import timeit
import random 

random.seed(0)
num_list = [random.randint(0, 22) for _ in range(1000)]

numExp = 100
custom_time = timeit("custom_counter(num_list)", number=numExp, globals=globals())
counter_time = timeit("Counter(num_list)", number=numExp, globals=globals())
print(custom_time/counter_time) ## Prints 2.6199148843686806 (the exact ratio varies by machine)
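  • Counter also provides helpers that a plain dict lacks. A sketch of most_common(), zero counts for missing items, and update():

```python
from collections import Counter

char_list = ['a', 'b', 'c', 'a', 'd', 'b', 'b']
counts = Counter(char_list)

print(counts.most_common(2))   ## Prints [('b', 3), ('a', 2)]
print(counts['z'])             ## Prints 0: missing items count as zero

counts.update(['a', 'a'])      ## Add more observations
print(counts['a'])             ## Prints 4
```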

namedtuple: Tuple with Named Fields

  • If you need to create a tuple with named fields, consider using namedtuple:
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(11, y=22) ## Instantiate with positional or keyword arguments
p[0] + p[1]         ## Returns 33; indexable like the plain tuple (11, 22)
x, y = p            ## Unpack like a regular tuple
x, y                ## Returns (11, 22)
p.x + p.y           ## Returns 33; Fields also accessible by name
p                   ## Returns Point(x=11, y=22); readable __repr__ with a name=value style
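  • namedtuples are immutable, so _replace() returns a modified copy, and _asdict() converts one to a dictionary. typing.NamedTuple offers a class-based syntax with type hints and defaults. A sketch (on Python 3.8+, _asdict() returns a regular dict):

```python
from collections import namedtuple
from typing import NamedTuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(11, 22)

p2 = p._replace(x=33)
print(p2)            ## Prints Point(x=33, y=22)
print(p._asdict())   ## Prints {'x': 11, 'y': 22}

class TypedPoint(NamedTuple):
    x: int
    y: int = 0       ## Fields can have defaults

print(TypedPoint(5))   ## Prints TypedPoint(x=5, y=0)
```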

Defaultdict: Return a Default Value When a Key is Not Available

  • If you want to create a Python dictionary with default value, use defaultdict. When calling a key that is not in the dictionary, the default value is returned.
from collections import defaultdict

classes = defaultdict(lambda: 'Outside')
classes['Math'] = 'B23'
classes['Physics'] = 'D24'
classes['Math'] ## Returns 'B23'
classes['English'] ## Returns 'Outside'
  • Note that the first argument to defaultdict which is default_factory, requires a callable, which implies either a class or a function.

  • You could also achieve similar functionality using dict.get(); however, note that this requires specifying the default value at every lookup rather than once when defining the dictionary.

classes = {}
classes.get("English", "Outside") ## Returns 'Outside'
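  • A common pattern passes a type such as list or int as the default_factory, which is handy for grouping and counting. Note that looking up a missing key also inserts it. A sketch with hypothetical data:

```python
from collections import defaultdict

pairs = [('fruit', 'apple'), ('veg', 'onion'), ('fruit', 'grape')]

groups = defaultdict(list)   ## list() creates each missing value
for kind, name in pairs:
    groups[kind].append(name)
print(dict(groups))          ## Prints {'fruit': ['apple', 'grape'], 'veg': ['onion']}

_ = groups['meat']           ## A lookup of a missing key inserts []
print('meat' in groups)      ## Prints True
```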

Itertools

  • itertools (https://docs.python.org/3/library/itertools.html) is a built-in Python library that creates iterators for efficient looping. This section will show you some useful functions of itertools.

itertools.combinations: A Better Way to Iterate Through a Pair of Values in a Python List

  • If you want to iterate through pairs of values in a list and the order does not matter ((a, b) is the same as (b, a)), a naive approach is to use two for-loops.

num_list = [1, 2, 3]
for i in num_list:
    for j in num_list:
        if i < j:
            print((i, j))
## Prints
## (1, 2)
## (1, 3)
## (2, 3)
  • However, using two for-loops is lengthy and inefficient. Use itertools.combinations instead:

from itertools import combinations

comb = combinations(num_list, 2) ## use this
for pair in comb:
    print(pair)
## Prints
## (1, 2)
## (1, 3)
## (2, 3)

itertools.product: Nested For-Loops in a Generator Expression

  • Are you using nested for-loops to experiment with different combinations of parameters? If so, use itertools.product instead.

  • itertools.product(A, B) yields the same pairs as ((x, y) for x in A for y in B), but it replaces any number of nested loops with a single loop, which keeps the code flat as you add more parameters.

from itertools import product

params = {
    "learning_rate": [1e-1, 1e-2, 1e-3],
    "batch_size": [16, 32, 64],
}

for vals in product(*params.values()):
    combination = dict(zip(params.keys(), vals))
    print(combination)
## Prints
## {'learning_rate': 0.1, 'batch_size': 16}
## {'learning_rate': 0.1, 'batch_size': 32}
## {'learning_rate': 0.1, 'batch_size': 64}
## {'learning_rate': 0.01, 'batch_size': 16}
## {'learning_rate': 0.01, 'batch_size': 32}
## {'learning_rate': 0.01, 'batch_size': 64}
## {'learning_rate': 0.001, 'batch_size': 16}
## {'learning_rate': 0.001, 'batch_size': 32}
## {'learning_rate': 0.001, 'batch_size': 64}

itertools.starmap: Apply a Function With Multiple Arguments to Elements in a List

  • map is a useful function that allows you to apply a function to elements in a list. However, when each element is a tuple of arguments, map passes the whole tuple as a single argument, so it can't call a function that expects those arguments separately.

def multiply(x: float, y: float):
    return x * y

nums = [(1, 2), (4, 2), (2, 5)]
list(map(multiply, nums))
## Raises TypeError: multiply() missing 1 required positional argument: 'y'
  • To apply a function with multiple arguments to elements in a list, use itertools.starmap. With starmap, the elements of each tuple in the list nums are unpacked and used as the arguments of the function multiply.

from itertools import starmap

list(starmap(multiply, nums)) ## Returns [2, 8, 10]

itertools.compress: Filter a List Using Booleans

  • Normally, you cannot filter a list using a list.

fruits = ['apple', 'orange', 'banana', 'grape', 'lemon']
chosen = [1, 0, 0, 1, 1]
fruits[chosen]
## Raises TypeError: list indices must be integers or slices, not list
  • To filter a list using a list of booleans, use itertools.compress instead.

from itertools import compress

list(compress(fruits, chosen)) ## Returns ['apple', 'grape', 'lemon']

itertools.groupby: Group Elements in an Iterable by a Key

  • If you want to group elements in a list by a key, use itertools.groupby. In the example below, I grouped elements in the list by the first element in each tuple. Because groupby only groups consecutive elements that share a key, sort the list by that key first.

from itertools import groupby

prices = [('apple', 3), ('orange', 2), ('apple', 4), ('orange', 1), ('grape', 3)]

key_func = lambda x: x[0]

## Sort the elements in the list by the key
prices.sort(key=key_func)

## Group elements in the list by the key
for key, group in groupby(prices, key_func):
    print(key, ':', list(group))
## Prints
## apple : [('apple', 3), ('apple', 4)]
## grape : [('grape', 3)]
## orange : [('orange', 2), ('orange', 1)]

itertools.zip_longest: Zip Iterables of Different Lengths

  • zip allows you to aggregate elements from each of the iterables. However, zip stops at the shortest iterable, so elements of the longer iterables are dropped when the lengths differ.

fruits = ['apple', 'orange', 'grape']
prices = [1, 2]
list(zip(fruits, prices)) ## Returns [('apple', 1), ('orange', 2)]
  • To aggregate iterables of different lengths, use itertools.zip_longest. This function fills missing values with fillvalue.

from itertools import zip_longest

list(zip_longest(fruits, prices, fillvalue='-')) ## Returns [('apple', 1), ('orange', 2), ('grape', '-')]
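  • Another frequently used function is itertools.chain, which iterates over several iterables one after another. chain.from_iterable() does the same for an iterable of iterables, which flattens one level of nesting. A sketch:

```python
from itertools import chain

fruits = ['apple', 'orange']
veggies = ['onion']
nested = [[1, 2], [3], []]

print(list(chain(fruits, veggies)))        ## Prints ['apple', 'orange', 'onion']
print(list(chain.from_iterable(nested)))   ## Prints [1, 2, 3]
```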

References

Citation

If you found our work useful, please cite it as:

@article{Chadha2020DistilledPython3Tips,
  title   = {Python 3 Tips},
  author  = {Chadha, Aman},
  journal = {Distilled AI},
  year    = {2020},
  note    = {\url{https://aman.ai}}
}