2.1. String

2.1.1. String find: Find The Index of a Substring in a Python String

If you want to find the index of a substring in a string, use the find() method. It returns the index of the first occurrence of the substring, or -1 if the substring is not found.

sentence = "Today is Saturday"

# Find the index of the first occurrence of the substring
sentence.find("day")
2
sentence.find("nice")
# -1 means the substring was not found
-1

You can also pass the start position of the search, and optionally the end position, as extra arguments:

# Start searching for the substring at index 3
sentence.find("day", 3)
14
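The end argument is exclusive, like a slice bound, so an occurrence must fit entirely inside the searched range to be found. The sketch below shows this, then loops over find() to collect every occurrence; find_all is a hypothetical helper written for illustration, not part of the str API.

```python
sentence = "Today is Saturday"

# end is exclusive, like a slice: this searches sentence[3:16] == "ay is Saturda"
print(sentence.find("day", 3, 16))  # -1, because "day" at 14 also needs index 16
print(sentence.find("day", 3, 17))  # 14

def find_all(text, sub):
    """Hypothetical helper: start index of every occurrence of sub, overlaps included."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one character after the last hit so overlapping matches are found
        start = text.find(sub, start + 1)
    return positions

print(find_all(sentence, "day"))  # [2, 14]
```

Advancing by one character rather than by len(sub) is a deliberate choice here: it reports overlapping matches such as both "aa" in "aaa".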

2.1.2. re.sub: Replace One String with Another String Using Regular Expression

If you want to replace one string with another or change the order of characters in a string, use re.sub.

re.sub allows you to use a regular expression to specify the pattern of the string you want to swap.

In the code below, I replace 3/7/2021 with Sunday, then rearrange it into 2021-3-7 by referring to the captured groups as \1, \2, and \3 in the replacement string.

import re

text = "Today is 3/7/2021"
match_pattern = r"(\d+)/(\d+)/(\d+)"

re.sub(match_pattern, "Sunday", text)
'Today is Sunday'
re.sub(match_pattern, r"\3-\1-\2", text)
'Today is 2021-3-7'
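re.sub also understands named groups, referenced as \g<name> in the replacement string, and accepts a function instead of a replacement string; the function is called with each match object and returns the text to substitute. A minimal sketch, assuming dates are written month/day/year as in 3/7/2021; to_iso is a hypothetical helper name.

```python
import re

text = "Today is 3/7/2021"

# Named groups make the replacement template self-documenting.
# Assumption: dates are written month/day/year, as in the example above.
pattern = r"(?P<month>\d+)/(?P<day>\d+)/(?P<year>\d+)"

print(re.sub(pattern, r"\g<year>-\g<month>-\g<day>", text))
# Today is 2021-3-7

def to_iso(match):
    """Hypothetical helper: rebuild the date with zero-padded month and day."""
    return f"{match['year']}-{int(match['month']):02d}-{int(match['day']):02d}"

print(re.sub(pattern, to_iso, text))
# Today is 2021-03-07
```

A function is useful when the replacement needs logic, such as padding or validation, that a plain template string cannot express.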

2.1.3. difflib.SequenceMatcher: Detect The “Almost Similar” Articles

When analyzing articles, you may find some that are almost identical but not 100% the same, for example because of a grammar fix or a change in two or three words (as often happens with cross-posting). How can you detect these “almost similar” articles and drop one of them? That is when difflib.SequenceMatcher comes in handy: its ratio() method returns a similarity score between 0 and 1.

from difflib import SequenceMatcher

text1 = 'I am Khuyen'
text2 = 'I am Khuen'
print(SequenceMatcher(a=text1, b=text2).ratio())
0.9523809523809523
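ratio() is computed as 2*M/T, where M is the number of matched characters and T is the total length of both strings; above, 10 of the characters match across 21 in total, giving 20/21. To actually drop near duplicates, one option is to keep an article only if it is below some similarity threshold against every article already kept. The sketch below assumes a threshold of 0.9, which is an illustrative value rather than a recommendation, and drop_near_duplicates is a hypothetical helper.

```python
from difflib import SequenceMatcher

def drop_near_duplicates(articles, threshold=0.9):
    """Hypothetical helper: keep an article unless it is at least
    `threshold` similar to an article that was already kept.

    The default of 0.9 is an illustrative assumption.
    """
    kept = []
    for article in articles:
        if all(SequenceMatcher(a=article, b=other).ratio() < threshold
               for other in kept):
            kept.append(article)
    return kept

articles = ["I am Khuyen", "I am Khuen", "Python string tips"]
print(drop_near_duplicates(articles))  # ['I am Khuyen', 'Python string tips']
```

This compares every new article with every kept one, so the cost grows quadratically with the number of articles. For large collections, SequenceMatcher's cheaper quick_ratio() and real_quick_ratio() upper bounds can be checked first to skip pairs that cannot reach the threshold.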

2.1.4. difflib.get_close_matches: Get a List of the Best Matches for a Certain Word

If you want to get a list of the best matches for a certain word, use difflib.get_close_matches.

from difflib import get_close_matches

tools = ['pencil', 'pen', 'eraser', 'ink']
get_close_matches('pencel', tools)
['pencil', 'pen']

To keep only closer matches, increase the cutoff argument (default 0.6); candidates whose similarity score falls below it are dropped.

get_close_matches('pencel', tools, cutoff=0.8)
['pencil']
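get_close_matches also takes an n argument (default 3) that caps how many matches are returned, best match first, and it returns an empty list when nothing clears the cutoff. A common use is a "did you mean" hint; the suggest function below is a hypothetical helper sketched for illustration.

```python
from difflib import get_close_matches

tools = ['pencil', 'pen', 'eraser', 'ink']

# n caps the number of results; the best-scoring match comes first
print(get_close_matches('pencel', tools, n=1))  # ['pencil']

def suggest(word, choices):
    """Hypothetical helper: return a 'did you mean' hint, or None if nothing is close."""
    matches = get_close_matches(word, choices, n=1)
    return f"Did you mean '{matches[0]}'?" if matches else None

print(suggest('inc', tools))      # Did you mean 'ink'?
print(suggest('stapler', tools))  # None
```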