6.6. Time Series¶
6.6.1. datefinder: Automatically Find Dates and Time in a Python String¶
!pip install datefinder
If you want to automatically find date and time with different formats in a Python string, try datefinder.
from datefinder import find_dates
text = """"We have one meeting on May 17th,
2021 at 9:00am and another meeting on 5/18/2021
at 10:00. I hope you can attend one of the
meetings."""
matches = find_dates(text)
for match in matches:
print("Date and time:", match)
print("Only day:", match.day)
Date and time: 2021-05-17 09:00:00
Only day: 17
Date and time: 2021-05-18 10:00:00
Only day: 18
6.6.2. Fastai’s add_datepart: Add Relevant DateTime Features in One Line of Code¶
!pip install fastai
When working with time series, other features such as year, month, week, day of the week, day of the year, whether it is the end of the year or not, can be really helpful to predict future events. Is there a way that you can get all of those features in one line of code?
Fastai’s add_datepart method allows you to do exactly that.
import pandas as pd
from fastai.tabular.core import add_datepart
from datetime import datetime
df = pd.DataFrame(
{
"date": [
datetime(2020, 2, 5),
datetime(2020, 2, 6),
datetime(2020, 2, 7),
datetime(2020, 2, 8),
],
"val": [1, 2, 3, 4],
}
)
df
date | val | |
---|---|---|
0 | 2020-02-05 | 1 |
1 | 2020-02-06 | 2 |
2 | 2020-02-07 | 3 |
3 | 2020-02-08 | 4 |
df = add_datepart(df, "date")
df.columns
Index(['val', 'Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',
'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start',
'Is_year_end', 'Is_year_start', 'Elapsed'],
dtype='object')
6.6.3. Maya: Convert the string to datetime automatically¶
!pip install maya
If you want to convert a string type to a datetime type, the common way is to use strptime(date_string, format). But it is quite inconvenient to to specify the structure of your datetime string, such as ‘ %Y-%m-%d %H:%M:%S’.
There is a tool that helps you convert the string to datetime automatically called maya. You just need to parse the string and maya will figure out the structure of your string.
import maya
# Automatically parse datetime string
string = "2016-12-16 18:23:45.423992+00:00"
maya.parse(string).datetime()
datetime.datetime(2016, 12, 16, 18, 23, 45, 423992, tzinfo=<UTC>)
Better yet, if you want to convert the string to a different time zone (for example, CST), you can parse that into maya’s datetime function.
maya.parse(string).datetime(to_timezone="US/Central")
datetime.datetime(2016, 12, 16, 12, 23, 45, 423992, tzinfo=<DstTzInfo 'US/Central' CST-1 day, 18:00:00 STD>)
Check out the doc for more ways of manipulating your date string faster here.
6.6.4. Extract holiday from date column¶
!pip install holidays
You have a date column and you think the holidays might affect the target of your data. Is there an easy way to extract the holidays from the date? Yes, that is when holidays package comes in handy.
Holidays package provides a dictionary of holidays for different countries. The code below is to confirm whether 2020-07-04 is a US holiday and extract the name of the holiday.
from datetime import date
import holidays
us_holidays = holidays.UnitedStates()
"2014-07-04" in us_holidays
True
The great thing about this package is that you can write the date in whatever way you want and the package is still able to detect which date you are talking about.
us_holidays.get("2014-7-4")
'Independence Day'
us_holidays.get("2014/7/4")
'Independence Day'
You can also add more holidays if you think that the library is lacking some holidays. Try this out if you are looking for something similar.
6.6.5. traces: A Python Library for Unevenly-Spaced Time Series Analysis¶
!pip install traces
If you are working with unevenly-spaced time series, try traces. traces allows you to get the values of the datetimes not specified in your time series based on the values of other datetimes.
For example, while logging our working hours for each date, we forgot to log the working hours for some dates.
# Log working hours for each date
import traces
from datetime import datetime
working_hours = traces.TimeSeries()
working_hours[datetime(2021, 9, 10)] = 10
working_hours[datetime(2021, 9, 12)] = 5
working_hours[datetime(2021, 9, 13)] = 6
working_hours[datetime(2021, 9, 16)] = 2
We can get the working hours of dates we forgot to log using traces.
# Get value on 2021/09/11
working_hours[datetime(2021, 9, 11)]
10
# Get value on 2021/09/14
working_hours[datetime(2021, 9, 14)]
6
We can also get the distribution of our working hours from 2021-9-10
to 2021-9-16
using distribution
:
distribution = working_hours.distribution(
start=datetime(2021, 9, 10),
end=datetime(2021, 9, 16)
)
distribution
Histogram({5: 0.16666666666666666, 6: 0.5, 10: 0.3333333333333333})
From the output above, it seems like we work 6 hours per day 50% of the time.
Get the median working hours:
distribution.median()
6.0
Get the mean working hours:
distribution.mean()
7.166666666666666