Get Data¶
This section covers tools to get some data for your projects.
faker: Create Fake Data in One Line of Code¶
!pip install Faker
To quickly create fake data for testing, use faker.
from faker import Faker
fake = Faker()
fake.color_name()
'CornflowerBlue'
fake.name()
'Michael Scott'
fake.address()
'881 Patricia Crossing\nSouth Jeremy, AR 06087'
fake.date_of_birth(minimum_age=22)
datetime.date(1927, 11, 5)
fake.city()
'North Donald'
fake.job()
'Teacher, secondary school'
fetch_openml: Get an OpenML Dataset in One Line of Code¶
OpenML hosts many interesting datasets. The easiest way to get an OpenML dataset in Python is to use the sklearn.datasets.fetch_openml
function.
In one line of code, you get a dataset to play with!
from sklearn.datasets import fetch_openml
monk = fetch_openml(name="monks-problems-2", as_frame=True)
print(monk["data"].head(10))
attr1 attr2 attr3 attr4 attr5 attr6
0 1 1 1 1 2 2
1 1 1 1 1 4 1
2 1 1 1 2 1 1
3 1 1 1 2 1 2
4 1 1 1 2 2 1
5 1 1 1 2 3 1
6 1 1 1 2 4 1
7 1 1 1 3 2 1
8 1 1 1 3 4 1
9 1 1 2 1 1 1
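fetch_openml returns the labels separately from the features, so the result above can be split into a training-ready pair. A short sketch of that, assuming the same dataset as above:

```python
from sklearn.datasets import fetch_openml

monk = fetch_openml(name="monks-problems-2", as_frame=True)

# Features and labels come back as separate entries on the returned Bunch
X, y = monk["data"], monk["target"]
print(X.shape)  # six attr columns, as shown above
print(y.value_counts())
```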
Autoscraper¶
!pip install autoscraper
If you want to get data from a website, BeautifulSoup makes it easy for you to do so. But can scraping be automated even more? If you are looking for a faster way to scrape complicated websites such as Stack Overflow or GitHub in a few lines of code, try autoscraper.
All you need to do is give it some sample text so it can learn the scraping rules, and it will take care of the rest for you!
from autoscraper import AutoScraper
url = "https://stackoverflow.com/questions/2081586/web-scraping-with-python"
wanted_list = ["How to check version of python modules?"]
scraper = AutoScraper()
result = scraper.build(url, wanted_list)
for res in result:
    print(res)
How to execute a program or call a system command?
What are metaclasses in Python?
Does Python have a ternary conditional operator?
Convert bytes to a string
Does Python have a string 'contains' substring method?
How to check version of python modules?
pandas-datareader: Extract Data from Various Internet Sources Directly into a Pandas DataFrame¶
!pip install pandas-datareader
Have you ever wanted to extract time series data from various Internet sources directly into a pandas DataFrame? That is when pandas_datareader comes in handy.
Below is a snippet that extracts daily data for the ticker AD from 2008 to 2018, using the Alpha Vantage ("av-daily") source.
import os
from datetime import datetime
import pandas_datareader.data as web

df = web.DataReader(
    "AD",
    "av-daily",
    start=datetime(2008, 1, 1),
    end=datetime(2018, 2, 28),
    api_key=os.getenv("ALPHAVANTAGE_API_KEY"),
)