Pandas Basics 1
Pandas is an open-source Python library for exploring, analyzing, and manipulating data. One common use of pandas is to transform data so that it is easier to use with machine learning algorithms.
# Importing pandas
import pandas as pd
Pandas has two main data structures: Series and DataFrame
- A Series is a 1-d column of data
- A DataFrame is a 2-d data table with columns and rows
# Create a Series of countries
country = pd.Series(['Togo', 'Japan', 'Chile', 'Canada'])
# Create a Series of continents
continent = pd.Series(['Africa', 'Asia','South America', 'North America'])
# Create a DataFrame from the countries and continents
country_data = pd.DataFrame({'country':country,
'continent':continent})
A pandas DataFrame of the countries and continents will look like this:
  country      continent
0    Togo         Africa
1   Japan           Asia
2   Chile  South America
3  Canada  North America
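To connect the two data structures, here is a minimal sketch that rebuilds the same DataFrame and checks that a single column pulled out of it is a Series. The variable name `country_column` is introduced only for illustration.

```python
import pandas as pd

# Rebuild the Series and DataFrame from the example above
country = pd.Series(['Togo', 'Japan', 'Chile', 'Canada'])
continent = pd.Series(['Africa', 'Asia', 'South America', 'North America'])
country_data = pd.DataFrame({'country': country,
                             'continent': continent})

# Selecting one column of a DataFrame gives back a Series
country_column = country_data['country']  # illustrative name
print(type(country_column).__name__)

# .shape reports (number of rows, number of columns)
print(country_data.shape)
```

The first print shows the type of the selected column, and the second shows that the table has four rows and two columns.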
Importing data
Pandas can import Excel and CSV files.
# Importing CSV data, e.g. the Boston housing training data
boston_housing = pd.read_csv('../pandas/train.csv')
# To read in an Excel file, use pd.read_excel
The Boston housing training data can be found here.
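If you do not have the training file at hand, the sketch below shows the same `pd.read_csv` call on a small in-memory CSV. The CSV text and the name `sample` are made up for illustration and are not the Boston housing data. `io.StringIO` stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = "country,continent\nTogo,Africa\nJapan,Asia\n"

# read_csv accepts a file path or a file-like object such as StringIO
sample = pd.read_csv(io.StringIO(csv_text))

# The first line of the CSV becomes the column names
print(sample.columns.tolist())
print(sample.head())
```

`pd.read_excel` is called in a similar way with a path to a spreadsheet. It relies on an extra engine package being installed (for example openpyxl for .xlsx files), which is why it is not run here.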
Pandas can export data to a CSV file using .to_csv or to an Excel file using .to_excel.
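As a rough sketch of exporting, the example below writes the country DataFrame to a CSV file in a temporary folder and reads it back. The file name `countries.csv` and the variable `reloaded` are assumptions chosen for this illustration.

```python
import os
import tempfile

import pandas as pd

country_data = pd.DataFrame({
    'country': ['Togo', 'Japan', 'Chile', 'Canada'],
    'continent': ['Africa', 'Asia', 'South America', 'North America'],
})

# Write to a temporary folder so nothing is left behind on disk
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'countries.csv')  # hypothetical file name

    # index=False leaves the row labels (0, 1, 2, 3) out of the file
    country_data.to_csv(csv_path, index=False)

    # Read the file back in to check what was written
    reloaded = pd.read_csv(csv_path)

print(reloaded)
```

Without `index=False`, the row labels are written as an extra unnamed column, which then shows up as a column when the file is read back. `.to_excel` is used the same way with an .xlsx path, provided an Excel engine such as openpyxl is installed.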
This is the end of Pandas Basics Part 1. In Part 2, we will dive into exploring and describing data.