Pandas Basics 1

Pandas is an open-source Python library for exploring, analyzing and manipulating data. One common use of pandas is transforming data into a form that machine learning algorithms can work with more easily.

# Importing pandas
import pandas as pd

Pandas has two main data structures: Series and DataFrame.

  • A Series is a one-dimensional, labeled column of data
  • A DataFrame is a two-dimensional table of data with labeled rows and columns
# Create a Series of countries
country = pd.Series(['Togo', 'Japan', 'Chile', 'Canada'])
# Create a Series of continents
continent = pd.Series(['Africa', 'Asia','South America', 'North America'])
# Create a DataFrame from the countries and continents
country_data = pd.DataFrame({'country':country,
                             'continent':continent})

Printing country_data shows the resulting DataFrame:

  country      continent
0    Togo         Africa
1   Japan           Asia
2   Chile  South America
3  Canada  North America
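To make the one-dimensional versus two-dimensional distinction concrete, the minimal sketch below rebuilds the same table and inspects its dimensions. It also shows that selecting a single column of a DataFrame gives back a Series.

```python
import pandas as pd

# Rebuild the country/continent table from the example above
country = pd.Series(['Togo', 'Japan', 'Chile', 'Canada'])
continent = pd.Series(['Africa', 'Asia', 'South America', 'North America'])
country_data = pd.DataFrame({'country': country, 'continent': continent})

# A Series has one dimension; a DataFrame has two
print(country.ndim)        # 1
print(country_data.ndim)   # 2
print(country_data.shape)  # (4, 2): 4 rows, 2 columns

# Selecting one column of a DataFrame returns a Series
continents = country_data['continent']
print(type(continents).__name__)  # Series
print(continents.iloc[1])         # Asia (position-based lookup)
```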

Importing and exporting data

Pandas can be used to import Excel and CSV files.

# Import CSV data, e.g. the Boston housing training data
boston_housing = pd.read_csv('../pandas/train.csv')

# To read in an Excel file, use pd.read_excel

The Boston housing training data can be found here.
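Since train.csv isn't reproduced here, the sketch below passes pd.read_csv a small in-memory CSV through io.StringIO; read_csv accepts a file-like object just as it accepts a path. The column names and values are hypothetical, made up for illustration, and are not taken from the Boston housing data.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk
csv_text = """id,rooms,price
1,3,250000
2,4,320000
3,2,180000
"""

# read_csv accepts a path or any file-like object such as StringIO
houses = pd.read_csv(io.StringIO(csv_text))

print(houses.shape)             # (3, 3)
print(houses.columns.tolist())  # ['id', 'rooms', 'price']
print(houses['price'].mean())   # 250000.0
```

With a real file, the same call simply takes the path, as in the pd.read_csv('../pandas/train.csv') line above.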

Pandas also lets you export data to a CSV file with .to_csv() or to an Excel file with .to_excel().
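As a minimal sketch of exporting, the example below rebuilds the country table, writes it out with .to_csv() and reads it back to confirm nothing was lost. When no path is given, to_csv returns the CSV as a string; passing a filename such as 'countries.csv' (a hypothetical name) writes a file instead. index=False leaves the 0 to 3 row labels out of the output. .to_excel() works similarly but needs an Excel engine such as openpyxl installed, so it is left out here.

```python
import io
import pandas as pd

country_data = pd.DataFrame({
    'country': ['Togo', 'Japan', 'Chile', 'Canada'],
    'continent': ['Africa', 'Asia', 'South America', 'North America'],
})

# With no path argument, to_csv returns the CSV text as a string
csv_text = country_data.to_csv(index=False)
print(csv_text.splitlines()[0])  # country,continent

# Reading the CSV back should reproduce the original DataFrame
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.equals(country_data))  # True
```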

This is the end of Pandas Basics Part 1. In Part 2, we will dive into exploring and describing data.