Pandas: How to get started.

A man getting ready for a race.
Photo by Serghei Trofimov on Unsplash

The pandas library for Python provides many features that are useful not only in data science projects but also for quick data manipulations and conversions.

Here I will list a few things that I used in my first Pandas project.

First and foremost, you need to import pandas into your script.

import pandas as pd

Next, you can load a data set saved as a comma-separated values (.csv) file using

df = pd.read_csv('datafile.csv')

I prefer the CSV format because it is the vanilla format, and if the need arises you can use a simple text editor to peruse it.
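As a minimal sketch of loading CSV data, the example below reads from an in-memory string instead of a file on disk, so it runs without 'datafile.csv' being present. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'datafile.csv'.
csv_text = (
    "country,year,value\n"
    "India,2015,1.5\n"
    "India,2016,2.5\n"
    "Brazil,2015,3.0\n"
)

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# (rows, columns) of the loaded data frame.
print(df.shape)
```

With a real file, passing the path as in the line above works the same way.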

Pandas uses a data frame to store the data in memory, similar to R.

In this post, df denotes a variable that holds a data frame.

describe() displays summary statistics of the data (numeric columns only, by default).

df.describe()
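To make the summary concrete, here is a small sketch on a made-up data frame with one numeric and one text column. The column names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column.
df = pd.DataFrame({
    "height": [1.0, 2.0, 3.0, 4.0],
    "label": ["a", "b", "c", "d"],
})

# By default describe() summarises only the numeric columns,
# with rows such as count, mean, std, min, and max.
stats = df.describe()
print(stats)

# Individual statistics can be looked up by row and column label.
print(stats.loc["mean", "height"])
```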

Returns the names of all the variables (columns) in the data frame.

df.columns.values
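A short sketch of what the column listing looks like, using a tiny made-up data frame. The values attribute gives a NumPy array; calling tolist() on it is one way to get a plain Python list.

```python
import pandas as pd

# Hypothetical two-column data frame.
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=["a", "b"])

# NumPy array holding the column names.
names = df.columns.values
print(names)

# Plain Python list of the same names.
name_list = names.tolist()
print(name_list)
```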

head(n) and tail(n) display the first and the last n rows in a data frame, respectively.
n equals 5 by default.

df.head(n)
df.tail(n)
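The following sketch shows both calls on a made-up ten-row data frame, including the default of 5 rows when n is omitted.

```python
import pandas as pd

# Hypothetical data frame with ten rows.
df = pd.DataFrame({"x": range(10)})

first = df.head(3)    # first three rows
last = df.tail(3)     # last three rows
default = df.head()   # n defaults to 5

print(first)
print(last)
print(len(default))
```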

pd.concat concatenates two data frames along the required axis.
When axis equals 0, more observations are added to the resulting data frame.
When axis equals 1, more variables (dimensions) are added to the resulting data frame.
A good explanation can be found here.

df = pd.concat([df1, df2], axis=1)
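The difference between the two axes is easiest to see on small inputs. This is a sketch with made-up frames; df3 and the ignore_index=True argument are additions for illustration and are not part of the line above.

```python
import pandas as pd

# Hypothetical inputs: df1 and df2 share columns, df3 adds a new one.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})
df3 = pd.DataFrame({"c": [9, 10]})

# axis=0 stacks rows (more observations).
# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1.
rows = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places frames side by side (more variables),
# aligning rows on their index labels.
cols = pd.concat([df1, df3], axis=1)

print(rows.shape)
print(cols.shape)
```

Because axis=1 aligns on index labels, frames whose indices differ will produce missing values in the non-matching rows.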

unique() returns the unique values in a column.
This is useful for seeing the different values of an ordinal variable.

df.columnName.unique()
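Here is a minimal sketch with a made-up ordinal column named grade. Attribute access such as df.grade only works when the column name is a valid Python identifier and does not clash with an existing data frame attribute; bracket access such as df["grade"] always works.

```python
import pandas as pd

# Hypothetical ordinal column with repeated values.
df = pd.DataFrame({"grade": ["low", "high", "low", "medium", "high"]})

# Unique values are returned in order of first appearance.
values = df.grade.unique()
print(values)

# Equivalent bracket-style access.
same_values = df["grade"].unique()
```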

The simplest way to create a data frame:

pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=['a', 'b'])

To delete a column. drop returns a new data frame and leaves the original unchanged, so assign the result back.

df = df.drop('columnName', axis=1)
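A small sketch showing that drop does not modify the original data frame. The column names are made up, and the columns= keyword shown at the end is an alternative spelling of the same operation.

```python
import pandas as pd

# Hypothetical two-column data frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# drop returns a new data frame without column 'b'.
trimmed = df.drop("b", axis=1)

# The original still has both columns.
print(list(df.columns))
print(list(trimmed.columns))

# Same result using the columns= keyword instead of axis=1.
also_trimmed = df.drop(columns="b")
```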

set_index allows you to label rows using a column that is already present.

matdf = df.set_index('columnName')
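Once a column is used as the index, rows can be looked up by those labels. The sketch below assumes a made-up country column.

```python
import pandas as pd

# Hypothetical data with a column suitable as a row label.
df = pd.DataFrame({
    "country": ["India", "Brazil"],
    "value": [1.5, 3.0],
})

# set_index returns a new data frame indexed by 'country'.
matdf = df.set_index("country")

# Rows can now be selected by label with loc.
brazil_value = matdf.loc["Brazil", "value"]
print(brazil_value)
```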

The loc indexer allows you to select the rows (indices) where columnName1 equals value1 and set columnName2 to value2 in those rows.
Note: columnName1 and columnName2 can be the same column name.

df.loc[df['columnName1'] == value1, 'columnName2'] = value2
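As a sketch of this conditional assignment, the example below resets the value for every row of one country. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "country": ["India", "Brazil", "India"],
    "value": [1.0, 2.0, 3.0],
})

# Boolean mask: True for the rows where country is 'India'.
mask = df["country"] == "India"

# Set 'value' to 0.0 in the matching rows only.
df.loc[mask, "value"] = 0.0

print(df)
```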

Get stats from a column for only those rows (indices) where the conditions are satisfied in one or more columns.

df.columnName3[(df.columnName1 == value1) & (df.columnName2 == value2)].mean()
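The sketch below computes a conditional mean on made-up data. It uses loc with a combined mask, which is one equivalent way to write the expression above. Each condition needs its own parentheses because & binds more tightly than ==.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "country": ["India", "India", "India", "Brazil"],
    "year": [2015, 2015, 2016, 2015],
    "value": [1.0, 3.0, 10.0, 5.0],
})

# Rows where both conditions hold.
mask = (df["country"] == "India") & (df["year"] == 2015)

# Mean of 'value' over the matching rows only.
mean_value = df.loc[mask, "value"].mean()
print(mean_value)
```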

After you preprocess the data, you might want to save it to disk.
The to_csv function comes in handy to save a data frame as a CSV file:

matdf.to_csv('matYearCountry.csv')
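To illustrate what gets written, this sketch saves to an in-memory buffer rather than to 'matYearCountry.csv' and reads the result back. The data is made up. By default to_csv also writes the index, which here is the country label; pass index=False to leave it out.

```python
import io

import pandas as pd

# Hypothetical data frame indexed by country.
matdf = pd.DataFrame({
    "country": ["India", "Brazil"],
    "value": [1.5, 3.0],
}).set_index("country")

# Write CSV text to a buffer instead of a file on disk.
buffer = io.StringIO()
matdf.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)

# Read it back, restoring 'country' as the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col="country")
```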

That's all. Go fire up your first pandas project.

Computer Vision | Machine Learning | Deep Learning https://arccoder.github.io/
