Python Pandas for Beginners - A Complete Guide (Part 1)

This series, Python Pandas for Beginners, is a starting point for anyone new to the pandas library. You will learn some of its most important features, such as exploring, cleaning, transforming, and visualizing data.

Pandas is an open-source Python library and one of the most popular tools for data analysis today. Used alongside Python's machine learning and visualization libraries, it gives you a high-performance tool for analyzing large data sets.

In this post, we will go over the essentials of pandas, from installation to creating your first data structures. Make yourself a cup of coffee, grab your favorite biscuit, and read this article at your own pace. Feel free to stop and resume later; don't overwhelm yourself with too much information in a short time. Follow along step by step, and pandas will come to you.

(Figure: Hello Pandas. Source: Wallpaper Play)

What is Pandas?

Pandas is a library for analytics, data processing, and data science. It is a large open-source project with more than 1,500 contributors, and its source code is hosted on GitHub.

Installation

The easiest way to install pandas is through the Anaconda distribution. If you haven't installed Anaconda yet, read our Anaconda installation guide.

If you don't want to install Anaconda, you can install pandas with pip:

pip install pandas
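
Once the installation finishes, a quick way to confirm that pandas is available is to import it and print its version. This is only a small check; the version number you see depends on what was installed on your machine.

```python
import pandas as pd

# If the import succeeds, pandas is installed.
# The exact version string will vary from machine to machine.
print(pd.__version__)
```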

Data Structure of Pandas

The two primary data structures of pandas are Series and DataFrame. A Series is essentially a single column; when we join multiple Series (columns) together, we get a DataFrame.

(Figure: Series and DataFrame in Pandas)
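
To make the relationship between the two structures concrete, here is a minimal sketch that builds two Series and joins them into a DataFrame with pd.concat. The variable names (paris, berlin) are chosen only for this illustration, and pd.concat is one of several ways to combine Series.

```python
import pandas as pd

# Two Series sharing the same default integer index (0..3).
# The name of each Series becomes its column label after joining.
paris = pd.Series([3, 2, 0, 1], name='Paris')
berlin = pd.Series([0, 3, 7, 2], name='Berlin')

# axis=1 places the Series side by side as columns.
purchases = pd.concat([paris, berlin], axis=1)
print(purchases)

# Selecting a single column gives back a Series.
print(type(purchases['Paris']))
```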

Creating Your Series And DataFrame

Getting Started With Series

First, create a Series by passing a list of values. By default, pandas labels the values with an integer index that starts at 0.

import numpy as np
import pandas as pd

data_series = pd.Series([1, 9, 3, np.nan, 8])
print(data_series)

# Result:
#
# 0    1.0
# 1    9.0
# 2    3.0
# 3    NaN
# 4    8.0
# dtype: float64
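
The default index can be replaced with your own labels. The sketch below reuses the same values and passes an index argument to the Series constructor; the labels 'a' to 'e' are arbitrary placeholders picked for the example.

```python
import numpy as np
import pandas as pd

# Same values as before, but with explicit labels instead of 0..4.
labeled_series = pd.Series([1, 9, 3, np.nan, 8], index=list('abcde'))
print(labeled_series)

# A value can now be looked up by its label.
print(labeled_series['b'])  # 9.0

# The missing value (NaN) is why the dtype is float64 and not an integer type.
print(labeled_series.dtype)  # float64
```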

Create DataFrame In The Easiest Way

There are many ways to create a DataFrame. The easiest is to build a dict in which each key is a column name and each value is the list of values for that column, then pass the dict to the DataFrame constructor.

import pandas as pd

data = {
    'Paris': [3, 2, 0, 1], 
    'Berlin': [0, 3, 7, 2]
}
purchases = pd.DataFrame(data)
print(purchases)

# Result:
#
#    Paris  Berlin
# 0      3       0
# 1      2       3
# 2      0       7
# 3      1       2
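
Just like a Series, a DataFrame built from a dict can take custom row labels through the index argument. The following sketch assumes each row stands for one customer; the customer names are hypothetical and not part of the original data.

```python
import pandas as pd

data = {
    'Paris': [3, 2, 0, 1],
    'Berlin': [0, 3, 7, 2]
}

# Hypothetical row labels, one per row of data.
purchases = pd.DataFrame(data, index=['Anna', 'Ben', 'Chloe', 'David'])
print(purchases)

# .loc selects a row by its label and returns it as a Series.
print(purchases.loc['Ben'])
```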

Create DataFrame With NumPy

You can also pass a NumPy array to the DataFrame constructor, together with a range of dates as the index and a list of column labels:

import numpy as np
import pandas as pd

dates = pd.date_range('20191001', periods=6)
dataframe = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(dataframe)

# Result (the values are random, so yours will differ):
#
#                    A         B         C         D
# 2019-10-01  0.304466 -0.699206 -2.090317  1.564566
# 2019-10-02 -0.876682  0.876720  1.275542 -0.757827
# 2019-10-03  0.029740 -1.282535 -0.420332 -1.176261
# 2019-10-04 -0.153740 -0.087788  1.314169 -1.835564
# 2019-10-05  0.301839  0.036301  0.138372  1.755769
# 2019-10-06  1.546020 -0.148291  0.781045 -1.789371

In this example, the index parameter supplies the row labels, while the columns parameter supplies the column labels.
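
To see the row and column labels directly, you can inspect the index and columns attributes of the DataFrame. The sketch below is a variation on the example above: it swaps in NumPy's seeded generator (np.random.default_rng) so the random values are repeatable between runs. The seed value 42 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random numbers repeatable between runs.
rng = np.random.default_rng(42)

dates = pd.date_range('20191001', periods=6)
dataframe = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

print(dataframe.index)    # row labels: six consecutive dates
print(dataframe.columns)  # column labels: A, B, C, D

# Select one row by its date label.
print(dataframe.loc[pd.Timestamp('2019-10-03')])
```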

Summary of Part 1

After this first part of the series, Python Pandas for Beginners, you should understand what pandas is and how to install it with pip or Anaconda. You can also create your own Series and DataFrame.

In Part 2, you will learn how to read data from a JSON file into pandas, along with some important pandas operations.

See you in the next article. If you like this series, please share it with other Python geeks, and leave a comment to help us improve the next post.
