Split Pandas DataFrame by Rows in Python

Pandas is a powerful, open-source Python library for data manipulation and analysis. It provides data structures and functions that are well-suited for working with tabular data, such as spreadsheets or SQL tables, and it is built on top of the NumPy library. Because it is versatile and easy to use, data scientists commonly rely on Pandas to work with structured data in Python.

Pandas is typically used in conjunction with other data science libraries. Because it is built on top of NumPy, many of its operations rely on NumPy arrays and functions. Data prepared with Pandas can be plotted with Matplotlib, analyzed statistically with SciPy, and passed to machine learning algorithms in Scikit-learn. Key features of the Pandas library:

Data cleaning, data munging, and joining.
Easy imputation of missing values in both floating-point and non-floating-point data.
Insertion and deletion of DataFrame columns.
Visualization of data.
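
As a rough illustration of two of the features listed above (missing value imputation and column insertion/deletion), here is a minimal sketch. The column names and values are made up for the example and do not come from the article's data set.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in each column
df = pd.DataFrame({
    "name": ["a", "b", None],
    "score": [1.5, np.nan, 3.5],
})

# Impute missing values: column mean for floats, a placeholder for strings
df["score"] = df["score"].fillna(df["score"].mean())
df["name"] = df["name"].fillna("unknown")

# Insert a new column at position 1, then delete it again
df.insert(1, "passed", df["score"] > 2.0)
df = df.drop(columns="passed")
print(df)
```

Filling with the mean is only one possible imputation strategy; the right choice depends on the data.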

pd is the conventional alias for Pandas, introduced with import pandas as pd. The alias is optional; it simply shortens the code and keeps it readable.

There are two types of data structures provided by pandas:

Series
DataFrame

Pandas Series:

A Pandas Series is a one-dimensional labeled array that can hold data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. A Series is comparable to a single column in an Excel sheet. The labels in a Series need not be unique, but they must be of a hashable type. A Series can be created from lists, dictionaries, scalar values, etc. Let's see how to create a Series in Pandas.

import pandas as pd 
import numpy as np
 
# Creating an empty series (dtype passed explicitly so the result
# is the same across pandas versions)
ser = pd.Series(dtype="float64")
print("Pandas Series: ", ser)
 
# simple array 
data = np.array(['p', 'a', 'n', 'd', 'a']) 
   
ser = pd.Series(data) 
print("Pandas Series:\n", ser)

Output:

Pandas Series:  Series([], dtype: float64)
Pandas Series:
 0    p
1    a
2    n
3    d
4    a
dtype: object

In the above code, the Pandas library is imported as pd, and the NumPy library is imported as np. An empty Series is first created with the Series constructor provided by Pandas. A NumPy array of characters is then created with NumPy's array() function, passed to the Series constructor, and the resulting Series is printed.
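
The text above also mentions dictionaries and scalar values as sources for a Series, and notes that labels need not be unique. A small sketch of those cases follows; the variable names and values are illustrative only.

```python
import pandas as pd

# From a dictionary: the keys become the index labels
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a scalar: the value is repeated for every label in the index
from_scalar = pd.Series(5, index=["x", "y", "z"])

# Labels do not have to be unique; selecting a repeated label
# returns every matching entry as a Series
dupes = pd.Series([10, 20, 30], index=["k", "k", "m"])
print(dupes["k"])
```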

Let's see how to create a data frame in Pandas:

A DataFrame is like a table in which the values are stored in rows and columns. A DataFrame can be created by loading datasets from SQL databases, CSV files, or Excel files. It can also be created from lists, dictionaries, lists of dictionaries, etc.

Example:

import pandas as pd 
   
# Calling DataFrame constructor 
df = pd.DataFrame() 
print(df)
 
# list of strings 
lst = ['Data', 'Frame', 'in', 'Pandas'] 
   
# Calling DataFrame constructor on the list 
df = pd.DataFrame(lst) 
print(df)

Output:

Empty DataFrame
Columns: []
Index: []
        0
0    Data
1   Frame
2      in
3  Pandas

Explanation:

In the above code, the Pandas module is imported and the DataFrame constructor is called without arguments, which produces an empty DataFrame. A list of strings is then created and passed to the DataFrame constructor, and the resulting DataFrame is printed.
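
A DataFrame can also be built from a dictionary or from a list of dictionaries, as mentioned earlier. The following is a minimal sketch of both; the two-row data set is a shortened, illustrative version of the course data used later in this article.

```python
import pandas as pd

# From a dictionary of lists: each key becomes a column
from_dict = pd.DataFrame({
    "Courses": ["Spark", "Pandas"],
    "Fee": [22000, 26000],
})

# From a list of dictionaries: each dictionary becomes a row
from_records = pd.DataFrame([
    {"Courses": "Spark", "Fee": 22000},
    {"Courses": "Pandas", "Fee": 26000},
])

print(from_dict)
print(from_records)
```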

An ImportError is often raised when you try to import Pandas. This usually means that the library is not installed, or is installed incorrectly, in the active Python environment.
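
One way to check whether Pandas is available before importing it is sketched below. It uses the standard library's importlib.util.find_spec; the install hint in the message assumes that pip is the package manager in use, which may not hold for every environment.

```python
import importlib.util

# find_spec returns None when the module cannot be located
# in the active Python environment
pandas_available = importlib.util.find_spec("pandas") is not None

if pandas_available:
    import pandas as pd
    print("pandas version:", pd.__version__)
else:
    # Assumption: pip is the package manager for this environment
    print("pandas is not installed; try: python -m pip install pandas")
```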

Let's see how to split Pandas DataFrame:

Example:

import pandas as pd

technologies= {
    'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee' :[22000, 25000, 23000, 24000, 26000],
    'Discount':[1000, 2300, 1000, 1200, 2500],
    'Duration':['35days', '35days', '40days', '30days', '25days']
}

df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

# Example 1: Split the DataFrame using iloc[] by rows
df1 = df.iloc[:2,:]
df2 = df.iloc[2:,:]

# Example 2: Split the DataFrame using iloc[] by columns
df1 = df.iloc[:,:2]
df2 = df.iloc[:,2:]

# Example 3: Split Dataframe using groupby() &
# Grouping by particular dataframe column
grouped = df.groupby(df.Duration)
df1 = grouped.get_group("35days")

# Example 4: split DataFrame using sample()
df1 = df.sample(frac = 0.5, random_state = 200)

Output:

Create DataFrame:
    Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days
2   Hadoop  23000      1000   40days
3   Python  24000      1200   30days
4   Pandas  26000      2500   25days

Explanation:

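In the above code, the Pandas module is imported, and a data frame is created from a dictionary. The data frame is then split in four ways: by rows and by columns with the iloc[] indexer, by the values of a column with groupby(), and by random sampling with sample(). Only the original data frame is printed.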
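
The groupby() and sample() splits in the example above are not printed, so here is a minimal sketch of how their results might be inspected. The names parts, sampled, and remainder are introduced for illustration, and taking the unsampled rows with drop() is one possible way to obtain the second half of a random split, not something the original example does.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Discount": [1000, 2300, 1000, 1200, 2500],
    "Duration": ["35days", "35days", "40days", "30days", "25days"],
})

# groupby(): one sub-DataFrame per distinct Duration value
parts = {duration: group for duration, group in df.groupby("Duration")}
print(parts["35days"])

# sample(): a random subset of rows; dropping those rows from the
# original leaves the rest as the second part of the split
sampled = df.sample(frac=0.5, random_state=200)
remainder = df.drop(sampled.index)
print(len(sampled), len(remainder))
```

Dropping by index in this way assumes the index labels are unique, as they are with the default integer index.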

Split Pandas Dataframe by rows using iloc[]:

The iloc attribute provided by Pandas helps in splitting the dataframe by rows. iloc selects rows and columns by integer position rather than by label.
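
To make the position-based behaviour concrete, the sketch below compares iloc with the label-based loc on a small, hypothetical frame whose index labels (10, 20, 30) deliberately differ from the row positions (0, 1, 2).

```python
import pandas as pd

# Hypothetical frame whose index labels do not match row positions
df = pd.DataFrame({"Fee": [22000, 25000, 23000]}, index=[10, 20, 30])

# iloc slices by position; the end position is excluded
by_position = df.iloc[0:1]

# loc slices by label; the end label is included
by_label = df.loc[10:20]

print(by_position.index.tolist())
print(by_label.index.tolist())
```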

Splitting Dataframe by Row:

Row slicing with iloc[] returns a specific portion of the DataFrame based on row positions. Let's see how to split the data frame.

Code:

import pandas as pd
technologies= {
    'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee' :[22000, 25000, 23000, 24000, 26000],
    'Discount':[1000, 2300, 1000, 1200, 2500],
    'Duration':['35days', '35days', '40days', '30days', '25days']
}

df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

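print("=========================")

# First two rows (positions 0 and 1)
df1 = df.iloc[:2,:]
print(df1)
print("=========================")
# Remaining rows (position 2 onward)
df2 = df.iloc[2:,:]
print(df2)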

Output:

Create DataFrame:
    Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days
2   Hadoop  23000      1000   40days
3   Python  24000      1200   30days
4   Pandas  26000      2500   25days
=========================
   Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days
=========================
  Courses    Fee  Discount Duration
2  Hadoop  23000      1000   40days
3  Python  24000      1200   30days
4  Pandas  26000      2500   25days

Explanation:

In the above code, the Pandas module is imported and a data frame is created from dictionary data. With the help of the iloc[] indexer, the data frame is split by rows: df1 holds the first two rows and df2 holds the remaining three.
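
The same iloc[] slicing can be generalised to split a data frame into several consecutive row chunks. The helper below, split_rows, is a hypothetical name introduced for this sketch and is not part of Pandas.

```python
import pandas as pd

def split_rows(frame, chunk_size):
    """Split a DataFrame into consecutive chunks of at most chunk_size rows."""
    return [
        frame.iloc[start:start + chunk_size]
        for start in range(0, len(frame), chunk_size)
    ]

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# Five rows in chunks of two: the last chunk holds the leftover row
chunks = split_rows(df, 2)
for chunk in chunks:
    print(chunk)
    print("=========================")
```

Each chunk keeps the original index labels, so the pieces can be put back together with pd.concat(chunks) if needed.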

Split Dataframe by Columns:

A data frame can also be split by columns with the iloc[] indexer: every row is kept, and the columns are selected by position. Let's see how to split the data frame by columns.

Code:

import pandas as pd

technologies= {
    'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee' :[22000, 25000, 23000, 24000, 26000],
    'Discount':[1000, 2300, 1000, 1200, 2500],
    'Duration':['35days', '35days', '40days', '30days', '25days']
}

df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
print("=========================")

df1 = df.iloc[:,:2]
print(df1)
print("=====================")
df2 = df.iloc[:,2:]
print(df2)
print("=====================")

Output:

Create DataFrame:
    Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days
2   Hadoop  23000      1000   40days
3   Python  24000      1200   30days
4   Pandas  26000      2500   25days
=========================
   Courses    Fee
0    Spark  22000
1  PySpark  25000
2   Hadoop  23000
3   Python  24000
4   Pandas  26000
=====================
   Discount Duration
0      1000   35days
1      2300   35days
2      1000   40days
3      1200   30days
4      2500   25days
=====================

Explanation:

In the above code, the Pandas module is imported, and a dataframe is created from a dictionary. The data frame is split by columns: df1 holds the first two columns and df2 holds the remaining two, each with all of the rows.
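
As a quick sanity check on a column split like the one above, the two pieces can be joined back side by side and compared with the original. This sketch uses pd.concat with axis=1 and also shows selecting the same columns by name; the variable names left, right, and rejoined are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Discount": [1000, 2300, 1000, 1200, 2500],
    "Duration": ["35days", "35days", "40days", "30days", "25days"],
})

# Split by column position, keeping every row
left = df.iloc[:, :2]
right = df.iloc[:, 2:]

# The same left-hand columns selected by name instead of position
left_by_name = df[["Courses", "Fee"]]

# Joining the pieces side by side reproduces the original frame
rejoined = pd.concat([left, right], axis=1)
print(rejoined.equals(df))
```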

Conclusion:

Splitting a DataFrame by rows is a common step in data analysis. Pandas offers several ways to do it in Python, including position-based slicing with iloc[], grouping with groupby(), and random sampling with sample().
