Introduction to Data Science Libraries (NumPy, Pandas)

NumPy:

1. What is NumPy?

  • NumPy is a fundamental package for numerical computations in Python.
  • It provides support for large, multi-dimensional arrays and matrices.

2. Array Creation:

  • Create arrays using numpy.array().
  • Generate arrays using functions like numpy.arange(), numpy.zeros(), and numpy.ones().
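
A minimal sketch of these creation functions. The variable names and values are illustrative only, and numpy.ones() is included as the natural companion to numpy.zeros().

```python
import numpy as np

from_list = np.array([1, 2, 3])   # build an array from a Python list
evens = np.arange(0, 10, 2)       # start, stop (exclusive), step
zeros = np.zeros((2, 3))          # 2x3 array filled with 0.0
ones = np.ones(4)                 # length-4 array filled with 1.0

print(from_list)
print(evens)
print(zeros.shape)
print(ones)
```

Note that numpy.zeros() and numpy.ones() return floating-point arrays unless a dtype argument is given.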

3. Array Operations:

  • Perform element-wise operations on arrays.
  • Use functions like numpy.add(), numpy.subtract(), etc.
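
As a short illustration (the arrays a and b are made up for this sketch), the function forms and the arithmetic operators give the same element-wise results:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

total = np.add(a, b)        # equivalent to a + b
diff = np.subtract(b, a)    # equivalent to b - a
product = a * b             # element-wise product, not a matrix product

print(total)
print(diff)
print(product)
```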

4. Array Indexing and Slicing:

  • Access elements and subsets of arrays using indexing and slicing.
  • Indexing starts at 0; negative indices count from the end, so -1 is the last element.
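
The following sketch shows indexing and slicing on a one-dimensional and a two-dimensional array. The arrays arr and grid are hypothetical sample data.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

first = arr[0]           # first element
last = arr[-1]           # last element (negative index counts from the end)
middle = arr[1:4]        # elements at positions 1, 2, 3 (stop is exclusive)
every_other = arr[::2]   # every second element

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
cell = grid[1, 2]        # row 1, column 2
first_col = grid[:, 0]   # all rows, column 0

print(first, last, middle, every_other, cell, first_col)
```

Basic slices such as arr[1:4] are views onto the original data, so modifying the slice also modifies arr; call .copy() when an independent array is needed.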

5. Broadcasting:

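  • NumPy allows operations on arrays of different shapes by broadcasting: shapes are compared from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1.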
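
A small sketch of broadcasting, using made-up arrays: a (2, 3) matrix is combined with a (3,) row and with a (2, 1) column, and NumPy stretches the smaller operand to match.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
col = np.array([[100], [200]])        # shape (2, 1)

plus_row = matrix + row   # row is applied to each of the two rows
plus_col = matrix + col   # column is applied to each of the three columns

print(plus_row)
print(plus_col)
```

Shapes that cannot be aligned this way, for example (2, 3) with (2,), raise a ValueError.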

6. Mathematical and Statistical Functions:

  • NumPy offers a wide range of mathematical and statistical functions.
  • Perform operations like mean, median, standard deviation, etc.
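
For illustration, the sketch below computes these statistics on a small hypothetical sample, and shows the axis argument for a two-dimensional array.

```python
import numpy as np

scores = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = np.mean(scores)       # arithmetic mean
median = np.median(scores)   # middle value of the sorted data
std = np.std(scores)         # population standard deviation (ddof=0)

table = np.array([[1, 2],
                  [3, 4]])
col_means = table.mean(axis=0)   # one mean per column

print(mean, median, std, col_means)
```

numpy.std() divides by N by default; pass ddof=1 for the sample standard deviation.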

Pandas:

1. What is Pandas?

  • Pandas is a library for data manipulation and analysis.
  • It introduces two primary data structures: Series (1D) and DataFrame (2D).

2. Series and DataFrame:

  • A Series is a one-dimensional labeled array.
  • A DataFrame is a two-dimensional tabular data structure with labeled rows and columns.
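
A brief sketch of both structures, using sample names and ages chosen for illustration:

```python
import pandas as pd

# A Series: values plus an index of labels
ages = pd.Series([25, 30, 22], index=["Alice", "Bob", "Charlie"], name="Age")
bob_age = ages["Bob"]            # look up a value by its label

# A DataFrame: a table whose columns are Series sharing one index
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                   "Age": [25, 30, 22]})
age_column = df["Age"]           # selecting one column returns a Series

print(ages)
print(df)
```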

3. Data Input and Output:

  • Read and write data from/to various formats: CSV, Excel, SQL, etc.
  • Use functions like pandas.read_csv() and DataFrame.to_csv().
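
The sketch below round-trips a small CSV. To stay self-contained it reads from an in-memory string via io.StringIO instead of a file on disk; the CSV content is made up for this example.

```python
import io
import pandas as pd

csv_text = "Name,Age\nAlice,25\nBob,30\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# With no path argument, to_csv returns the CSV as a string;
# index=False omits the row index column
out = df.to_csv(index=False)

print(df)
print(out)
```

In practice the same calls take a file path, for example pd.read_csv("data.csv") and df.to_csv("out.csv", index=False), where the file names here are placeholders.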

4. Data Exploration:

  • Use methods like DataFrame.head(), DataFrame.info(), and DataFrame.describe() to explore data.
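
A quick sketch of these exploration methods on a hypothetical four-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "Dana"],
                   "Age": [25, 30, 22, 35]})

top = df.head(2)          # first two rows (default is five)
df.info()                 # prints column names, dtypes, and non-null counts
summary = df.describe()   # count, mean, std, min, quartiles, max for numeric columns

print(top)
print(summary)
```

DataFrame.info() prints its report directly and returns None, so it is called on its own line and not assigned.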

5. Data Selection and Filtering:

  • Select columns and rows using labels (DataFrame.loc) or integer positions (DataFrame.iloc).
  • Apply conditions to filter data.
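
As an illustration, the sketch below selects by label with .loc, by position with .iloc, and filters rows with a boolean condition. The index labels "a", "b", "c" are hypothetical and chosen so the two selection styles are easy to tell apart.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                   "Age": [25, 30, 22]},
                  index=["a", "b", "c"])

names = df["Name"]               # one column, by column label
bob_age = df.loc["b", "Age"]     # row label "b", column label "Age"
first_row = df.iloc[0]           # first row, by integer position
older = df[df["Age"] > 24]       # keep rows where the condition is True

print(names)
print(first_row)
print(older)
```

Multiple conditions can be combined with & and |, with each condition wrapped in parentheses.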

6. Data Manipulation:

  • Perform operations on columns, apply functions to rows, and create new columns.
  • Use methods like DataFrame.groupby() for grouping and aggregation.
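
A minimal sketch of these manipulations, assuming a made-up table of team scores:

```python
import pandas as pd

df = pd.DataFrame({"Team": ["A", "A", "B", "B"],
                   "Score": [10, 20, 30, 50]})

# New column from a vectorized operation on an existing column
df["Double"] = df["Score"] * 2

# Apply a function to each row (axis=1 passes one row at a time)
df["Label"] = df.apply(lambda row: f"{row['Team']}-{row['Score']}", axis=1)

# Group rows by team and aggregate the scores
mean_by_team = df.groupby("Team")["Score"].mean()

print(df)
print(mean_by_team)
```

Vectorized column operations like the first one are generally much faster than row-wise apply, which is best kept for logic that cannot be expressed column by column.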

Example:

Here's a simple example illustrating the usage of NumPy and pandas:

import numpy as np
import pandas as pd

# NumPy example
arr = np.array([1, 2, 3, 4, 5])
print("NumPy Array:", arr)
print("Mean:", np.mean(arr))

# Pandas example
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22]}
df = pd.DataFrame(data)
print("\nDataFrame:")
print(df)

In this example, we create a NumPy array and compute its mean. Then, we create a pandas DataFrame from a dictionary and print its contents.

NumPy and pandas underpin most data science workflows in Python. NumPy supplies fast array computation, and pandas builds on it with labeled data structures for manipulation, transformation, and analysis.