Page 68 - IPP-12-2025

5.  It can easily select subsets of data from bulky datasets and can even combine datasets together.
             6.  It has the functionality to find and fill missing data.
             7.  It allows us to apply operations to independent groups within the data.
             8.  It supports reshaping of data into different forms.
             9.  It supports time-series functionality, such as generating date ranges, resampling data to a
                different frequency and shifting values in time.
            10.  It supports visualization by integrating with libraries such as Matplotlib and Seaborn. Pandas is
                well suited to handling large tabular datasets whose columns hold different data types.
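
Features 5 to 8 above can be illustrated with a minimal sketch. The DataFrame contents and the column names (name, section, marks, teacher) are made up for illustration and are not taken from the text.

```python
import pandas as pd

# Hypothetical sample data; one value of "marks" is missing on purpose.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "John"],
    "section": ["A", "B", "A", "B"],
    "marks": [82.0, None, 91.0, 67.0],
})

# Feature 5: select a subset (rows of section A, two columns only).
section_a = df.loc[df["section"] == "A", ["name", "marks"]]

# Feature 6: find missing data, then fill it with 0.
missing_count = int(df["marks"].isna().sum())
filled = df.fillna({"marks": 0.0})

# Feature 7: apply an operation to independent groups (mean marks per section).
section_mean = filled.groupby("section")["marks"].mean()

# Feature 8: reshape the data so that each section becomes its own column.
wide = filled.pivot(index="name", columns="section", values="marks")

# Feature 5 again: combine two datasets on a shared column.
teachers = pd.DataFrame({"section": ["A", "B"], "teacher": ["Ms. Rao", "Mr. Das"]})
merged = filled.merge(teachers, on="section")

print(section_a)
print(section_mean)
print(wide)
print(merged)
```

Filling missing marks with 0 is only one possible choice here; depending on the data, filling with the column mean or dropping the row may be more appropriate.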

            CSV as Back-end
            Using CSV (Comma Separated Values) files as a back-end for data analysis and visualization in Python
            is an effective approach, as they are widely used for storing tabular data across platforms.
            In Python, libraries like Pandas make it convenient to work with CSV files, which can be loaded into
            a DataFrame using the read_csv() function. Once loaded, the data can be filtered, cleaned and
            summarized as required. After processing the data, libraries such as Matplotlib can be imported and
            used to create visualizations such as line graphs, bar graphs and histograms, enabling
            interpretation of the data directly from the CSV back-end.
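
The load, manipulate and visualize workflow described above might look like the following sketch. The file name sales.csv, its columns and its values are assumptions made for illustration; the script first writes this small file to a temporary folder so that it can run on its own.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Create a small CSV file to act as the back-end (hypothetical data).
folder = tempfile.mkdtemp()
csv_path = os.path.join(folder, "sales.csv")
with open(csv_path, "w", newline="") as f:
    f.write("month,sales\nJan,120\nFeb,150\nMar,90\n")

# Step 1: load the CSV file into a DataFrame.
df = pd.read_csv(csv_path)

# Step 2: manipulate the data (each month's share of the total sales).
df["share"] = df["sales"] / df["sales"].sum()

# Step 3: visualize the data as a bar graph and save it as an image.
fig, ax = plt.subplots()
ax.bar(df["month"], df["sales"])
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly sales")
chart_path = os.path.join(folder, "sales.png")
fig.savefig(chart_path)
plt.close(fig)

print(df)
print("Chart saved to", chart_path)
```

In an interactive session, plt.show() could be used in place of savefig() to display the graph on screen.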

            Features of CSV File
             1.  It is human readable and easy to edit manually.
             2.  It is simple to implement and parse.
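             3.  It can be opened and processed by most spreadsheet, database and data-analysis applications.
             4.  It provides a straightforward information schema.
             5.  It is fast to read and write because of its simple structure.
             6.  It is compact in size, as it stores only plain text without formatting.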
             7.  It is considered to be a standard format.

            Python Modules and Built-in Functions
            Modules
             1.  Pandas: Pandas is a Python library that provides data structures and functions for data
                manipulation and analysis.
             2.  Matplotlib: Matplotlib is a Python library that provides many interfaces and functionality for
                2D graphics in various forms.
             3.  NumPy: NumPy stands for Numerical Python. It is a library consisting of multi-dimensional
                array objects and a collection of routines for processing those arrays.
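
As a small illustration of the multi-dimensional array objects mentioned above, the following sketch creates a NumPy array, processes it with a few routines and passes it to Pandas. The values and the column names a, b and c are made up for illustration.

```python
import numpy as np
import pandas as pd

# A two-dimensional array with 2 rows and 3 columns (made-up values).
arr = np.array([[1, 2, 3], [4, 5, 6]])

shape = arr.shape            # number of rows and columns
col_sums = arr.sum(axis=0)   # total of each column
doubled = arr * 2            # element-wise multiplication

# A Pandas DataFrame can be built directly from a NumPy array.
df = pd.DataFrame(arr, columns=["a", "b", "c"])

print(shape)
print(col_sums)
print(doubled)
print(df)
```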

            Functions of Pandas
             1.  read_csv(): This function reads data from a CSV file into a DataFrame.
             2.  head(): This function returns the first ‘n’ rows of a Pandas object (5 rows by default).
             3.  tail(): This function returns the last ‘n’ rows of a Pandas object (5 rows by default).
             4.  capitalize(): This is a built-in Python string method. It returns a copy of the original
                string with the first character converted to a capital letter and all the other characters
                converted to lowercase. For a Pandas Series of strings, it is available as str.capitalize().
             5.  append(): This is a built-in Python list method that adds an item to the end of a list. The
                append() method of DataFrame and Series was removed in Pandas 2.0; concat() is used instead.
             6.  drop(): This function is used to drop specified labels from rows and columns.
             7.  rename(): This function is used to change the labels of one or more rows or columns.
             8.  min(): This function finds out the minimum value from a given set of data.
             9.  max(): This function finds out the maximum value from a given set of data.
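            10.  sort_values(): It arranges the values in a Pandas object in ascending or descending order. Current
                versions of Pandas do not provide sort() for a DataFrame; sort() is a built-in Python list method.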
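
The functions listed above can be tried together in one short sketch. The CSV text, the column names (name, city, marks) and the values are assumptions made for illustration. The data is read from an in-memory string so that no file is needed, sort_values() is used for sorting, and concat() is used to add a row, since current Pandas versions have no DataFrame.append().

```python
import io

import pandas as pd

# Hypothetical CSV content; read_csv() accepts a file path or a file-like object.
csv_text = (
    "name,city,marks\n"
    "asha,delhi,82\n"
    "ravi,mumbai,67\n"
    "meena,pune,91\n"
    "john,goa,74\n"
)
df = pd.read_csv(io.StringIO(csv_text))

first_two = df.head(2)   # first 2 rows
last_two = df.tail(2)    # last 2 rows

# Capitalize every name using the string method on a Series.
df["name"] = df["name"].str.capitalize()

# Drop a column, then rename another one.
no_city = df.drop(columns=["city"])
renamed = no_city.rename(columns={"marks": "score"})

# Minimum and maximum of a column.
lowest = renamed["score"].min()
highest = renamed["score"].max()

# Sort the rows by score in descending order.
ranked = renamed.sort_values("score", ascending=False)

# Add a new row by concatenating a second DataFrame.
extra = pd.DataFrame({"name": ["Sara"], "score": [88]})
combined = pd.concat([ranked, extra], ignore_index=True)

print(first_two)
print(last_two)
print(lowest, highest)
print(combined)
```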
