Sumit SC is a data science practitioner with hands-on experience applying machine learning, statistical analysis, and data visualization to real-world problems. This portfolio documents that work — each project is grounded in a concrete business or research question, worked through systematically from raw data to interpretable findings. The projects range from regression modeling on structured datasets to exploratory analysis of behavioral and financial data, reflecting a broad interest in how data can inform decisions across domains.

Core skills

Sumit’s analytical work draws on a combination of programming, statistics, and domain reasoning:

Python

Primary language for data manipulation, modeling, and visualization across all projects.

SQL

Querying and aggregating structured data from relational sources during the analysis phase.

Machine learning

Supervised learning methods including regression, classification, and model evaluation with Scikit-learn.

Statistical analysis

Hypothesis testing, correlation analysis, and distribution profiling to support data-driven conclusions.

Data visualization

Communicating findings through clear, well-labeled charts using Matplotlib and Seaborn.

Exploratory data analysis

Systematic investigation of raw datasets to surface patterns, outliers, and relationships before modeling.

Tools and environment

| Tool | Role |
| --- | --- |
| Jupyter Notebooks | Interactive analysis environment for iterative, reproducible work |
| Pandas | DataFrame-based data wrangling and feature engineering |
| NumPy | Array operations and numerical utilities |
| Scikit-learn | ML pipelines, model training, cross-validation, and metrics |
| Matplotlib | Figure layout, axes configuration, and base plots |
| Seaborn | Statistical chart types including heatmaps, pairplots, and distribution plots |
| GitHub Pages | Static site hosting for the portfolio at Sumit-SC.github.io |

Types of analysis

Exploratory data analysis

EDA projects — such as the IMDB Movie Analysis — focus on understanding the structure and content of a dataset before drawing conclusions. This involves checking data quality, examining distributions, and identifying relationships between variables through visualization and summary statistics.
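As a rough illustration of the checks described above, the sketch below profiles a small synthetic table for missing values, duplicates, summary statistics, and pairwise correlations. It is not taken from the IMDB notebook: the data is randomly generated, and the column names (`budget`, `rating`, `genre`, `gross`) and the `profile` helper are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a movie dataset; all columns are hypothetical.
rng = np.random.default_rng(0)
movies = pd.DataFrame({
    "budget": rng.lognormal(mean=17, sigma=1, size=200),
    "rating": rng.normal(loc=6.5, scale=1.0, size=200).clip(1, 10),
    "genre": rng.choice(["Drama", "Comedy", "Action"], size=200),
})
movies["gross"] = movies["budget"] * rng.lognormal(mean=0.3, sigma=0.5, size=200)

# Blank out ten values so the data-quality check has something to find.
movies.loc[movies.sample(10, random_state=0).index, "gross"] = np.nan


def profile(df):
    """Return basic data-quality and relationship summaries for a DataFrame."""
    numeric = df.select_dtypes(include="number")
    return {
        "missing": df.isna().sum(),                  # missing values per column
        "duplicates": int(df.duplicated().sum()),    # fully duplicated rows
        "summary": numeric.describe(),               # distribution summary
        "correlation": numeric.corr(),               # pairwise Pearson correlation
    }


report = profile(movies)
print(report["missing"])
print(report["correlation"].round(2))
```

In a notebook, the correlation matrix would typically be passed to a Seaborn heatmap, and individual columns to histogram or box plots, before any conclusions are drawn.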

Predictive modeling

Modeling projects, such as Used Car Price Prediction, frame a business question as a supervised learning problem, engineer relevant features, train candidate models, and evaluate performance on held-out data using metrics suited to the task (for regression, RMSE, MAE, and R²).
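The workflow above can be sketched with Scikit-learn as follows. This is a minimal example under stated assumptions, not the model used in the Used Car Price Prediction project: the data is synthetic, the features (`age_years`, `mileage_km`) and the price formula are invented for illustration, and a plain linear regression stands in for whatever candidate models the project compares.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic used-car data; feature names and coefficients are assumptions.
rng = np.random.default_rng(42)
n = 500
cars = pd.DataFrame({
    "age_years": rng.integers(0, 15, size=n),
    "mileage_km": rng.uniform(5_000, 200_000, size=n),
})
cars["price"] = (
    30_000
    - 1_200 * cars["age_years"]
    - 0.05 * cars["mileage_km"]
    + rng.normal(0, 1_500, size=n)  # unexplained noise
)

X = cars[["age_years", "mileage_km"]]
y = cars["price"]

# Hold out 20% of rows so the metrics reflect unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

metrics = {
    "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    "mae": float(mean_absolute_error(y_test, pred)),
    "r2": float(r2_score(y_test, pred)),
}
print(metrics)
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a sense of whether a few badly mispriced cars dominate the error.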

Risk and case study analysis

Projects like the Bank Loan Case Study take a structured analytical approach to a domain problem, combining EDA with segmentation and risk factor identification to produce actionable insights rather than a single predictive model.
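One way to picture the segmentation step is to bucket applicants by an attribute and compare default rates across buckets. The sketch below does this with Pandas on a tiny hand-made table. The values, the `income` and `defaulted` columns, and the band boundaries are all illustrative assumptions, not data or thresholds from the Bank Loan Case Study.

```python
import pandas as pd

# Made-up applicants for illustration only; 1 in `defaulted` means a default.
loans = pd.DataFrame({
    "income": [25_000, 40_000, 60_000, 85_000, 30_000, 120_000, 55_000, 20_000],
    "defaulted": [1, 0, 0, 0, 1, 0, 1, 1],
})

# Assumed band boundaries; intervals are right-inclusive, e.g. (0, 35000].
bins = [0, 35_000, 70_000, float("inf")]
labels = ["low", "mid", "high"]
loans["income_band"] = pd.cut(loans["income"], bins=bins, labels=labels)

# Applicant count and default rate per segment.
risk_by_band = (
    loans.groupby("income_band", observed=True)["defaulted"]
    .agg(applicants="count", default_rate="mean")
)
print(risk_by_band)
```

The same pattern extends to other candidate risk factors (loan amount, employment type, and so on) by swapping the column that is bucketed or grouped on.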

Time-series analysis

The ABC Call Volume Trend project examines temporal data to identify patterns, peak periods, and trends — applying resampling, rolling statistics, and visualization techniques suited to time-indexed data.
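The resampling and rolling-statistics techniques mentioned above might look like the following in Pandas. This is a sketch on simulated data, not the project's code: the hourly call counts are drawn from a Poisson distribution with an assumed midday peak, and the names `calls`, `daily_total`, `rolling_avg`, and `hourly_profile` are chosen for the example.

```python
import numpy as np
import pandas as pd

# Two weeks of simulated hourly call counts; the midday peak is an assumption.
rng = np.random.default_rng(1)
timestamps = pd.date_range(
    "2024-01-01", periods=24 * 14, freq=pd.Timedelta(hours=1)
)
hour = timestamps.hour.to_numpy()
expected = 20 + 15 * np.exp(-((hour - 12) ** 2) / 18)
calls = pd.Series(rng.poisson(expected), index=timestamps, name="calls")

# Resample hourly counts into daily totals.
daily_total = calls.resample("D").sum()

# 7-day rolling mean smooths day-to-day noise; the first 6 values are NaN.
rolling_avg = daily_total.rolling(window=7).mean()

# Average volume by hour of day, used to locate the peak period.
hourly_profile = calls.groupby(calls.index.hour).mean()
peak_hour = int(hourly_profile.idxmax())

print(daily_total.head())
print("Peak hour of day:", peak_hour)
```

Plotting `daily_total` together with `rolling_avg` on one axis is a common way to separate the underlying trend from daily fluctuation.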

GitHub profile

Sumit-SC on GitHub

Browse the source notebooks, datasets, and code for every project in this portfolio.