Sumit SC is a data science practitioner with hands-on experience applying machine learning, statistical analysis, and data visualization to real-world problems. This portfolio documents that work: each project is grounded in a concrete business or research question, worked through systematically from raw data to interpretable findings. The projects range from regression modeling on structured datasets to exploratory analysis of behavioral and financial data, reflecting a broad interest in how data can inform decisions across domains.
Core skills
Sumit’s analytical work draws on a combination of programming, statistics, and domain reasoning:
Python
Primary language for data manipulation, modeling, and visualization across all projects.
SQL
Querying and aggregating structured data from relational sources during the analysis phase.
Machine learning
Supervised learning with scikit-learn, covering regression, classification, and model evaluation.
Statistical analysis
Hypothesis testing, correlation analysis, and distribution profiling to support data-driven conclusions.
Data visualization
Communicating findings through clear, well-labeled charts using Matplotlib and Seaborn.
Exploratory data analysis
Systematic investigation of raw datasets to surface patterns, outliers, and relationships before modeling.
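
As a rough illustration of the statistical analysis and exploratory data analysis skills listed above, the following sketch profiles a small synthetic table: it counts missing values, flags outliers with the common 1.5 × IQR rule, and computes pairwise Pearson correlations. The dataset, the column names, and the `profile` helper are assumptions made up for this example and are not taken from any project in the portfolio.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names and values are illustrative only.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "mileage_km": rng.normal(60_000, 15_000, size=200),
    "age_years": rng.integers(1, 15, size=200).astype(float),
})
# Price falls with age and mileage, plus random noise.
df["price"] = (
    30_000
    - 1_200 * df["age_years"]
    - 0.1 * df["mileage_km"]
    + rng.normal(0, 1_000, size=200)
)

# Inject a few missing values and one extreme value to give the checks something to find.
df.loc[[3, 17, 42], "mileage_km"] = np.nan
df.loc[5, "price"] = 250_000


def profile(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary: missing count, mean, std, and 1.5 * IQR outlier count."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the DataFrame columns.
    is_outlier = (frame < q1 - 1.5 * iqr) | (frame > q3 + 1.5 * iqr)
    return pd.DataFrame({
        "missing": frame.isna().sum(),
        "mean": frame.mean(),
        "std": frame.std(),
        "iqr_outliers": is_outlier.sum(),
    })


summary = profile(df)
corr = df.corr(method="pearson")  # pairwise correlations, NaNs excluded per pair

print(summary)
print(corr.round(2))
```

A summary like this is usually a starting point rather than a conclusion: flagged rows still need to be inspected before deciding whether they are data errors or genuine extremes.
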
Tools and environment
| Tool | Role |
|---|---|
| Jupyter Notebooks | Interactive analysis environment for iterative, reproducible work |
| Pandas | DataFrame-based data wrangling and feature engineering |
| NumPy | Array operations and numerical utilities |
| Scikit-learn | ML pipelines, model training, cross-validation, and metrics |
| Matplotlib | Figure layout, axes configuration, and base plots |
| Seaborn | Statistical chart types including heatmaps, pairplots, and distribution plots |
| GitHub Pages | Static site hosting for the portfolio at Sumit-SC.github.io |
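
To make the roles in the table more concrete, here is a minimal sketch of how Pandas, NumPy, and Scikit-learn might fit together in a notebook cell: a small pipeline is cross-validated on a training split and then scored on a held-out split with MAE, RMSE, and R². The synthetic data, the feature names, and the choice of a scaled linear regression are assumptions for illustration only; they are not the models or datasets used in the portfolio projects, and plotting with Matplotlib or Seaborn is left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a real dataset (hypothetical feature names).
rng = np.random.default_rng(seed=42)
X = pd.DataFrame({
    "feature_a": rng.normal(size=300),
    "feature_b": rng.normal(size=300),
})
y = 3.0 * X["feature_a"] - 2.0 * X["feature_b"] + rng.normal(scale=0.5, size=300)

# Hold out a test split that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Scaling and regression bundled together, so scaling is re-fit inside each CV fold.
model = Pipeline([
    ("scale", StandardScaler()),
    ("regress", LinearRegression()),
])

# 5-fold cross-validated R^2 on the training split.
cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Final fit on the full training split, then evaluation on the held-out data.
model.fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = r2_score(y_test, pred)

print(f"CV R^2 per fold: {np.round(cv_r2, 3)}")
print(f"Test MAE={mae:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```

Wrapping preprocessing and the estimator in one `Pipeline` keeps the scaler from seeing validation data during cross-validation, which is one common way to avoid leakage.
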
Types of analysis
Exploratory data analysis
EDA projects, such as the IMDB Movie Analysis, focus on understanding the structure and content of a dataset before drawing conclusions. This involves checking data quality, examining distributions, and identifying relationships between variables through visualization and summary statistics.
Predictive modeling
Modeling projects, such as Used Car Price Prediction, frame a business question as a supervised learning problem, engineer relevant features, train candidate models, and evaluate performance using appropriate metrics (RMSE, MAE, R², etc.).
Risk and case study analysis
Projects like the Bank Loan Case Study take a structured analytical approach to a domain problem, combining EDA with segmentation and risk factor identification to produce actionable insights rather than a single predictive model.
Time-series analysis
The ABC Call Volume Trend project examines temporal data to identify patterns, peak periods, and trends, applying resampling, rolling statistics, and visualization techniques suited to time-indexed data.
GitHub profile
Sumit-SC on GitHub
Browse the source notebooks, datasets, and code for every project in this portfolio.