
Investimate — NASDAQ-100 Risk Analysis & Strategy Simulator

Ana-Maria Farazica

Overview

Can machine learning actually help everyday people make sense of the stock market? That was the question behind Investimate — a team project built for Move Tickers, where four of us developed a platform that classifies NASDAQ-100 companies by risk and predicts next-day market trends, all accessible through an interactive Streamlit dashboard.

This was the project that pushed me beyond model training and into the full picture: designing databases, engineering features from messy financial data, discovering that signal processing techniques can transform how a model sees the market, and learning what it takes to make code work as a team.

The Problem

The stock market is overwhelming — even for people who want to invest wisely. Most retail investors don’t have the tools to assess how a company reacts to inflation, interest rate changes, or market volatility. Move Tickers, a SaaS company in the financial data space, came to us with a clear goal: make advanced financial insights accessible to everyone.

Our challenge: build a system that segments companies into meaningful risk tiers and forecasts short-term market direction — then wrap it in something a non-technical user can actually interact with.

The Approach

Starting with the data — Before any modeling could happen, we needed a solid foundation. I took the lead on building the PostgreSQL database from raw NASDAQ-100 CSV files, integrating daily stock prices for six representative companies (Apple, Microsoft, Palantir, KLA, Qualcomm, CrowdStrike) with macroeconomic indicators like CPI, VIX, unemployment, and interest rates. Cleaning this data taught me a lot — monthly macro data doesn’t align neatly with daily stock prices, so I used spline interpolation to fill the gaps while preserving economic trends. The final schema followed a dual star-schema design: one for market analytics, one for user interactions.
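The alignment step above can be sketched with a cubic spline. This is a minimal illustration, not the project's actual pipeline: the CPI values and dates are made up, and `CubicSpline` from SciPy stands in for whatever interpolation routine was used.

```python
import pandas as pd
from scipy.interpolate import CubicSpline

# Hypothetical monthly CPI readings (first of each month).
monthly = pd.DataFrame(
    {"cpi": [304.1, 305.0, 305.7, 306.3]},
    index=pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01"]),
)

# Daily trading calendar the macro series must be aligned to.
daily_index = pd.bdate_range("2024-01-01", "2024-04-01")

# Fit a cubic spline on the monthly observations (x = ordinal day numbers)
# and evaluate it on every business day, so the daily series follows a
# smooth economic trend instead of a monthly step function.
x_monthly = monthly.index.map(pd.Timestamp.toordinal).to_numpy()
spline = CubicSpline(x_monthly, monthly["cpi"].to_numpy())
daily_cpi = pd.Series(
    spline(daily_index.map(pd.Timestamp.toordinal).to_numpy()),
    index=daily_index,
    name="cpi",
)
```

The spline passes exactly through each monthly observation, so the interpolated series agrees with the published figures on release dates while staying smooth in between.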

[Figure: Market-centric star schema]
[Figure: User-centric star schema]

Risk classification — The team built a clustering pipeline using K-Means to group companies by their 30-day volatility, max drawdown, momentum, and sensitivity to macroeconomic shifts. These clusters became the labels — Low, Moderate, and High risk — which we then used to train supervised models. KNN handled the initial classification, but XGBoost with balanced sample weights pushed accuracy to 87%, with strong performance across all three risk tiers.
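The two-stage idea, clustering to create labels, then training a supervised model on them, can be sketched as follows. The feature matrix here is synthetic, and KNN stands in for the final XGBoost model; the exact features and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for per-company features: 30-day volatility,
# max drawdown, momentum, and macro sensitivity.
X = rng.normal(size=(120, 4))
X_scaled = StandardScaler().fit_transform(X)

# Stage 1: unsupervised labeling. K-Means with k=3 yields the
# Low / Moderate / High risk tiers used as training labels.
tiers = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

# Stage 2: supervised classification. A KNN learns to reproduce the
# tiers from the features (XGBoost played this role in the project).
knn = KNeighborsClassifier(n_neighbors=5).fit(X_scaled, tiers)
train_acc = knn.score(X_scaled, tiers)
```

The point of stage 2 is generalization: once a supervised model can reproduce the tiers, it can assign a risk label to a company that was never part of the clustering run.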

Predicting tomorrow’s market — This was where things got interesting for me. I built trend prediction models for each of the six companies using Random Forest classifiers, iterating through three versions. The baseline, trained on raw price features, couldn’t even match a coin flip at 47% accuracy. Adding rolling averages and macro indicators helped, but the real breakthrough came from wavelet denoising. By applying a Discrete Wavelet Transform to strip noise from price signals before feature engineering, the models started seeing patterns that were previously buried. AAPL’s backtest accuracy jumped to 74%. PLTR hit 80% with perfect recall.
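To give a feel for the denoising step, here is a minimal one-level Haar DWT with soft thresholding, written in plain NumPy. It is a simplified stand-in for a proper wavelet library such as PyWavelets; the wavelet choice, threshold, and price series are all illustrative.

```python
import numpy as np

def haar_denoise(prices: np.ndarray, threshold: float) -> np.ndarray:
    """One-level Haar DWT denoise: soft-threshold the detail
    coefficients, then invert the transform."""
    x = prices[: len(prices) // 2 * 2].astype(float)  # even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass: trend
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass: noise + jumps
    # Soft thresholding shrinks small detail coefficients toward zero,
    # removing high-frequency noise while keeping the trend intact.
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)  # inverse Haar transform
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

# Synthetic "price" series: smooth trend plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
clean = 100 + 10 * np.sin(2 * np.pi * t)
noisy = clean + rng.normal(scale=1.0, size=t.size)
denoised = haar_denoise(noisy, threshold=1.0)
```

With the threshold set to zero the transform inverts exactly; with a positive threshold the reconstructed series tracks the underlying trend more closely than the raw input, which is exactly the property that helps a downstream classifier.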

Each model was individually tuned using RandomizedSearchCV, and the features that mattered most were denoised close prices, RSI, volume, MACD, ATR, and VIX.
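The per-company tuning can be sketched like this. The data is synthetic and the parameter ranges are assumptions; only the pattern (RandomizedSearchCV over a Random Forest, fitted once per company) mirrors the writeup.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(1)

# Synthetic stand-in for one company's feature matrix (denoised close,
# RSI, volume, MACD, ATR, VIX) and next-day up/down labels.
X = rng.normal(size=(300, 6))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

# Randomized search samples hyper-parameter combinations instead of
# exhaustively grid-searching them -- cheap enough to repeat per ticker.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(100, 400),
        "max_depth": randint(3, 12),
        "min_samples_leaf": randint(1, 10),
    },
    n_iter=5,   # small budget for the sketch
    cv=3,
    random_state=0,
)
search.fit(X, y)
best_model = search.best_estimator_
```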

Making it usable — The models were integrated into a Streamlit dashboard where users could explore company risk profiles, test trading strategies (Buy & Hold, Moving Average Crossover, RSI-based), and see next-day predictions. I also refactored the team’s entire codebase into modular, reusable functions — something that started as cleanup but turned out to be essential for the app to work.
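One of the dashboard's strategies, the Moving Average Crossover, is simple enough to sketch end to end. The price path is simulated and the window lengths are assumptions, not the app's actual defaults.

```python
import numpy as np
import pandas as pd

def ma_crossover_signals(close: pd.Series, fast: int = 20, slow: int = 50) -> pd.Series:
    """Return 1 (long) while the fast moving average sits above the
    slow one, else 0 (flat)."""
    fast_ma = close.rolling(fast).mean()
    slow_ma = close.rolling(slow).mean()
    return (fast_ma > slow_ma).astype(int)

# Simulated price path for illustration.
rng = np.random.default_rng(3)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, 500))))

signal = ma_crossover_signals(close)
# Hold yesterday's signal through today's move to avoid look-ahead bias.
strat_returns = close.pct_change() * signal.shift(1)
cumulative = (1 + strat_returns.fillna(0)).cumprod()
```

The `shift(1)` is the important detail: the position acted on today must come from yesterday's signal, otherwise the backtest quietly trades on information it didn't yet have.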

Key Results

| Component | Metric | Score |
| --- | --- | --- |
| Risk Classification (XGBoost) | Accuracy | 87% |
| Risk Classification (KNN, tuned) | Accuracy | 71% |
| K-Means Clustering | Silhouette Score | 0.379 |
| AAPL Trend Prediction | Backtest Accuracy | 74% |
| PLTR Trend Prediction | Backtest Accuracy | 80% |
| KLAC Trend Prediction | Backtest Accuracy | 70% |
| MSFT Trend Prediction | Backtest Accuracy | 68% |
| QCOM Trend Prediction | Backtest Accuracy | 65% |
| CRWD Trend Prediction | Backtest Accuracy | 61% |

Reflections

If there’s one thing this project taught me, it’s that data engineering is half the battle. Building a clean, well-structured database from multiple messy sources was harder and more important than any single model. It’s the kind of work that doesn’t look flashy, but without it nothing else works.

The wavelet denoising discovery was a highlight — realizing that a signal processing technique from a completely different field could dramatically improve financial predictions. It changed how I think about feature engineering: sometimes the best thing you can do for your model isn’t adding more data, it’s cleaning the signal you already have.

Working in a team of four also shaped this experience. I owned the database and prediction pipeline, but the project only came together because our pieces fit. Refactoring everyone’s code into clean modules wasn’t glamorous, but it was the glue that held the Streamlit app together.

Looking ahead, I’d want to explore LSTM networks for the trend prediction, incorporate sentiment analysis from financial news, and build a backtesting framework that accounts for transaction costs and slippage — the things that separate a good model from a real trading strategy.

🛠 Technologies Used

  • Python
  • PostgreSQL
  • Scikit-learn
  • XGBoost
  • K-Means
  • KNN
  • Random Forest
  • Wavelet Denoising
  • Streamlit
  • Pandas
  • Matplotlib
  • SQLAlchemy