irohitraj/README.md

Rohit Raj - Data Scientist and AI Engineer

Turning messy data into actionable business insights through AI, machine learning, and analytics.

About me

I build things at the intersection of Analytics, LLMs, and Agentic AI.

Before grad school, I spent 5 years in industry as a Senior Data Scientist at eClerx and a Data Scientist at UST, working on ML systems, analytics pipelines, NLP search, forecasting, predictive maintenance, and business-impact data science.

I recently completed my MS in Computing (Artificial Intelligence Track) at the University of Utah, and interned at BMW Group during the program.

🟢 Currently looking for full-time Data Scientist / ML Engineer / AI Engineer roles in the US.


What I build

  • LLM & RAG systems — semantic search, retrieval pipelines, summarization, recommendations, and context-aware applications
  • AI agents — multi-agent workflows, tool-using systems, and reasoning-based recommendation pipelines
  • Computer vision models — object detection, segmentation, medical imaging, and damage detection
  • Applied ML systems — forecasting, predictive modeling, anomaly detection, customer segmentation, and business analytics
  • Data pipelines — SQL, Databricks, Spark, ETL workflows, dashboards, automation, and reporting systems

Impact snapshot

  • 5+ years: Data Science & AI experience
  • $100M–$300M: Monthly originations supported through BMW loan optimization pipeline
  • 600+: Credit attributes processed
  • $50M: Missed business opportunity identified at eClerx
  • 2% MAPE: Order conversion forecasting
  • 98%: Medical-bed ML pipeline accuracy
  • 200%: Analysis efficiency improvement
  • 0.64 mIoU: Coronary vessel segmentation

Tech stack

Area | Tools
ML / AI | LLMs, RAG, Agentic AI, NLP, Computer Vision, Deep Learning, Statistical Modeling, Predictive Modeling
Frameworks | PyTorch, LangChain, Scikit-learn, OpenCV, Hugging Face, SpaCy, NLTK, Gradio, Flask
Data & Infra | Databricks, Spark, SQL, BigQuery, MySQL, Oracle, ChromaDB, Docker, CI/CD, Power BI
Languages | Python, C++, R, SQL

Featured Projects

Agentic Music Recommender

Multi-agent GenAI system that recommends music from images using a RAG pipeline. It uses vision-language reasoning to understand visual context and generates track recommendations based on rules from the knowledge base.

Stack: LangChain · RAG · Agents · Vector DB · Multimodal GenAI



Guardrail Damage Detection

Fine-tuned Faster R-CNN to detect guardrail damage from dashcam images. Applied Focal Loss for class imbalance and evaluated model performance using GIoU and mAP. Also suggested a method to reduce false positives.

Stack: PyTorch · Faster R-CNN · Computer Vision · Fine-tuning
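For readers new to the two terms above, here is a minimal pure-Python sketch of binary focal loss and generalized IoU (GIoU). It is an illustration of the standard formulas, not the project's code: the function names are hypothetical, the `alpha` and `gamma` defaults are commonly used values rather than the ones used in this project, and boxes are assumed to be `(x1, y1, x2, y2)` tuples with positive area.

```python
import math

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one prediction.

    p is the predicted probability of the positive class, y is the label (0 or 1).
    alpha and gamma are illustrative defaults, not values from the project.
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-12), 1.0)             # guard against log(0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def giou(box_a, box_b):
    """Generalized IoU of two (x1, y1, x2, y2) boxes with positive area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Area of the smallest axis-aligned box enclosing both inputs
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (hull - union) / hull

easy = binary_focal_loss(0.9, 1)   # confident, correct prediction
hard = binary_focal_loss(0.1, 1)   # confident, wrong prediction
print(easy < hard)
print(giou((0, 0, 2, 2), (0, 0, 2, 2)))
```

Unlike plain IoU, GIoU stays informative for boxes that do not overlap: it goes negative as the boxes move apart instead of sitting at zero.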



Semantic Book Recommender

RAG-based book recommender that suggests books from user-provided topics, categories, and sentiment preferences using embeddings and a Gradio interface.

Stack: RAG · LLMs · LangChain · Hugging Face · Gradio · Python
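The retrieval step behind an embedding-based recommender can be sketched roughly as below. This is a toy illustration under stated assumptions, not the project's code: the titles, the 3-dimensional vectors, and the helper names `cosine_similarity` and `top_k` are all hypothetical, and a real system would obtain embeddings from a model and store them in a vector database.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, catalog, k=2):
    """Return the k titles whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), title)
              for title, vec in catalog.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:k]]

# Hypothetical embeddings, hand-written for illustration only
catalog = {
    "Space opera novel": [0.9, 0.1, 0.0],
    "Cookbook": [0.0, 0.2, 0.9],
    "Hard sci-fi anthology": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for an embedded user topic

print(top_k(query, catalog))  # ['Space opera novel', 'Hard sci-fi anthology']
```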



Coronary Vessel Segmentation

U-Net-based semantic segmentation model for coronary vessel trees from X-ray coronary angiography images, achieving up to 0.64 mIoU.

Stack: PyTorch · U-Net · Medical Imaging · Segmentation · CV
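As a reference for the metric quoted above, mean IoU (mIoU) averages the per-class intersection-over-union between predicted and ground-truth label masks. The sketch below uses tiny hand-written masks and a hypothetical `mean_iou` helper; it assumes at least one class is present in the masks and is not the evaluation code used in the project.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two equally sized label grids (lists of rows)."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                inter += (p == c and t == c)  # pixel is class c in both masks
                union += (p == c or t == c)   # pixel is class c in either mask
        if union:                             # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 2x3 masks: 0 = background, 1 = vessel (illustrative values only)
pred = [[0, 1, 1],
        [0, 0, 1]]
target = [[0, 1, 0],
          [0, 1, 1]]

print(mean_iou(pred, target, num_classes=2))  # 0.5
```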



Experience snapshot

BMW Group — Data Science Intern

Built scalable data pipelines with advanced sampling and cleaning for 600+ credit attributes, enabling a loan optimization model tied to $100M–$300M in monthly originations.

eClerx — Senior Data Scientist

Worked on large-scale analytics, forecasting, LLM prototypes, Databricks and Power BI dashboards, ETL automation, customer segmentation, and business-impact analysis.

Highlights:

  • Identified $50M in missed business opportunities from delayed product listing data
  • Forecasted web and app conversion rates using ETS, ARIMA, and Prophet with 2% MAPE
  • Built LLM-based prototypes for retail review summarization and emotion detection
  • Architected scalable ETL workflows and automated reporting, improving analysis efficiency by up to 200%
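For context on the forecasting metric in the highlights above, MAPE (mean absolute percentage error) can be computed as in this minimal sketch. The `mape` helper is hypothetical and the numbers are made up for illustration; they are not the project's data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

# Made-up conversion rates (percent), for illustration only
actual = [2.0, 2.5, 4.0]
forecast = [2.1, 2.4, 4.0]

print(round(mape(actual, forecast), 2))  # 3.0
```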

UST / Abzooba — Data Scientist

Worked on NLP search, predictive maintenance, customer analytics, web journey analytics, and ML pipeline optimization.

Highlights:

  • Built BERT-based medical search over clinical text
  • Built predictive maintenance models using real-time sensor data from silicon photonics machines
  • Engineered a Python ML pipeline for patient position prediction with 98% accuracy
  • Optimized an ML pipeline into C++ for System-on-Chip execution

What I’m working on now

  • Building stronger RAG systems and practical AI agent workflows
  • Improving computer vision model performance on real-world data
  • Turning ML prototypes into clean, usable applications
  • Looking for full-time Data Scientist / ML Engineer / AI Engineer roles in the US

A little more about me

I like building systems that are useful, not just technically interesting.

Outside of work, I’m into drums, percussion, running, cycling, and occasionally going deep into anime rabbit holes.


Connect

Happy to connect — whether it’s about a role, a collaboration, or just talking about applied AI.

Pinned

  1. NihalKarne/Agent_MapTune

    MapTune: Agentic Framework for Visual Context Aware Music Recommendation.

    Jupyter Notebook

  2. Player-Sentiment-Trends

    Jupyter Notebook

  3. Computer-Vision

    Self-learning, projects, and assignments

    Jupyter Notebook

  4. GenAI_Projects

    Projects using open-source LLMs or VLMs or

    Jupyter Notebook

  5. Data-Science-Hackathons-Competitions

    Top 2% and top 10% finishes in HackerEarth competitions

    Jupyter Notebook

  6. NLP

    Self-learning of NLP through implementation

    Jupyter Notebook