This guide presents a curated collection of AI projects, divided into beginner and expert categories. Beginner projects focus on foundational concepts like NLP, computer vision, and basic machine learning, while expert projects tackle advanced challenges such as multimodal generative AI, federated learning, and real-time voice cloning, offering opportunities to explore cutting-edge AI techniques and tools.
Sentiment Analysis in Python
Project Details: A project built to facilitate learning sentiment analysis using Python. It provides a rich blend of theoretical background and practical examples to understand how sentiment analysis works, covering text processing, NLP techniques, and machine learning algorithms for sentiment detection.
Features:
- Theoretical explanations of sentiment analysis
- Practical Python coding examples
- Step-by-step tutorials
Key Tools & Libraries:
- Python
- Jupyter Notebook
- NLP Libraries (e.g., NLTK, spaCy)
- Machine Learning Libraries (e.g., scikit-learn)
Project Source Code: GitHub
Style Transfer Paraphrase
Project Details: This project presents the official code and data for the EMNLP 2020 paper “Reformulating Unsupervised Style Transfer as Paraphrase Generation.” It provides the necessary dataset and codebase for performing style transfer as a paraphrase generation task, focusing on text reformulation.
Features:
- Unsupervised style transfer
- Paraphrase generation
- Dataset and codebase provided
Key Tools & Libraries:
- Python
- Jupyter Notebook
- HTML
- HuggingFace
Project Source Code: GitHub
NLP Recipe Recommender
Project Details: This project builds a recipe recommendation system based on Natural Language Processing (NLP), utilizing unstructured data and unsupervised learning. It employs topic modeling (TF-IDF + LSA) on a large recipe dataset to suggest relevant dishes, aiding users in discovering new recipes.
Features:
- Recipe recommendation system
- NLP for topic modeling
- Unsupervised learning approach
- Ingredient-based recommendations
Key Tools & Libraries:
- Python
- Jupyter Notebook
- Pandas, NumPy, SciPy
- Scikit-learn
- NLTK, Gensim
Project Source Code: GitHub
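To make the ingredient-based recommendation idea concrete, here is a minimal sketch using only the standard library. It covers TF-IDF weighting and cosine similarity; the LSA step the project applies on top is omitted. The recipe data and the function names (`tfidf_vectors`, `cosine`, `recommend`) are hypothetical and do not come from the project's code.

```python
import math
from collections import Counter

# Hypothetical toy data; the project itself works on a large recipe dataset.
RECIPES = {
    "tomato pasta": "pasta tomato garlic basil olive oil",
    "pesto pasta": "pasta basil pine nuts garlic parmesan olive oil",
    "chicken curry": "chicken onion garlic curry coconut milk",
    "fruit salad": "apple banana orange honey mint",
}


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: tf-idf weight} vector."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many recipes each term appears.
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(name, vectors, k=2):
    """Return the k recipes most similar to the named one."""
    scores = [
        (other, cosine(vectors[name], vec))
        for other, vec in vectors.items()
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


VECTORS = tfidf_vectors(RECIPES)
print(recommend("tomato pasta", VECTORS))
```

Recipes that share many ingredients score close to each other, while recipes with no ingredients in common score zero.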
Spam Classifier
Project Details: This project builds a Spam/Ham SMS classifier using Natural Language Processing (NLP) and several machine learning algorithms. It demonstrates data cleaning and preprocessing with PorterStemmer, WordNetLemmatizer, CountVectorizer, and TfidfVectorizer. The project also implements an LSTM model with word embeddings, reporting an accuracy of 97.84%.
Features:
- SMS spam/ham classification
- Multiple ML algorithm comparisons
- NLP data preprocessing techniques
- LSTM with Word Embeddings
Key Tools & Libraries:
- Python
- Jupyter Notebook
- Scikit-learn
- Keras/TensorFlow
- NLTK
Project Source Code: GitHub
Summary Twitter Bot
Project Details: This project is a Twitter bot designed to summarize mentioned articles and provide related content. Users can mention the bot with an article URL, and it will reply with an image-based summary and links to related articles within 30 seconds.
Features:
- Summarizes articles from URLs
- Provides related articles
- Replies in image format
- Automated Twitter bot functionality
Key Tools & Libraries:
- Python
- Tweepy (for Twitter API)
- NLP libraries (e.g., NLTK, spaCy, Transformers)
- Image processing libraries (e.g., Pillow)
Project Source Code: GitHub
Next Word Prediction Using Markov Chain Model
Project Details: This project implements a Next Word Prediction system using the Markov Chain Model. Developed entirely in Python, it leverages the predictive capabilities of Markov models to anticipate the most likely word to follow a given sequence of words, offering a practical exploration of language modeling.
Features:
- Next word prediction
- Markov Chain Model implementation
- Command-line interface
Key Tools & Libraries:
- Python
- msvcrt module
Project Source Code: GitHub
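The core of a Markov chain word predictor can be sketched in a few lines. The version below is a first-order (bigram) model over a made-up sentence; the function names and the corpus are assumptions for illustration, and the project's own implementation may use longer contexts.

```python
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def predict_next(model, word, k=3):
    """Return up to k most frequent followers of the given word."""
    followers = model.get(word.lower())
    if not followers:
        return []
    return [candidate for candidate, _ in followers.most_common(k)]


# Hypothetical training text.
corpus = "the cat sat on the mat the cat ate the fish"
model = build_bigram_model(corpus)
print(predict_next(model, "the"))
```

A word that never appeared in the training text yields an empty list, which a command-line front end could treat as "no suggestion".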
LawGPT – RAG based Generative AI Attorney Chatbot
Project Details: LawGPT is a RAG (Retrieval Augmented Generation) based generative AI attorney chatbot trained on Indian Penal Code data. Developed using Streamlit, LangChain, and TogetherAI API, it allows users to ask legal questions and receive answers grounded in the IPC, aiming to make legal rights accessible.
Features:
- RAG-based generative AI chatbot
- Trained on Indian Penal Code data
- Provides legal answers and insights
- Live demo available
Key Tools & Libraries:
- Python
- Streamlit
- LangChain
- TogetherAI API (for LLM)
Project Source Code: GitHub
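The retrieval half of RAG can be illustrated without any external service. In this sketch, plain word overlap stands in for the embedding search a real pipeline would use, and the passages are placeholders rather than actual statutory text. Every name here (`PASSAGES`, `retrieve`, `build_prompt`) is hypothetical and unrelated to the LangChain or TogetherAI APIs.

```python
import re
from collections import Counter

# Hypothetical placeholder passages, not real legal text.
PASSAGES = [
    "Section A: placeholder text about theft of movable property.",
    "Section B: placeholder text about defamation and reputation.",
    "Section C: placeholder text about criminal trespass on property.",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def retrieve(question, passages, k=2):
    """Return up to k passages sharing at least one word with the question."""
    question_words = Counter(tokenize(question))
    scored = []
    for passage in passages:
        # Multiset intersection counts the words both texts have in common.
        overlap = sum((question_words & Counter(tokenize(passage))).values())
        scored.append((overlap, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for score, passage in scored[:k] if score > 0]


def build_prompt(question, passages):
    """Assemble the grounded prompt that would be sent to the LLM."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


print(build_prompt("What counts as defamation?", PASSAGES))
```

The generation step then amounts to sending this prompt to a language model, so the answer is constrained by the retrieved passages.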
Dbias – Detecting Bias and ensuring Fairness in AI solutions
Project Details: Dbias is a Python package designed to detect and mitigate biases in Natural Language Processing (NLP) tasks, specifically for news articles. It provides an end-to-end framework that preprocesses raw data, identifies various bias types (e.g., gender, religious, political), and offers functionalities for text debiasing, bias classification, bias word/phrase recognition, and bias masking.
Features:
- Detects and mitigates NLP biases
- Text debiasing for unbiased recommendations
- Bias classification (biased/not biased)
- Identifies biased words/phrases
- Masks out biased portions of text
Key Tools & Libraries:
- Python
- Transformers (DistilBERT, RoBERTa, Masked Language Models)
- Hugging Face d4data/en_pipeline
Project Source Code: GitHub
Fake News Detection
Project Details: This project focuses on Fake News Detection in Python using natural language processing (NLP) techniques and machine learning algorithms. It leverages the LIAR dataset to classify news articles as “True” or “False” after extensive data preprocessing and feature selection. The project explores various classifiers, including Naive Bayes, Logistic Regression, SVM, and Random Forest, with a focus on performance evaluation.
Features:
- Fake news classification
- Utilizes NLP techniques (tokenizing, stemming, n-grams, TF-IDF)
- Compares various machine learning classifiers (Naive Bayes, Logistic Regression, SVM, Random Forest)
- Data preprocessing and feature extraction
- Outputs classification with probability of truth
Key Tools & Libraries:
- Python
- scikit-learn
- NumPy
- SciPy
- Jupyter Notebook
Project Source Code: GitHub
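Of the classifiers listed, Naive Bayes is the easiest to write out by hand. The following sketch trains a multinomial Naive Bayes model with Laplace smoothing on four invented headlines and returns a probability per label. The data and function names are assumptions; the project itself relies on scikit-learn and the LIAR dataset.

```python
import math
from collections import Counter

# Hypothetical toy statements; the project trains on the LIAR dataset.
TRAIN = [
    ("scientists confirm study results after review", "true"),
    ("official report confirms budget figures", "true"),
    ("shocking secret cure doctors hate", "false"),
    ("secret plot revealed in shocking leak", "false"),
]


def train_naive_bayes(samples):
    """Collect per-label word counts, label counts, and the vocabulary."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def predict_proba(model, text):
    """Return {label: probability} using Laplace-smoothed likelihoods."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                likelihood = (word_counts[label][word] + 1) / (n_words + len(vocab))
                score += math.log(likelihood)
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: value / norm for label, value in exp_scores.items()}


MODEL = train_naive_bayes(TRAIN)
print(predict_proba(MODEL, "shocking secret leak"))
```

The probability attached to the "true" label plays the role of the "probability of truth" output mentioned in the feature list.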
Image Classification using Convolutional Neural Networks
Project Details: This project classifies images of cats and dogs using Convolutional Neural Networks (CNNs). It showcases the entire process of training a CNN model, achieving high accuracy in classification, and visualizing the model’s performance through various plots.
Features:
- Cat and dog image classification
- CNN model implementation
- High accuracy (up to 97-99%)
- Loss and accuracy plots
Key Tools & Libraries:
- Python
- Jupyter Notebook
- Keras
- TensorFlow
- NumPy
Project Source Code: GitHub
Drawing Recognition using CNN and Flask
Project Details: This project develops an interactive drawing recognition application using a Convolutional Neural Network (CNN), which is then deployed with Flask. It involves training a CNN model on a subset of the Quick Draw Dataset to classify user-drawn images of various objects.
Features:
- Interactive drawing recognition
- CNN model for classification
- Flask web application
- Quick Draw Dataset integration
Key Tools & Libraries:
- Python
- Jupyter Notebook
- Keras
- TensorFlow
- Flask
- NumPy
- Pickle
Project Source Code: GitHub
YOLOv8 Custom Object Detection
Project Details: This project demonstrates custom object detection using YOLOv8 by Ultralytics. It covers detecting PPE, training single-class models (alpacas), and developing multi-class detectors for bees, butterflies, ants, and other insects. The automated training process efficiently prepares datasets and saves results, making it accessible for various computer vision tasks.
Features:
- Custom object detection (single and multi-class)
- Automated training process
- Object detection, tracking, and counting module
- Integration with Google Colab and Kaggle
Key Tools & Libraries:
- Python
- Jupyter Notebook
- Ultralytics YOLOv8
- OpenCV
Project Source Code: GitHub
Real-time Handwritten Digit Recognition
Project Details: This project builds a real-time handwritten digit recognition application using the MNIST dataset, Tkinter, and Convolutional Neural Networks (CNN). It processes image data, trains a CNN model for high accuracy (around 99%), and provides an interactive GUI where users can draw digits for instant prediction.
Features:
- Real-time digit recognition
- CNN model for image classification
- Interactive Tkinter GUI
- MNIST dataset integration
Key Tools & Libraries:
- Python
- Keras
- TensorFlow
- Tkinter
- OpenCV (cv2)
Project Source Code: GitHub
DDColor
Project Details: This project provides the official PyTorch implementation of the ICCV 2023 paper “DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders.” DDColor offers vivid and natural colorization for historical black-and-white photos and can even recolor landscapes from anime games into a realistic style using multi-scale visual features.
Features:
- Photo-realistic image colorization
- Dual Decoder (DDColor) architecture
- Vivid and natural results
- Supports photos and anime landscapes
Key Tools & Libraries:
- Python
- PyTorch
- Hugging Face
- ModelScope
Project Source Code: GitHub
Plant Disease Detection
Project Details: This project implements a Plant Disease Detection system using deep learning, specifically Convolutional Neural Networks (CNNs). It classifies leaf images into 39 different categories of plant diseases, trained on the Plant Village dataset, and includes a deployed Flask web application for practical use.
Features:
- Plant disease classification from leaf images
- CNN model with PyTorch
- Supports 39 disease categories
- Flask web application for deployment
Key Tools & Libraries:
- Python
- PyTorch
- Flask
- Jupyter Notebook
Project Source Code: GitHub
PyTorch Grad-CAM
Project Details: This project offers a comprehensive explainable AI (XAI) package for PyTorch, providing state-of-the-art methods for computer vision. It supports various architectures (CNNs, Vision Transformers) and tasks (classification, object detection, semantic segmentation, embedding-similarity), offering a wide range of pixel attribution methods and metrics to diagnose model predictions and evaluate explanation trustworthiness.
Features:
- Comprehensive collection of pixel attribution methods (GradCAM, HiResCAM, ScoreCAM, etc.)
- Supports CNNs and Vision Transformers
- Works with classification, object detection, semantic segmentation, and more
- Includes smoothing methods and metrics for explanation evaluation
Key Tools & Libraries:
- Python
- PyTorch
- NumPy
- OpenCV
Project Source Code: GitHub
PhotoPrism: AI-Powered Photos App
Project Details: PhotoPrism is an AI-powered photo management application designed for decentralized use, emphasizing user privacy. It automatically tags and organizes pictures and videos using AI, supports various formats, and offers powerful search filters, face recognition, and interactive maps. It can be self-hosted on various platforms, including Raspberry Pi and Apple Silicon.
Features:
- AI-powered auto-tagging and organization
- Face recognition and live photos
- Six high-resolution World Maps for geotagged memories
- Supports RAW images and video formats
- Self-hosted with Docker for various platforms
Key Tools & Libraries:
- Go
- TensorFlow
- Docker
- Vue.js
Project Source Code: GitHub
Tic-Tac-Toe Minimax AI
Project Details: This project implements the Minimax AI algorithm on the classic Tic-Tac-Toe game. It demonstrates how AI can solve two-player, zero-sum games by recursively searching for the optimal move, aiming to lead the “Max” player to victory or a draw.
Features:
- AI opponent using Minimax algorithm
- Two-player game logic
- Game tree representation
Key Tools & Libraries:
- Python
- JavaScript
- HTML
Project Source Code: GitHub
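The recursive search is compact enough to show in full. This is a plain minimax sketch without pruning, scored from X's point of view (X is the "Max" player); the board representation as a nine-element list is an assumption, not taken from the project.

```python
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'X' or 'O' if a line is complete, otherwise None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board, player):
    """Return (score, move): +1 if X can force a win, -1 for O, 0 for a draw."""
    won = winner(board)
    if won == "X":
        return 1, None
    if won == "O":
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None  # board full, nobody won

    best_score = -2 if player == "X" else 2
    best_move = None
    for move in moves:
        board[move] = player  # try the move
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[move] = " "     # undo it
        if (player == "X" and score > best_score) or (
            player == "O" and score < best_score
        ):
            best_score, best_move = score, move
    return best_score, best_move


# X has two in a row on the top line and is to move.
board = list("XX OO    ")
print(minimax(board, "X"))
```

Because the function undoes each trial move, the board passed in is unchanged after the search.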
Rock Paper Scissors AI
Project Details: This project develops an advanced AI player for Rock Paper Scissors using machine learning and adaptive strategies. It employs an LSTM neural network to predict opponent moves based on historical data, continuously learning and adapting to different playing styles to achieve a high win rate.
Features:
- AI player for Rock Paper Scissors
- LSTM neural network for move prediction
- Adaptive and online learning
- Opponent modeling and ensemble decision making
Key Tools & Libraries:
- Python
- TensorFlow
- Keras
- NumPy
Project Source Code: GitHub
AI-Powered Text Adventure Game
Project Details: This project is an immersive, AI-driven text adventure game that adapts to player choices, creating a unique story each time. It features interactive storytelling, an AI memory system, an inventory system, custom actions, and detailed logging of the adventure.
Features:
- AI-driven interactive storytelling
- Dynamic narrative based on choices
- AI memory and action history
- Inventory and player status tracking
- Customizable game data
Key Tools & Libraries:
- Python
- OpenAI API (or compatible APIs)
Project Source Code: GitHub
Reinforcement Learning Pong Bot
Project Details: This project implements a Reinforcement Learning bot that plays the classic Atari Pong game. It leverages TF-Agents to train a Deep Q-Network (DQN) with an experience replay algorithm, learning to play directly from raw pixel observations. The project explores optimizing hyperparameters like replay buffer size and optimizer choice for improved performance.
Features:
- AI bot for Pong game
- Deep Q-Network (DQN) implementation
- Experience replay algorithm
- Optimized hyperparameter tuning
Key Tools & Libraries:
- Python
- Jupyter Notebook
- TensorFlow
- TF-Agents
Project Source Code: GitHub
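Experience replay and the Q-learning target can be sketched independently of TF-Agents. The class and function below are hypothetical stand-ins written with the standard library only; they show the data flow, not the project's actual training loop.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-size store of past transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive frames.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.99):
    """Bellman target: r if the episode ended, else r + gamma * max_a Q(s', a)."""
    return reward if done else reward + gamma * max(next_q_values)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.add(state=step, action=0, reward=0.0, next_state=step + 1, done=False)
print(len(buffer), [t.state for t in buffer.buffer])
```

The buffer capacity is one of the hyperparameters the project reports tuning: a larger buffer keeps older experience around for longer at the cost of memory.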
Few Shot, Zero Shot and Meta Learning Research
Project Details: This project is a research initiative focusing on few-shot, zero-shot, and meta-learning problems, specifically for image classification. It implements and explores various few-shot algorithms like Prototypical Networks and Model-Agnostic Meta-Learning (MAML), providing clean, readable code and detailed explanations of the underlying theories.
Features:
- Implementations of Few-Shot Learning algorithms (Prototypical Networks, MAML)
- Explores zero-shot and meta-learning concepts
- Supports image classification tasks
- Provides theoretical explanations and reproducible results
Key Tools & Libraries:
- Python
- PyTorch
- NumPy
Project Source Code: GitHub
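The classification rule behind Prototypical Networks is simple once embeddings exist: average each class's support embeddings into a prototype, then assign a query to the nearest prototype. The sketch below assumes the embeddings are already computed (in the real method they come from a trained network) and uses made-up 2-D vectors.

```python
import math


def prototypes(support):
    """support: {label: [embedding, ...]} -> {label: mean embedding}."""
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in support.items()
    }


def classify(query, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    dists = {
        label: sum((q - p) ** 2 for q, p in zip(query, proto))
        for label, proto in protos.items()
    }
    closest = min(dists.values())  # subtract for numerical stability
    weights = {label: math.exp(-(d - closest)) for label, d in dists.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


# Hypothetical two-shot support set with 2-D embeddings.
support = {"cat": [[0, 0], [0, 2]], "dog": [[4, 4], [6, 4]]}
protos = prototypes(support)
probs = classify([1, 1], protos)
print(protos, max(probs, key=probs.get))
```

With only two examples per class, the prototype is just the midpoint of the two embeddings, which is why the approach works with very few labeled samples.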
Smart Finance Tracker
Project Details: This project is an AI-powered expense tracking application built with Streamlit. It allows users to manage personal finances through natural language input, intelligent categorization, and automatic date detection. It integrates with Google Sheets for reliable data storage and provides comprehensive financial analytics and insights.
Features:
- Natural language expense/income entry
- Smart transaction categorization
- Google Sheets integration
- Comprehensive financial analytics dashboard
- Responsive design
Key Tools & Libraries:
- Python
- Streamlit
- Google Sheets API
- Gemini AI API
Project Source Code: GitHub
OpenBB Platform
Project Details: The OpenBB Platform bills itself as the first open-source financial data and AI platform, offering access to equity, options, crypto, forex, macroeconomic, and fixed income data. It provides a Python API for integration and supports a broad range of extensions. The platform can be connected to the OpenBB Workspace for enterprise UI and AI agent capabilities.
Features:
- Open-source financial data platform
- Access to diverse financial data (equity, options, crypto, forex, macro, fixed income)
- Python API for integration
- Supports extensions and AI agents
- Connects to OpenBB Workspace for advanced UI
Key Tools & Libraries:
- Python
- FastAPI (for API server)
- Uvicorn
Project Source Code: GitHub
Music Genre Classification
Project Details: This is an audio signal processing project designed to classify music genres using deep learning. It uses the GTZAN dataset, extracts features with the Librosa library, and employs a CNN architecture for classification, achieving an accuracy of approximately 80%. A Flask web application is also part of this project.
Features:
- Music genre classification
- Feature extraction (MFCCs)
- CNN model
- Flask web application
- MP3 to WAV conversion
Key Tools & Libraries:
- Python
- Jupyter Notebook
- Flask
- Librosa
- TensorFlow/Keras
- FFMPEG
Project Source Code: GitHub
Animal Sound Recognition
Project Details: This project focuses on animal sound recognition using deep learning techniques. It applies methodologies from the PANNs paper to classify animal sounds from the Google AudioSet database. The project demonstrates that models specifically trained on animal sounds achieve better results than general-purpose models, with custom CNN and ResNet models surpassing previous benchmarks.
Features:
- Animal sound classification
- Application of PANNs techniques
- Comparison of specialized vs. general models
- Achieved higher mAP for animal sound classification
Key Tools & Libraries:
- Python
- TensorFlow/Keras
- PyTorch
- Librosa
Project Source Code: GitHub
Priority Task Selection Using Evolutionary Programming
Project Details: This project is a web application that utilizes Evolutionary Programming to determine and prioritize tasks for efficient scheduling. It helps users select the most important tasks from a given list based on priority scales and available time, providing an optimized list of tasks and their total priority.
Features:
- Task prioritization
- Evolutionary Programming algorithm
- Web application interface
- Input for task details and time constraints
Key Tools & Libraries:
- Python
- Flask
- NumPy
- HTML, CSS, JavaScript
Project Source Code: GitHub
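Selecting tasks under a time budget is a knapsack-style problem, and a mutation-only evolutionary loop is one way to approach it. The sketch below is a simplified variant that assumes a fixed priority and duration per task; the task list, the parameters, and the keep-the-best-half selection rule are all illustrative choices, not the project's.

```python
import random

# Hypothetical tasks: (name, priority, hours).
TASKS = [
    ("report", 8, 3),
    ("email", 3, 1),
    ("review", 6, 2),
    ("deploy", 9, 4),
    ("cleanup", 2, 1),
]


def fitness(selection, tasks, time_limit):
    """Total priority of the selected tasks, or 0 if they exceed the time limit."""
    hours = sum(t[2] for t, keep in zip(tasks, selection) if keep)
    priority = sum(t[1] for t, keep in zip(tasks, selection) if keep)
    return priority if hours <= time_limit else 0


def mutate(selection, rng, rate=0.2):
    """Flip each include/exclude flag with the given probability."""
    return [not bit if rng.random() < rate else bit for bit in selection]


def evolve(tasks, time_limit, pop_size=20, generations=60, seed=0):
    rng = random.Random(seed)
    # Start from empty selections, which always fit the time limit.
    population = [[False] * len(tasks) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [mutate(parent, rng) for parent in population]
        combined = population + offspring
        combined.sort(key=lambda s: fitness(s, tasks, time_limit), reverse=True)
        population = combined[:pop_size]  # parents compete with offspring
    best = population[0]
    chosen = [t[0] for t, keep in zip(tasks, best) if keep]
    return chosen, fitness(best, tasks, time_limit)


chosen, score = evolve(TASKS, time_limit=6)
print(chosen, score)
```

Unlike a genetic algorithm, this loop uses no crossover: new candidates arise from mutation alone, which is the defining trait of evolutionary programming.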
AI Humor Generator
Project Details: This project generates jokes using artificial intelligence. The repository overview says little about the specific models used, but it provides the foundational setup for a system that produces humorous content, built on Python and a pinned TensorFlow version.
Features:
- Generates jokes using AI
- Basic setup for AI humor generation
- Focus on Python development environment
Key Tools & Libraries:
- Python
- Jupyter Notebook
- TensorFlow
Project Source Code: GitHub
Sequence Prediction ANN
Project Details: This project implements a simple Artificial Neural Network (ANN) in Python for predicting the next number in a given sequence. It features modularized code for data preparation, network architecture, and training, using a feedforward network with backpropagation and customizable parameters for various numerical sequence predictions.
Features:
- Predicts next number in a sequence
- Simple Artificial Neural Network (ANN)
- Modularized code for clarity
- Uses feedforward network with backpropagation
Key Tools & Libraries:
- Python
- NumPy
- Scikit-learn
Project Source Code: GitHub
Programming Language Classifier
Project Details: This project implements a machine learning model to detect the source code language from a text string. Utilizing a Kaggle dataset of 97 million code samples, it trains, tests, and deploys classification models with over 80% accuracy for 20 different programming languages.
Features:
- Classifies 20 programming languages
- Uses machine learning for text classification
- Achieves over 80% accuracy
- Supports auto-syntax highlighting with a text editor fork
Key Tools & Libraries:
- Python
- Scikit-learn
- Matplotlib
- Flask
Project Source Code: GitHub
AI Voice Assistant
Project Details: This project develops an advanced AI Voice Assistant with integrated Text-to-Speech (TTS) and Speech-to-Text (STT) capabilities. It allows natural vocal interaction and utilizes various tools for user requests, including calendar management, contact handling, email composition, web searching, and accessing a personal knowledge base.
Features:
- Speech-to-Text (STT) and Text-to-Speech (TTS)
- Vocal interaction with AI agent
- Integration with Google Calendar, Contacts, Gmail
- Web search via Tavily API
- Personal knowledge base access
Key Tools & Libraries:
- Python
- Google API (Calendar, Contacts, Gmail)
- Tavily API
- Groq API (Llama3)
- Google Gemini API
- Deepgram API
Project Source Code: GitHub
Real-Time Voice Cloning
Project Details: This repository implements real-time voice cloning based on the “Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS)” framework, incorporating a real-time vocoder. The three-stage deep learning framework allows generating speech from arbitrary text using a digital representation of a voice created from a few seconds of audio.
Features:
- Real-time voice cloning
- Implements SV2TTS framework
- Utilizes WaveRNN vocoder and Tacotron synthesizer
- GE2E encoder for speaker verification
- Toolbox for easy demonstration and use
Key Tools & Libraries:
- Python
- Deep Learning (PyTorch, possibly TensorFlow for original research)
- FFmpeg
Project Source Code: GitHub
Tabby
Project Details: Tabby is a self-hosted AI coding assistant, serving as an open-source and on-premises alternative to GitHub Copilot. It offers an OpenAPI interface for easy integration, supports consumer-grade GPUs, and requires no external DBMS or cloud services, providing a private and flexible solution for code completion and assistance.
Features:
- Self-hosted AI coding assistant
- Open-source and on-premises alternative to Copilot
- OpenAPI interface for integration
- Supports consumer-grade GPUs
Key Tools & Libraries:
- Rust
- Docker
- Python
Project Source Code: GitHub
Healing Agent 🩺
Project Details: Healing Agent is an intelligent code assistant for Python that detects and fixes errors with detailed context using AI. It leverages various Large Language Models (LLMs) to provide smart suggestions, auto-generated hints, and even optional automated code fixes, aiming for “self-healing” programs with regenerative abilities like Wolverine.
Features:
- Automatic error detection and handling
- Smart error analysis and solution suggestions
- Comprehensive error context saving (stack traces, variables)
- AI-powered code healing using various LLMs (OpenAI, Azure, LiteLLM, Anthropic, Ollama)
- Optional automated code fixes with backups
Key Tools & Libraries:
- Python
- OpenAI API, Azure OpenAI API, LiteLLM, Anthropic API, Ollama
Project Source Code: GitHub
Multimodal Generative AI for BPM
Project Details: This project explores the feasibility of generating structured business process models (BPM) from unstructured multimodal documents (images and text) using generative AI, specifically GPT-4V. It provides a comprehensive framework, including data preparation, dataset exploration, BPMN generation via zero-shot, one-shot, and few-shot prompting, and an evaluation framework to assess model similarity to ground truth.
Features:
- Generates BPM models from multimodal inputs
- Utilizes GPT-4V with various prompting techniques
- Includes dataset preparation and exploration notebooks
- Provides an evaluation framework for model similarity
Key Tools & Libraries:
- Python
- Jupyter Notebook
- Pandas, NumPy
- conda for environment management
Project Source Code: GitHub
FedCompass – Efficient Cross-Silo Federated Learning
Project Details: This project introduces FedCompass, a semi-asynchronous federated learning (FL) algorithm designed to enhance time-efficiency and model performance in heterogeneous client environments. It features a Computing Power Aware Scheduler (COMPASS) that adaptively assigns local steps to clients, ensuring synchronized model arrivals. The framework is built on APPFL and uses gRPC for distributed FL experiments.
Features:
- Semi-asynchronous federated learning
- Computing Power Aware Scheduler (COMPASS)
- Supports synchronous and asynchronous FL
- gRPC deployment for distributed environments
Key Tools & Libraries:
- Python
- gRPC
- PyTorch
- APPFL framework
Project Source Code: GitHub
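The scheduling idea can be illustrated with a deliberately simplified sketch: give faster clients more local steps so that all updates are expected to arrive at about the same time. This is not the COMPASS algorithm as implemented in the repository. The function names, the steps-per-second speed estimate, and the clamping bounds are all assumptions made for illustration.

```python
def assign_local_steps(client_speeds, target_seconds, min_steps=1, max_steps=100):
    """Assign each client a step count proportional to its estimated speed.

    client_speeds: {client name: estimated local training steps per second}
    target_seconds: wall-clock time at which all updates should arrive
    """
    steps = {}
    for client, speed in client_speeds.items():
        raw = int(speed * target_seconds)
        # Clamp so that no client is starved or overloaded.
        steps[client] = max(min_steps, min(max_steps, raw))
    return steps


def expected_arrival(steps, client_speeds):
    """Estimated seconds until each client finishes its assigned steps."""
    return {client: steps[client] / client_speeds[client] for client in steps}


# Hypothetical heterogeneous clients.
speeds = {"fast": 10.0, "slow": 2.0}
steps = assign_local_steps(speeds, target_seconds=5)
print(steps, expected_arrival(steps, speeds))
```

With equal expected arrival times, the server can aggregate a group of updates together instead of waiting for the slowest client or applying stale updates one by one.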
GPT Researcher
Project Details: GPT Researcher is an open deep research agent capable of conducting web and local research, generating detailed, factual, and unbiased reports with citations. It uses a “planner” and “execution” multi-agent architecture to gather information efficiently, addressing misinformation, speed, and reliability issues often found in LLMs. It also features advanced “Deep Research” for tree-like exploration of topics.
Features:
- Generates detailed research reports with citations
- Supports web and local document research
- Multi-agent architecture for planning and execution
- “Deep Research” for recursive topic exploration
- Frontend applications for enhanced user experience
Key Tools & Libraries:
- Python
- FastAPI
- Uvicorn
- LangChain (for multi-agent assistant)
- TypeScript, JavaScript (for frontend)
Project Source Code: GitHub
Resume Job Matcher
Project Details: Resume Job Matcher is a Python script that automates resume-job matching using AI. It leverages the Anthropic Claude or OpenAI GPT API to analyze resumes against a job description, providing a match score, detailed evaluation (skills, experience, education), and personalized email responses for candidates. It also assesses resume quality and detects “red flags.”
Features:
- AI-powered resume-job matching
- Supports Anthropic Claude and OpenAI GPT APIs
- Generates match scores and detailed evaluations
- Creates personalized email responses
- Detects “red flags” and assesses resume quality
Key Tools & Libraries:
- Python
- PyPDF2
- anthropic (for Claude API)
- openai (for GPT API)
- pydantic, tqdm, termcolor, json5, requests, beautifulsoup4
Project Source Code: GitHub
Data extractor for PDF invoices – invoice2data
Project Details: Invoice2data is a command-line tool and Python library for extracting key information from PDF invoices. It automates text extraction using various techniques (PDF miners, OCR), applies a flexible YAML/JSON-based template system for different layouts, and outputs structured data in formats like CSV, JSON, or XML, streamlining accounting processes and data entry.
Features:
- Automated data extraction from PDF invoices
- Supports various text extraction techniques (pdftotext, OCR, etc.)
- Flexible YAML/JSON template system
- Outputs structured data (CSV, JSON, XML)
- Includes plugins for line item extraction
Key Tools & Libraries:
- Python
- PyYAML
- pdftotext, tesseract, pdfminer.six, pdfplumber, ocrmypdf, gvision
Project Source Code: GitHub
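Template-driven extraction boils down to applying a set of named patterns to the text pulled from a PDF. The sketch below mimics that idea with a plain Python dictionary of regular expressions. It does not use invoice2data's template schema or API; the field names, patterns, and sample text are invented.

```python
import re

# Hypothetical template: field name -> regex with one capture group.
TEMPLATE = {
    "invoice_number": r"Invoice\s+No\.?\s*:?\s*(\w+)",
    "date": r"Date\s*:\s*(\d{4}-\d{2}-\d{2})",
    "amount": r"Total\s*:\s*\$?([\d.,]+)",
}


def extract_fields(text, template):
    """Apply each pattern to the text; missing fields map to None."""
    result = {}
    for field, pattern in template.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[field] = match.group(1) if match else None
    return result


# Stand-in for text already extracted from a PDF.
sample = "ACME Corp\nInvoice No: INV1042\nDate: 2024-03-15\nTotal: $1,250.00\n"
print(extract_fields(sample, TEMPLATE))
```

Keeping the patterns in data rather than code is what lets a tool of this kind support a new invoice layout by adding a template file instead of changing the program.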
Prophet: Automatic Forecasting Procedure
Project Details: Prophet is an automatic forecasting procedure developed by Facebook for time series data. It utilizes an additive model with non-linear trends, yearly, weekly, and daily seasonality, and holiday effects. Robust to missing data segments and trend shifts, Prophet excels with time series data exhibiting strong seasonal patterns and ample historical data, available in both R and Python.
Features:
- Automatic time series forecasting
- Handles multiple seasonality (yearly, weekly, daily) and holidays
- Robust to missing data segments and trend shifts
- Available in both R and Python
Key Tools & Libraries:
- Python
- R
- Stan
- Pandas, NumPy
Project Source Code: GitHub
Property Prediction with Neural Networks on Raw Molecular Graphs
Project Details: This project provides the codebase for property prediction in drug discovery using Graph Neural Networks (GNNs) on raw molecular graphs. It implements various GNN models, including GGNN, AttentionGGNN, and EMN, for bioactivity and physical-chemical property prediction. The repository is based on master’s thesis work and a peer-reviewed paper, offering detailed insights into model training and prediction.
Features:
- Property prediction for drug discovery
- Implements Gated Graph Neural Networks (GGNN), AttentionGGNN, EMN
- Handles raw molecular graphs
- Supports bioactivity and physical-chemical property prediction
Key Tools & Libraries:
- Python
- PyTorch
- RDKit
- Scikit-learn
Project Source Code: GitHub
Personalized Medicine
Project Details: This project proposes an efficient text classifier to automate the classification of genetic variant effects for personalized cancer treatment. It addresses the challenge of manually analyzing vast medical literature by applying natural language processing (NLP) techniques, including data preprocessing, various feature extraction methods (Bag-of-Words, Word Embeddings, BERT), and evaluation of machine learning and neural network models (CNN, BiLSTM).
Features:
- Classifies genetic variant effects for personalized medicine
- Comprehensive NLP pipeline (tokenization, lemmatization, stop word removal)
- Various feature extraction methods (one-hot encoding, BoW, TF-IDF, pre-trained word embeddings like GloVe, BioWordVec, BERT)
- Evaluation of 8 machine learning classifiers and neural networks (CNN, BiLSTM)
Key Tools & Libraries:
- Python
- PyTorch
- NLTK, SpaCy
- Scikit-learn
- NumPy, Pandas
Project Source Code: GitHub
ClimateLearn: Machine Learning for Predicting Weather and Climate
Project Details: ClimateLearn is a tutorial project demonstrating the application of machine learning for predicting weather and climate variables. It also focuses on transforming low-resolution outputs of climate models into high-resolution regional forecasts. Originally presented at NeurIPS 2022, this project provides a Colab-ready notebook for hands-on learning.
Features:
- Applies machine learning to weather and climate prediction
- Transforms low-resolution climate model outputs to high-resolution forecasts
- Educational tutorial format
- Designed for execution in Colab
Key Tools & Libraries:
- Python
- Jupyter Notebook (Colab)
- Machine Learning libraries (likely PyTorch/TensorFlow, scikit-learn given the context)
Project Source Code: GitHub