Using the sentence-transformers library to generate embeddings for text data is a core GenAI technique. Embeddings are dense numerical vectors that capture the semantic meaning of text: texts with similar meaning map to nearby vectors, which allows for similarity comparisons and other downstream tasks.
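
As a minimal sketch of the embedding step, the code below encodes texts and compares them by cosine similarity. The model name `all-MiniLM-L6-v2` and the helper names `embed_texts` and `cosine_similarity_matrix` are assumptions for illustration; the project may have used a different model and comparison.

```python
import numpy as np


def embed_texts(texts, model_name="all-MiniLM-L6-v2"):
    """Encode texts into dense vectors with sentence-transformers.

    The model name is an assumption; the project may have used another one.
    """
    # Imported lazily so the similarity helper works without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts, convert_to_numpy=True)


def cosine_similarity_matrix(embeddings):
    """Return pairwise cosine similarities between the rows of `embeddings`."""
    vectors = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T
```

Calling `cosine_similarity_matrix(embed_texts(descriptions))` on a list of strings would give one similarity score per pair of texts, with values near 1 for texts that the model considers close in meaning.
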
Using FAISS (Facebook AI Similarity Search) to find similar documents based on their embeddings is a GenAI-adjacent technique. Similarity search is crucial for tasks like information retrieval, recommendation systems, and clustering.
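
The similarity search could look roughly like the sketch below, which uses an exact inner-product index (`faiss.IndexFlatIP`) over L2-normalised vectors so that scores are cosine similarities. The index type and the helper name `top_k_similar` are assumptions; the project may have used a different FAISS index. If `faiss` is not installed, the sketch falls back to a brute-force NumPy search that returns the same ranking.

```python
import numpy as np


def top_k_similar(embeddings, query, k=3):
    """Return (scores, indices) of the k rows most similar to `query`."""
    vectors = np.ascontiguousarray(embeddings, dtype=np.float32)
    queries = np.ascontiguousarray(np.atleast_2d(query), dtype=np.float32)
    # Normalise so that the inner product equals cosine similarity.
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = min(k, len(vectors))
    try:
        import faiss
    except ImportError:
        # Fallback (not FAISS): exact brute-force search in NumPy.
        scores = queries @ vectors.T
        order = np.argsort(-scores, axis=1)[:, :k]
        return np.take_along_axis(scores, order, axis=1)[0], order[0]
    index = faiss.IndexFlatIP(vectors.shape[1])  # exact inner-product index
    index.add(vectors)
    scores, indices = index.search(queries, k)
    return scores[0], indices[0]
```

An exact flat index is a reasonable default for a small corpus; approximate indexes only start to pay off when the number of vectors is large.
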
The Linnerud dataset itself is not text-based: it holds three exercise measurements and three physiological measurements for each of 20 subjects. The project therefore created text descriptions from the numerical data, highlighting the importance of text data processing techniques in GenAI.
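
One way to turn the numbers into text is a simple sentence template, sketched below. The template wording and the helper names `describe_subject` and `describe_linnerud` are assumptions, not the project's actual phrasing; the column names are those exposed by scikit-learn's `load_linnerud`.

```python
def describe_subject(exercise, physiological):
    """Turn one subject's measurements into a sentence.

    Both arguments are dicts keyed by the Linnerud column names.
    The sentence template is an assumption, not the project's wording.
    """
    return (
        f"Subject performs {exercise['Chins']:g} chin-ups, "
        f"{exercise['Situps']:g} sit-ups and {exercise['Jumps']:g} jumps; "
        f"weight {physiological['Weight']:g}, waist {physiological['Waist']:g}, "
        f"pulse {physiological['Pulse']:g}."
    )


def describe_linnerud():
    """Build one description per row of scikit-learn's Linnerud dataset."""
    from sklearn.datasets import load_linnerud  # lazy: only needed here

    bunch = load_linnerud()
    return [
        describe_subject(
            dict(zip(bunch.feature_names, x)), dict(zip(bunch.target_names, y))
        )
        for x, y in zip(bunch.data, bunch.target)
    ]
```

The resulting list of strings can then be passed to an embedding model like any other text corpus.
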
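The creation of proxy ground-truth labels can be seen as a form of data augmentation, or more precisely weak supervision: I artificially created labels in order to train a supervised model. While not a direct GenAI technique, it's a common practice in machine learning when labeled data is scarce. Because the labels come from a rule rather than from observation, model scores mainly show how well the model recovers that rule.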
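
To make the idea of proxy labels concrete, the sketch below derives a binary label from a numeric column with a median split. This rule and the helper name `proxy_labels` are hypothetical stand-ins; the rule the project actually used is not stated here.

```python
from statistics import median


def proxy_labels(values, threshold=None):
    """Label each value 1 if it is above the threshold, else 0.

    The median-split rule is a hypothetical stand-in for whatever rule
    the project actually used to derive its labels.
    """
    cutoff = median(values) if threshold is None else threshold
    return [1 if v > cutoff else 0 for v in values]
```

A median split yields roughly balanced classes, which keeps accuracy easy to interpret for a small dataset.
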
Experimenting with different models (Multinomial Naive Bayes, Linear Regression, Ridge Regression) and evaluating each with metrics suited to its task (accuracy for the classifier; R-squared and MAE for the regressors) is a fundamental machine learning practice that applies to both traditional and GenAI models.
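
For reference, the three metrics can be written out in a few lines of plain Python. This is a sketch of the standard definitions, which scikit-learn provides as `accuracy_score`, `r2_score`, and `mean_absolute_error`; it is not the project's evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

Accuracy only makes sense for class labels, while R-squared and MAE assume continuous targets; an R-squared of 0 means the model does no better than always predicting the mean.
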
The creation of new features from existing ones (e.g., calculating BMI, creating interaction features) is a crucial step in both traditional machine learning and GenAI pipelines.
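
The two feature types mentioned above can be sketched as follows. Note the assumptions: the Linnerud physiological columns are Weight, Waist, and Pulse, so a height value must come from elsewhere, and how the project obtained it is not stated. The sketch also assumes weight in pounds and height in inches. The helper names are illustrative.

```python
def bmi_imperial(weight_lb, height_in):
    """BMI from pounds and inches: 703 * weight / height**2.

    Height is an assumed input; Linnerud itself has no height column.
    """
    return 703.0 * weight_lb / height_in ** 2


def interaction(a, b):
    """Element-wise product of two equal-length feature columns."""
    return [x * y for x, y in zip(a, b)]
```

An interaction feature such as `interaction(chins, situps)` lets a linear model capture an effect that depends on two variables jointly rather than on each one separately.
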
KMeans clustering, while not strictly GenAI, is a common unsupervised learning technique used to discover patterns and group similar data points together. It can be used to pre-process data for GenAI models or to analyze the output of GenAI models.
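
To show what the clustering step does, here is a small NumPy implementation of Lloyd's algorithm, the iterative procedure behind k-means. It is a stand-in for illustration and not scikit-learn's `KMeans` class; the function name `kmeans` and its parameters are assumptions.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster the rows of `points` into k groups; return (labels, centers)."""
    X = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centers with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

The same function works on raw features or on text embeddings, since both are just rows of numbers; only the interpretation of the resulting groups differs.
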