Machine Learning TV
  • 137 videos
  • 1,843,677 views
Limitations of ChatGPT and LLMs - Part 3
If you haven't watched Part 1 and Part 2, I highly suggest watching them before Part 3.
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and give an explanation of Transformers as well. This is the last episode of this amazing series. Thanks for watching.
676 views

Videos

Understanding ChatGPT and LLMs from Scratch - Part 2
863 views · 1 year ago
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and give an explanation abou...
Understanding ChatGPT and LLMs from Scratch - Part 1
3.4K views · 1 year ago
Large Language Models (LLMs) have shown huge potential and have recently drawn much attention. In this presentation, Ameet Deshpande and Alexander Wettig give a detailed explanation of how Large Language Models and ChatGPT work. They make clear that they do not assume the audience has any prior knowledge of language models. They start with embeddings and give an explanation abou...
Understanding BERT Embeddings and How to Generate them in SageMaker
4.5K views · 1 year ago
Course link: www.coursera.org/learn/ml-pipelines-bert In this course, you will use BERT for the same purpose. Before diving into the BERT algorithm, I will highlight a few differences between BlazingText and BERT at a very high level. As you can see here, BlazingText is based on Word2Vec, whereas BERT is based on the transformer architecture. Both BlazingText and BERT generate word embeddings. Howe...
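
To make this concrete, here is a minimal sketch of generating BERT embeddings with the Hugging Face transformers library; the model name and the [CLS] pooling choice are my own illustrative assumptions (the course does this inside SageMaker instead):

```python
# Minimal sketch: BERT token and sentence embeddings via Hugging Face transformers.
# Illustrative only; not the course's SageMaker code.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("I love this product!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings: one 768-dim vector per input token.
token_embeddings = outputs.last_hidden_state     # shape: (1, seq_len, 768)
# One common sentence embedding: the [CLS] token's vector.
sentence_embedding = token_embeddings[:, 0, :]   # shape: (1, 768)
```
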
Understanding Coordinate Descent
7K views · 1 year ago
Course link: www.coursera.org/learn/ml-regression Let's just have a little aside on the coordinate descent algorithm, and then we're gonna describe how to apply coordinate descent to solving our lasso objective. So, our goal here is to minimize some function g. So, this is the same objective that we have whether we are talking about our closed-form solution, gradient descent, or this coordinate d...
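
As a concrete sketch of the coordinate update (my own toy numpy code, assuming the objective RSS(w) + λ‖w‖₁ with no intercept; the course's exact notation may differ):

```python
# Minimal numpy sketch of cyclic coordinate descent for the lasso.
# Assumes objective sum_i (y_i - x_i.w)^2 + lam * ||w||_1, hence the lam/2 threshold.
import numpy as np

def soft_threshold(rho, lam):
    """Closed-form solution of each one-dimensional lasso subproblem."""
    if rho < -lam:
        return rho + lam
    if rho > lam:
        return rho - lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        for j in range(d):
            # Residual with feature j's current contribution removed.
            r_j = y - X @ w + X[:, j] * w[j]
            rho_j = X[:, j] @ r_j
            z_j = X[:, j] @ X[:, j]
            w[j] = soft_threshold(rho_j, lam / 2) / z_j
    return w
```
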
Bootstrap and Monte Carlo Methods
7K views · 1 year ago
Here we look at the two main concepts that are behind this revolution, the Monte Carlo method and the bootstrap. We will discuss the main principles behind these methods and then see how to apply them in various important contexts, such as in regression and for constructing confidence intervals. Course link: www.coursera.org/learn/stanford-statistics/
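
As one concrete instance of both ideas, here is a hedged sketch of a percentile bootstrap confidence interval for a sample mean; the data, the 95% level, and the 10,000 resamples are all illustrative choices of mine:

```python
# Minimal sketch: percentile bootstrap confidence interval for a mean.
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=170, scale=10, size=100)   # e.g. 100 measured heights

boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()  # resample WITH replacement
    for _ in range(10_000)                                     # Monte Carlo repetitions
])

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap CI for the mean: ({lo:.2f}, {hi:.2f})")
```
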
Maximum Likelihood as Minimizing KL Divergence
2.8K views · 2 years ago
While Bayes' formula for the posterior probability of parameters given the data is very general, there are some interesting special cases that can be analyzed separately. Let's look at them in sequence. The first special case arises when the model is fixed once and for all. In this case, we can drop the conditioning on M in this formula. The Bayesian evidence, in this case, is ...
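
A short sketch of the titular identity, in my own notation (the lecture's may differ): with data drawn from p_data, maximizing the expected log-likelihood of a model p_θ is exactly minimizing the KL divergence from p_data to p_θ, because the entropy term does not depend on θ:

```latex
\mathrm{KL}\left(p_{\mathrm{data}} \,\|\, p_\theta\right)
  = \underbrace{\mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_{\mathrm{data}}(x)\right]}_{\text{independent of } \theta}
  - \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right]
\;\Longrightarrow\;
\arg\min_\theta \mathrm{KL}\left(p_{\mathrm{data}} \,\|\, p_\theta\right)
  = \arg\max_\theta \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right].
```
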
Understanding The Shapley Value
14K views · 2 years ago
Shapley Value is one of the most prominent ways of dividing up the value of a society, the productive value of some set of individuals, among its members. The Shapley Value is based on Lloyd Shapley's idea that members should basically receive things proportional to their marginal contributions. So, basically we look at what a person adds when we add them to a group...
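
A minimal sketch of that definition in code, averaging a player's marginal contribution over all join orders; the value function v is a made-up toy, and this brute force is only feasible for a handful of players:

```python
# Exact Shapley values by averaging marginal contributions over all orderings.
from itertools import permutations

def shapley_values(players, v):
    """v maps a frozenset of players to the value that coalition produces."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            # Marginal contribution of p when joining the current coalition.
            totals[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: total / len(orderings) for p, total in totals.items()}

# Toy example: each player is worth 1 alone, 3 together (synergy).
v = lambda S: {0: 0, 1: 1, 2: 3}[len(S)]
print(shapley_values(["a", "b"], v))  # {'a': 1.5, 'b': 1.5}
```
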
Kalman Filter - Part 2
26K views · 2 years ago
Course Link: www.coursera.org/learn/state-estimation-localization-self-driving-cars Let's consider our Kalman Filter from the previous lesson and use it to estimate the position of our autonomous car. If we have some way of knowing the true position of the vehicle, for example, an oracle tells us, we can then use this to record a position error of our filter at each time step k. Since we're dea...
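
For intuition, here is a minimal one-dimensional Kalman filter sketch in the spirit of that exercise; the constant-position model and all noise values are my own illustrative choices, not the course's numbers:

```python
# Minimal 1-D Kalman filter: x_k = x_{k-1} + noise, z_k = x_k + noise.
import numpy as np

def kalman_1d(z_measurements, x0=0.0, P0=1.0, Q=0.01, R=0.5):
    x, P = x0, P0
    estimates = []
    for z in z_measurements:
        # Predict: state unchanged, uncertainty grows by process noise Q.
        P = P + Q
        # Update: blend prediction and measurement via the Kalman gain K.
        K = P / (P + R)
        x = x + K * (z - x)
        P = (1 - K) * P
        estimates.append(x)
    return np.array(estimates)

# Given oracle ground truth, one could record the error |x_hat - x_true| at each step k.
```
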
Kalman Filter - Part 1
97K views · 3 years ago
This course will introduce you to the different sensors and how we can use them for state estimation and localization in a self-driving car. By the end of this course, you will be able to: - Understand the key methods for parameter and state estimation used for autonomous driving, such as the method of least-squares - Develop a model for typical vehicle localization sensors, including GPS and I...
Recurrent Neural Networks (RNNs) and Vanishing Gradients
8K views · 3 years ago
For one, the way plain or vanilla RNNs model sequences, by recalling information from the immediate past, allows you to capture dependencies to a certain degree, at least. They're also relatively lightweight compared to n-gram models, taking up less RAM and space. But there are downsides: the RNN architecture, optimized for recalling the immediate past, causes it to struggle with longer sequ...
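
A tiny numpy illustration of why the gradients vanish (my own toy, not the video's code): backpropagating through T tanh steps multiplies the gradient by each step's Jacobian, so its norm shrinks geometrically when the recurrent weights are small.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 50, 8
W = rng.normal(scale=0.2, size=(d, d))   # small recurrent weight matrix

# Forward pass: h_t = tanh(W h_{t-1} + x_t), storing the hidden states.
hs = [np.zeros(d)]
for _ in range(T):
    hs.append(np.tanh(W @ hs[-1] + rng.normal(size=d)))

# Backward pass: repeated multiplication by the step Jacobian diag(1 - h_t^2) W.
grad = np.ones(d)                        # arbitrary seed gradient dLoss/dh_T
for t in range(T, 0, -1):
    grad = W.T @ ((1 - hs[t] ** 2) * grad)
    if t % 10 == 0:
        print(f"step {t:2d}: |dLoss/dh| = {np.linalg.norm(grad):.2e}")
```
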
Transformers vs Recurrent Neural Networks (RNN)!
21K views · 3 years ago
Course link: www.coursera.org/learn/attention-models-in-nlp/lecture/glNgT/transformers-vs-rnns Using an RNN, you have to take sequential steps to encode your input, and you start from the beginning of your input making computations at every step until you reach the end. At that point, you decode the information following a similar sequential procedure. As you can see here, you have to go throug...
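
A toy numpy sketch of the contrast (illustrative only, with no learned projections): the RNN loop is inherently sequential, while self-attention encodes every position in one batched matrix computation.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 4
X = rng.normal(size=(seq_len, d))        # one embedded input sequence

# RNN-style encoding: step t depends on step t-1, so it cannot be parallelized.
W = rng.normal(scale=0.3, size=(d, d))
h = np.zeros(d)
for x_t in X:
    h = np.tanh(W @ h + x_t)

# Self-attention encoding: all positions attend to all others at once.
def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

Q = K = V = X                            # simplest case: no learned projections
attn = softmax(Q @ K.T / np.sqrt(d))     # (seq_len, seq_len) attention weights
encoded = attn @ V                       # every position computed in parallel
```
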
Language Model Evaluation and Perplexity
18K views · 3 years ago
Course Link: www.coursera.org/lecture/probabilistic-models-in-nlp/language-model-evaluation-SEO4T Transcript: In this video I'll show you how to evaluate a language model. The metric for this is called perplexity, and I will explain what this is. First, you'll divide the text corpus into train, validation, and test data; then you will dive into the concept of perplexity, an important metric used t...
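
As a minimal worked example (the probabilities here are made up): for a test set of N words, perplexity is the exponentiated average negative log-probability the model assigns to those words, so lower is better.

```python
# perplexity = exp(-(1/N) * sum_i log p(w_i | history))
import math

token_probs = [0.2, 0.1, 0.05, 0.3]      # model's probability for each test word
N = len(token_probs)
log_likelihood = sum(math.log(p) for p in token_probs)
perplexity = math.exp(-log_likelihood / N)
print(f"perplexity = {perplexity:.2f}")
```
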
Common Patterns in Time Series: Seasonality, Trend and Autocorrelation
8K views · 4 years ago
Course link: www.coursera.org/learn/tensorflow-sequences-time-series-and-prediction Time-series come in all shapes and sizes, but there are a number of very common patterns. So it's useful to recognize them when you see them. For the next few minutes we'll take a look at some examples. The first is trend, where time series have a specific direction that they're moving in. As you can see from th...
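
Here is a small synthetic sketch of those patterns, with all constants my own: a series combining trend, seasonality, and noise, plus a lag-k autocorrelation check.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(4 * 365)                               # four "years" of daily data
series = (0.05 * t                                   # upward trend
          + 10 * np.sin(2 * np.pi * t / 365)         # yearly seasonality
          + rng.normal(scale=2, size=t.size))        # noise

def autocorr(x, lag):
    """Correlation of the series with itself shifted by `lag` steps."""
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]

print(f"lag-365 autocorrelation: {autocorr(series, 365):.2f}")  # high: seasonal period
print(f"lag-180 autocorrelation: {autocorr(series, 180):.2f}")  # lower: off-cycle
```
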
Limitations of Graph Neural Networks (Stanford University)
14K views · 4 years ago
Understanding Metropolis-Hastings algorithm
69K views · 4 years ago
Learning to learn: An Introduction to Meta Learning
27K views · 4 years ago
Page Ranking: Web as a Graph (Stanford University 2019)
3.4K views · 4 years ago
Deep Graph Generative Models (Stanford University - 2019)
19K views · 4 years ago
Graph Node Embedding Algorithms (Stanford - Fall 2019)
67K views · 4 years ago
Graph Representation Learning (Stanford University)
94K views · 4 years ago
Understanding Word Embeddings
10K views · 4 years ago
Variational Autoencoders - Part 2 (Modeling a Distribution of Images)
1.6K views · 4 years ago
Variational Autoencoders - Part 1 (Scaling Variational Inference & Unbiased estimates)
2.9K views · 4 years ago
DBSCAN: Part 2
21K views · 5 years ago
DBSCAN: Part 1
29K views · 5 years ago
Gaussian Mixture Models for Clustering
90K views · 5 years ago
Understanding Irreducible Error and Bias (By Emily Fox)
7K views · 5 years ago
Python Libraries for Machine Learning You Must Know!
1.9K views · 5 years ago
Conditional Probability
1.4K views · 5 years ago

Comments

  • @homeycheese1 · 8 days ago

    will coordinate descent always converge using LASSO even if the ratio of number of features to number of observations/samples is large?

  • @muhammadaneeqasif572 · 18 days ago

    Amazing, great to see some good content again. Thank the YT algorithm! Keep it up.

  • @stewpatterson1369 · 21 days ago

    best video i've seen on this. great visuals & explanation

  • @pnachtwey · 22 days ago

    This works OK on nice functions like g(x,y) = x^2 + y^2, but real data often looks more like the Grand Canyon, where the path is very narrow and winding.

  • @sELFhATINGiNDIAN · 1 month ago

    No

  • @kacpersarnowski7969 · 1 month ago

    Great video, you are the best :)

  • @frielruambil6275 · 1 month ago

    Thanks very much, I was looking for such videos to answer my assignment questions, and you answered all of them at once within 3 minutes. I salute you; please keep making more videos to help students pass their exams and assignments.

  • @NeverHadMakingsOfAVarsityAthle · 2 months ago

    Hey! Thanks for the fantastic content :) I'm trying to understand the additivity axiom a bit better. Is this axiom the main reason why Shapley values in machine learning forecasting can just be added up for one feature over many different predictions? Let's say we have predictions for two different days in a time series, and each time we calculate the Shapley value for the price feature. Does the additivity axiom then imply that I can add up the Shapley values for price across these two predictions (assuming they are independent) to make a statement about the importance of price over multiple predictions?

  • @somerset006 · 3 months ago

    What about self-driving rockets?

  • @paaabl0. · 4 months ago

    Shapley values are great, but not gonna help you much with complex non-linear patterns, especially in terms of global feature importance

  • @williamstorey5024 · 4 months ago

    what is text regression?

  • @yandajiang1744 · 5 months ago

    Awesome explanation

  • @user-vh9de5dy9q · 5 months ago

    Why do the given weights for the distributions not really match the distributions shown on the graph? I mean, I would choose π1 = 45, π2 = 35, π3 = 20.

  • @thechannelwithoutanyconten6364 · 5 months ago

    Two things: 1. What the H matrix is has not been described. 2. One non-1x1 matrix cannot be smaller or greater than another. This is sloppy. Besides that, it is great work.

  • @obensustam3574 · 5 months ago

    I wish there was a Part 3 :(

  • @DenguBoom · 5 months ago

    Hi, about the sample X1 to Xn: do X1 and Xn have to be different? Earlier, you had a sample of 100 heights from 100 different people. Or can it be like the bootstrap, where X1* to Xn* are drawn randomly from X1 to Xn, so we could basically draw the same person's height more than once?

  • @feriyonika7078 · 6 months ago

    Thanks, I understand the KF better now.

  • @usurper1091 · 6 months ago

    7:10

  • @lingfengzhang2943 · 7 months ago

    Thanks! It's very clear

  • @user-uk2rv4kt8d · 7 months ago

    very good video. perfect explanation!

  • @sadeghmirzaei9330 · 7 months ago

    Thank you so much for your explanation.🎉

  • @laitinenpp · 7 months ago

    Great job, thank you!

  • @SCramah13 · 8 months ago

    Clean explanation. Thank you very much...cheers~

  • @felipela2227 · 8 months ago

    Your explanation was great, thx

  • @vambire02 · 8 months ago

    Disappointed ☹️ no part 3

  • @Commonsenseisrare · 9 months ago

    Amazing lecture on GNNs.

  • @cmobarry · 10 months ago

    I like your term "Word Algebra". It might be an unintended side effect, but I have been pondering it for years!

  • @rakr6635 · 10 months ago

    no part 3, sad 😥

  • @vgreddysaragada · 10 months ago

    Great work..

  • @boussouarsari4482 · 11 months ago

    I believe there might be an issue with the perplexity formula. How can we refer to 'w' as the test set containing 'm' sentences, denoting 'm' as the number of sentences, and then immediately after state that 'm' represents the number of all words in the entire test set? This description lacks clarity and coherence. Could you please clarify this part to make it more understandable?

  • @GrafBazooka · 11 months ago

    I can't concentrate, she is too hot 🤔😰

  • @sunnelyeh · 11 months ago

    This video means the F/A-18 has the capability to lock onto a UFO!

  • @thefantasticman · 11 months ago

    Hard to focus on the PPT, can anyone explain to me why?

  • @nunaworship · 11 months ago

    Can you please share the link to the books you recommended?

  • @AoibhinnMcCarthy · 1 year ago

    Hard to follow, not concise.

  • @jcorona4755 · 1 year ago

    They pay so that people see it has more followers. In fact, you pay $10 pesos for each video.

  • @g-code9821 · 1 year ago

    Isn't the positional encoding done with the sinusoidal function?

  • @homataha5626 · 1 year ago

    Hello, thank you for sharing. Do you have a code repository? I only learn after I implement things.

  • @because2022 · 1 year ago

    Great content.

  • @robinranabhat3125 · 1 year ago

    Anyone: at 31:25, shouldn't the final equation at the bottom right be about minimizing the loss? I think that's a typo.

  • @Karl_with_a_K · 1 year ago

    I have run into token exhaustion while working with GPT-4, specifically when it is giving programming-language output. I'm assuming resolving this will be a component of GPT-5...

  • @yifan1342 · 1 year ago

    sound quality is terrible

    • @nehalkalita · 10 months ago

      Turning on subtitles can be helpful to some extent.

  • @majidafra · 1 year ago

    I deeply envy those who have been in your NN & DL class.

  • @josephzhu5129 · 1 year ago

    Great lecture, he knows how to explain complicated ideas, thanks a lot!

  • @chris-dx6oh · 1 year ago

    Great video

  • @ssvl2204 · 1 year ago

    Very nice and concise presentation, thanks!

  • @zhaobryan4441 · 1 year ago

    super super clear!

  • @lara6893 · 1 year ago

    Emily and Carlos rock, heck yeah!!

  • @StratosFair · 1 year ago

    Great video! Are you guys planning to upload follow-up lectures on this topic?

  • @StratosFair · 1 year ago

    Where is the video on recursive least squares, though?