AI-900 - Learning About Generative AI


Hey everyone, this video is a supplement to my AI-900 study cram. They've introduced a whole new section about generative AI, so I thought it made a lot of sense to go and create a module specifically for that generative AI content — now you have the complete information to help you pass the exam. We're used to the idea of regular artificial intelligence.

If I think about regular AI, to a large part its goal is to imitate some aspect of humans, of human behavior. This could be the ability to recognize speech, the ability to see an image and classify it, or the ability to translate languages — that's what we covered in the study cram. But it's imitating some aspect of human beings.

Now if we think about this brand new field, generative AI, this instead switches the focus to its ability to create original content. Now we might argue: well, what is really original?

It's been trained on a set of content that it has, in a way, learned from and builds on. But think about human beings: we read a whole bunch of books, we've seen a bunch of different things, and that helps build the connections in our brains. It's the same idea here: it's trained on this huge amount of information and it generates new content. I grew up watching Fawlty Towers and Blackadder; that's had a huge influence on what I think is funny and the things I say. So it can create original content.

This original content could be in the form of natural language: I can absolutely converse with it, I can give it natural instructions and it can respond, I can ask and it will answer, it can summarize, it can create. It can create code in many different languages.

Hey, can you write me a function? How would I do this? This isn't working — can you help fix it? What is this code doing? Write a test case for this piece of code. And also images. So it's all about the idea that it can create this original content, compared to AI, which was just imitating some specific behavior.

And if you think about it, this opens up so many new aspects of things we can do. Now when we think about this generative AI, it's really focused around — again, you'll hear this term — a large language model. You'll hear things like GPT, LLaMA; there are many different ones around. But you have some large language model, and what we do is interact with it using natural language.

This natural language we give it is known as the prompt, and this is really important — we'll come back to it — but there are different qualities in how we phrase and format that prompt that will really determine how good a response it can give.

And so that's the whole point: I feed in some prompt to this large language model, and what it's really doing is predicting the next word — specifically it's a token, but I'm going to write "word" even though it's really a token — and then it feeds that back

in, and then it predicts the next word, and the next word, and the next word, until finally it gets something called an end of sequence: hey, I'm complete, this is all I should do. You'll also hear this called inference. So I've trained the model, and then it can predict the next word, the next word, the next word, until it gets to the end of sequence, and that's its job done. It has answered; it has done

everything it believes is required to complete the prediction based on that prompt it was given. That's everything it's doing: it's always predicting the next token, then the next one, and the next one, until it thinks it's done.
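Just to make that loop concrete, here's a minimal Python sketch of that predict-append-repeat cycle — `model` and `tokenizer` are hypothetical stand-ins, not a real library API:

```python
# Minimal sketch of autoregressive generation; `model` and `tokenizer`
# are hypothetical stand-ins for a real LLM and its tokenizer.
def generate(model, tokenizer, prompt, max_tokens=256):
    tokens = tokenizer.encode(prompt)            # prompt -> token IDs
    for _ in range(max_tokens):
        next_token = model.predict_next(tokens)  # most likely next token
        if next_token == tokenizer.eos_token_id: # end of sequence: it's done
            break
        tokens.append(next_token)                # feed the output back in
    return tokenizer.decode(tokens)
```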

Now you might think: okay, this looks magical, these fantastic large language models — well, how is this actually created? The whole point here is that for this to work there's a whole amount of training; this doesn't just magically appear. So there's a whole set of training, and this training consists of huge amounts of data. Now different models have been trained on different data, but the biggest models today are based on information culled from the web, from Wikipedia, from sets of books, from other libraries — massive amounts of data

that is just fed into this model. Now to process this — and I did a whole deep dive into how these models work — it has to create parameters, which are weightings based on all the different connections between the neurons it has available. That takes a massive amount of computation. Typically we hear about GPUs every time we hear about generative AI — we talk about needing more GPUs — because GPUs are fantastic at

this parallel type of computing, which we use as part of the training, which takes a huge amount of time. And then once we have all of this and we've trained the model, we also need compute to do the inferences. So what this training does is it basically builds up — and I'm obviously making this very, very simple — all the connections, the neurons

within this neural network, which is the large language model. And if I think about these parameters — hey, for every connection there's a certain weight that builds in, and there may be biases as well, values that are added and removed — well, the biggest models today have billions and even trillions

of parameters. So these things are huge. All of this goes in — huge amounts of time to train the model — and then once the training is complete, that's when we get our final large language model. And I guess I should really have drawn this in the nice sparkly pen: our LLM. That's the final large language model that we then have in our environment. And the whole point here is, once it's

trained, it doesn't really change; it is essentially read-only at that point. And when you think for a second about the parameters I mentioned — you might say, why is he going on about parameters? — well, the more parameters you have, the better its ability, and that scales pretty evenly with no

known limit at this point. So the bigger I can make the model, the neural network, the better it is, and that's what you're seeing today: they keep making them bigger and bigger with more and more data, and they're just getting more and more intelligent. Think about when you're a baby: the brain is only so big, and then as you grow the brain increases, we learn more, we get more information coming in, we get smarter and smarter. Well, that seems to apply to what we're

seeing right here. And so once we have this large language model, as we talked about, it then does all of these great things. We have this idea that I can, for example, summarize text — you'll see examples where you feed in a meeting or you feed in a book: summarize this, what happened in it? It has the ability to

generate new content: hey, write me a story about something, write a two-page paper on whatever it might be. I can compare text: hey, look at these — what are the similarities between them? The list really just goes on and on around natural language and asking it to do things. It's very, very capable, but it's not what you'd call an artificial general intelligence, or AGI.

It can't learn new things, and these models are terrible at math — if you give it a longer math problem, it's no good at that. It's all focused on predicting the next token; it's fantastic at that, but it is not this artificial general intelligence, this AGI. It is really focused around these language-type tasks. But don't take anything away from that: it is phenomenally powerful. Now when we think about this large

language model, it's based on a Transformer model. There was a paper written, "Attention Is All You Need", that really pioneered a lot of the structure and the mechanisms we use today in the modern GPTs and the other large language models that

we're leveraging. It's really focused around the idea that we have an encoder for the input, which turns that input into some representation, which we can then feed into a decoder that takes that representation and generates our answer. Now it's probably a good idea to actually go and look at the paper. So this is that official paper — this is where it's talking about, okay,

"attention is all you need" — and we can jump down. So this is the architecture, and what you can see here is the encoder is on the left and the decoder is on the right, and they really are the same, except the one on the right has this idea of

taking the output representation and feeding it into its own set of logic, which makes a lot of sense: it has to understand the output it's producing, but it also has to consider the input and keep bringing that in as it works out what my next prediction, my next token, should be — until it runs it through a softmax, which just

gives me values that total one, so whatever all my values are, I can say which were the biggest, most important ones. But you'll see there are whole sets of different things here: okay, there's your input, then there's some embedding, then there's positional encoding, then there's this multi-head attention, and then this feed-forward — and, like, what is all this stuff about?
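As a quick aside, the softmax step itself is tiny — a minimal numpy sketch that turns any list of scores into positive values that total one:

```python
import numpy as np

def softmax(scores):
    exps = np.exp(scores - np.max(scores))  # subtract max for numerical stability
    return exps / exps.sum()                # positive weights summing to 1.0

print(softmax(np.array([2.0, 1.0, 0.1])))   # -> [0.659 0.242 0.099]
```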

I really want to stress that you don't need to understand the details of this — it's not going to be in the exam — but it is nice to understand a little bit of what's going on, what at least some of those terms mean. So let's think about what we do with this model. I drew earlier on the idea that you give it a prompt and it then predicts the next word. Our prompt is in a language — I speak English

badly... I speak English — and then it's going to go and predict. Well, computers don't really like words anyway; they're really big on numbers. So if I think about what's actually happening as part of this complete flow: we have text. I type in, let's say, the prompt, which again is just some

text. The first thing it has to do is break it up into tokens. Now there's a set number of tokens it supports, and they map to different parts of words, but this just makes it simple in terms of how it can then map and go forwards. And we can actually see this: here's a tokenizer — this is on OpenAI; you're going to hear about OpenAI all the time — but the whole point is I can type

something in: "John was working on his computer until it crashed". And what's kind of nice here is that every one of those words was its own token. That will not always be the case — "generative", for example: the word "generative" was actually two separate

tokens. And we can then go and see the actual IDs: I type in a sentence and it converts it to these tokens, so we get these token IDs — that's the first step. Okay, great, so it creates these tokens.
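You can try the tokenizer yourself with OpenAI's open-source tiktoken library — a small sketch (cl100k_base is the encoding used by the GPT-3.5/GPT-4 family):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # GPT-3.5/GPT-4 family encoding
ids = enc.encode("John was working on his computer until it crashed")
print(ids)                                   # the token IDs
print([enc.decode([i]) for i in ids])        # the text piece behind each ID
```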

The challenge we have is that tokens on their own are just words, and in languages there can be many different words that mean the same thing, and the same word can mean different things depending on the context in which it's used. So one of the things it wants to do is get some better mathematical representation of the semantic meaning of the words, of those tokens. What it actually does then is run them through an

embedding model, which is really just a fancy way of saying it's going to spit out vectors. A vector obviously has some direction and some magnitude, and it's going to create a vector that represents those tokens — some big sequence of numbers that goes on and on. And the point of these vectors, just to

make it super clear, is that many words have similarities — they mean the same thing. I could say the word "boss", I could say the word "manager", and they're really meaning the same thing. And if I had the word "cat", well, it's very similar to "kitten". So what these vectors do, what they output, is that they will be very close to each

other if they have very similar meaning. Obviously this picture is in two dimensions — this is 2D. These embedding models, it depends, but ada-002, which is a very popular embedding model, will spit out 1,536 dimensions. I can't visualize more than three, but it does this. So just to give you an idea of this, if I jump back over

again: here I've got a certain input, and this idea of embeddings is going to come up again and again and again. But if I just run this super quickly — I'm actually sending this to the Azure OpenAI service — we can see it ran, and this is the vector: all of these numbers represent that sentence as a

vector. So if I had two sentences that were very similar, they would have a very similar vector. And there we can see, yep, there are 1,536 dimensions to this vector. Our brains can't comprehend that, but mathematically computers can handle it just fine. And then there's the idea that this was a model trained specifically to create these embeddings based on the semantic meaning, so I'd get similar vectors for similar

things. And then, as we're going to see later on, I can go and search for, say, how close these vectors are — how close are those lines — to try and find something that's very similar. Great, so now we have this vector that it generates. The different models will support different numbers of tokens, by the way — I should have pointed that out earlier. So now we've got this vector that represents the semantic meaning.
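To make "how close are these vectors" concrete, here's a minimal sketch of cosine similarity — the tiny 3-D vectors are made-up stand-ins for real 1,536-dimension ada-002 embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 = pointing the same way (very similar meaning); near 0 = unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up 3-D stand-ins for real 1,536-dimension embeddings.
boss    = np.array([0.8, 0.1, 0.2])
manager = np.array([0.7, 0.2, 0.2])
kitten  = np.array([0.1, 0.9, 0.3])

print(cosine_similarity(boss, manager))  # high: similar meaning
print(cosine_similarity(boss, kitten))   # lower: unrelated
```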

But then if I think of a typical sentence, "John eats burger", that's very different from the sentence "burger eats John". There are all these jokes around punctuation and getting the words in the right order, but the positions of the words matter, which is why you then saw it talk about taking these vectors and adding in this concept of

positional encoding. And it's really clever the way it does this: it has sine and cosine waves with different frequencies that it merges into the vector to get this positional encoding. Okay, great, so now we have the position encoded as part of this vector that represents it.
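If you're curious, here's a minimal numpy sketch of that sine/cosine scheme, following the formula from the "Attention Is All You Need" paper:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sine/cosine waves of different frequencies, one value per
    # (position, dimension) pair, added to the token embeddings.
    pos = np.arange(seq_len)[:, None]      # token positions 0..seq_len-1
    i = np.arange(d_model)[None, :]        # embedding dimension indices
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2]) # even dimensions: sine
    enc[:, 1::2] = np.cos(angles[:, 1::2]) # odd dimensions: cosine
    return enc

print(positional_encoding(4, 8).shape)     # (4, 8)
```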

Now we actually get into the main crux of what the model actually does, and you saw it had this idea of multi-head attention, self-attention, and masked self-attention. One of the challenges is that as the input gets bigger and bigger and the output gets bigger and bigger, I need to make sure I don't forget about stuff that's earlier on. For example, if I said "don't give John green things to eat", well, if I forgot the word "don't", that

now means something very different, and I'll be very, very upset when you give me dinner. So I can't forget about things. And so the whole point of what it now does — and this becomes a bigger part of a lot of what these models are doing — is this idea of self-attention. And this is the idea that I could say,

"John was using a computer until it crashed" — clearly it was a Linux computer; joke, don't get upset. And what it will do is go through — and when it's masked self-attention, a word only compares itself to itself and the words before it. So "John"

could only look at "John"; "was" could only look at "was" and "John"; "using" could only look at "using", "was" and "John". But by the time I get to the word "it", well, it would have a strong attention, a strong relationship, to "computer" — that's what's crashing — so it would have a much stronger connection to "computer" than to "John". Hopefully John is not the one crashing, or I've got different issues going on. And so that's important, and you can

think about how, if this went further, when it came to process the word "it", it would also pay a lot of attention to the word "computer"; when it was processing "crashed", "crashed" would have a big relationship to "computer" as well, whereas "using" would have a strong relationship to "John". So it has this idea of attention between the different things, and to do this it has this idea of creating a query

value, a key value, and then an actual value. Think of the query as a value for each word that it's going to compare against the key value of itself and all the words before it, and that really tells it how strong that relationship is: it's the dot product of those, which then multiplies the value to get its final attention score.
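Here's a minimal numpy sketch of that masked, scaled dot-product attention — note that in a real model Q, K and V come from learned projections of the embeddings, which I'm skipping here:

```python
import numpy as np

def masked_self_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # query-key dot products
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -1e9                              # hide future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: rows sum to 1
    return weights @ V                               # weighted sum of values

# Toy example: 5 tokens, 4-dimensional vectors (real models use projections).
x = np.random.rand(5, 4)
print(masked_self_attention(x, x, x).shape)          # (5, 4)
```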

So it uses these, and then finally, once it has that, it goes into the feed-forward network, which is the bulk of the whole neural network. That's when it actually goes and does the work: hey, what should come out of this model based on all those parameters? And what you ultimately end up with is this representation — you might also hear it called a context

vector. And what I should point out is that this part here is repeated a whole bunch of times — the number of times it's repeated does vary — but that's really the bulk of the model. Let me shrink this down a little bit so you can see the complete thing. So that's all of it: you have the prompt, it gets turned into tokens, gets turned into an embedding, a vector; it has the position encoded into it; it works out the self-attention — masked self-attention if it's using a

decoder-only pattern — which works out, hey, what parts do I need to pay attention to; it goes through the bulk of the neural network, which ends up with this representation. Again, if we went back to the paper for a second, that's exactly what we saw here: hey, look, we had the input, we worked out the embedding from the token, we encoded the position, we worked out the attention, and then we sent it through the feed-

forward. Now many models today don't actually use both the encoder and the decoder. For example, GPT only uses the decoder part, which is why it's this masked multi-head attention, and the inputs are fed in the same way as those outputs that would feed back through. You'll also notice this entire structure here is repeated a certain number of times. So this is the key point about these

models — and once again, this is not something you need to memorize the detail of, but they do talk about it in the materials. If you were considering translation, for example, say English to Spanish, you could think of what comes out of the encoder as a language-neutral representation of what I typed in, and that language-neutral representation

could then be fed into the decoder to output the Spanish version of it — that's why this is useful. But for just predicting the next word, what they found is that you really don't need to do both: it's easier to train if I'm just using one of them, it's cheaper, it's faster. So again, a lot of the models — I think it's BERT from Google that only uses the encoder, but GPT and a lot of the others just use the decoder, and those inputs would feed in that direction as

well. You don't need to know all of this, but it's been brought up in the papers, and I think it's fascinating — this is a really cool thing to understand: how it's actually powered, and how it handles these huge contexts. You'll hear about token limits, and I'm going to talk more about that, but this is how it's powered. This self-attention, this ability to know what its relationship is to the things that preceded it, is what enables it to keep the context — to not forget the "don't" in "don't give

John green things to eat". And if you remember one thing from this, it's: don't give John green things to eat — I'll be very sad. Okay, so those are the fundamentals of these Transformer models that power everything. Everything is really just using this — maybe different parts of it; they tune the parameters, the input they may give — but this is the

large language model. But as I talked about, remember this is very expensive in time and GPU — not just to train, but then for the inferences to complete what I type in. Which is why you'll see there's a lot of investment into: can I create a small language model that's maybe focused on different tasks, but that is cheaper and faster to train, which means I can tune it more easily, and then, hey, to get my results it's

cheaper to actually leverage. There is a lot of work going on around that. So now let's talk specifically about GPT, and you can't really talk about GPT without first talking about OpenAI. OpenAI is a company, and they have done huge amounts of work on different AI models, this

generative AI, but the big one we're focused on here is GPT. And the whole point of GPT: it is Generative — it's creating; it's been Pre-trained — it's been trained on all of that data; and it is a Transformer.

Every time I see the word Transformer I think of — I shouldn't have done that — the nice 80s cartoon that I grew up with. So this is GPT; it's what they've trained, and there are many different models. I think there was at least a 3, but 3.5 is a common one you'll hear; you'll hear a lot about 4, and you'll hear about 4 Turbo. And a big part of these

models — remember I talked about how the number of parameters is how powerful they get — so from 3.5 to 4 they increased the number of parameters; again, 4 is trillions. But also remember this idea of the tokens: there are the tokens I can feed in as the input, and then how many tokens it can use for the output. It's its memory, in a lot of ways: the bigger the number of tokens — the context — the more useful it is. Hey, I can feed in more information; hey, it's allowed to generate

more text as the output. So when I look at these models: I think 3.5 was a 4,000-token size, but I think there was also a 16,000-token version; 4 was 8K, and then I think there was a 32K; and then 4 Turbo, which was recently released, is 128K. Now just because it might have this value doesn't mean that value can be

used for both the input and the output — I think Turbo is still only 4,000 for the output. If we go and look at the models, it may confirm that. So here are all the different models from OpenAI. Yep, so this is Turbo: we can see the context window is 128,000; however, the output is

4,096. So it's still really useful — I can feed it whole books, for example — but it can only output 4,000 tokens from those. And then here's the 3.5, then some of the other models, and DALL-E, etc., so I can see the details around those various things. And this just makes it more and more

capable: the bigger that memory is, the more things it can do with more data. Now the other thing they then created — so GPT is the generative AI model that can predict the next token, the next token, the next token — was ChatGPT. Now ChatGPT takes this GPT model, but what they then

did is additional training for interaction: they took it further for interactive dialogue, to make its behavior align with what we expect for a typical user interaction. So they had a whole bunch of supervised training: they gave it huge numbers of examples of what the user types in and what the agent should respond with — it is the agent — and then it would score the results

it gave out, to just keep tuning its behavior. So that is what ChatGPT is: it's the GPT model, but then they tuned it specifically for those interaction-type scenarios. Okay, that's the OpenAI company — but this is a Microsoft certification, so how does this play into the Microsoft

world? So now we have Microsoft. Now Microsoft are a huge partner of OpenAI — I think they own a big chunk; I think I saw something like 49%. They also provide all the data center infrastructure that OpenAI uses to do the training: they provide these basically supercomputers with all the GPUs that OpenAI uses to train the models, which makes this possible, and they host the services. OpenAI obviously

offer their own services to customers to go and use these for their applications, and they expose those as an API. Microsoft then leveraged these technologies, and they actually use them in a number of different ways. So if OpenAI train and create these large language models and make them available in their own services, what Microsoft have done is take a copy of the model. A key point here: they are not using the

OpenAI instance to do the inferences. They copy the model, and it's now running in their own data centers, their Azure cloud — their own instances, within their own regulatory requirements for the different products. There are lots of instances of this large language model. So Microsoft have their own instances running in their own regulatory trust boundaries — there are lots of them — and they use them in different ways. Now

the first way — you've heard of this; you've heard of Microsoft Copilots; I don't think it's possible to not have heard the term Copilot — and you can really think of a Copilot, in a way, as an orchestrator. What I mean by that is it's orchestrating between the user — using Word or Teams or the security dashboard or Dynamics or just the web or their Windows 11 machine — and these models. What happens is the user has

some request, so the user creates their original prompt: hey, help me do something. But remember I talked about the idea of the quality of the prompt, and the quality of the prompt is really, really important — in fact, maybe we should come back to that for a second before we talk about this. So if I talk about that prompt over here, there's a whole area of study

and work around prompt engineering, and this is critical, because the quality of the prompt will drive the quality of its response. For example: be explicit, be very exact about what you want it to do; think about telling it

how it should act. And there are different ways in which it can work. There's something called zero-shot: zero-shot is no examples of how you want it to behave. Few-shot is: hey, this is what the user would type in,

and this is how you, as the agent, should respond. There are also ideas in this prompt engineering around grounding, and grounding is the idea that there's some additional data source and you can bring in data from it to add to your prompt. Hey, summarize all the emails from my manager — the grounding would go and get all the

emails from your manager, because the large language model has no access to your data, append them to the prompt, and then send it to the large language model. So I'm grounding my prompt, my request, in actual data so it can do the task. Hey, summarize the last meeting I had — well, the grounding would go and get the transcript from the last meeting you had and add it to that request, so now the large language model can do that. So prompt engineering is all about the science of making the prompt better, how we can improve what we're trying to get it to do.
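To make the system-message, few-shot and grounding ideas concrete, here's a sketch of the kind of message list an app might assemble — `fetch_manager_emails` is a hypothetical stand-in for whatever retrieval your app actually does:

```python
def fetch_manager_emails():
    # Hypothetical stand-in: the LLM has no access to your data,
    # so the app itself must go and fetch it.
    return "From: Manager - 'Project review moved to Friday. Budget approved.'"

def build_messages(user_request):
    grounding_data = fetch_manager_emails()
    return [
        # System message: tell the model how to act.
        {"role": "system", "content": "You are an assistant that summarizes emails."},
        # Few-shot example: what the user types, how the agent should respond.
        {"role": "user", "content": "Summarize: 'Meeting moved to 3pm.'"},
        {"role": "assistant", "content": "The meeting time changed to 3pm."},
        # The real request, grounded by appending the retrieved data.
        {"role": "user", "content": f"{user_request}\n\nEmails:\n{grounding_data}"},
    ]

print(build_messages("Summarize all the emails from my manager."))
```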

So now if I come back to the Copilots, understanding this prompt engineering — well, the Copilots are in some context of a particular application: Teams, Word, the security dashboard, Dynamics, Bing, whatever that is. So this Copilot orchestration is responsible for

taking what the user is requesting, but then doing that grounding — remember, the grounding is: hey, there's some other data. Now that data could also come via certain APIs; for example, if I'm Microsoft 365, then I'm using the Microsoft

Graph; if I'm Bing, I'm using my search index of the internet; if I'm a security Copilot, then once again I'm looking at the Microsoft Graph, but I'm also looking at Sentinel data and other APIs and functions I can call into. The whole point is it takes the prompt the user gives, works out

what other data is going to be required to do a useful job, and it then creates this meta-prompt that it actually gives to the large language model, which can then return a quality response because it's got the right data, and then it can respond to the user. It may even be that the Copilot tells the large language model: hey, here are the APIs I have available — and the large language model may actually say: hey, go

and run these commands for me — because the Copilot has the permissions on behalf of the user; the large language model has no access to anything — go and run this for me and give it back to me to help me do the job. Very often the Copilots will use the large language model to work out what data it wants. But the net result of this is that the Copilots help me do the job: they help accelerate the stuff I'm doing, they help me if I'm stuck and don't know what my next step should be. And I can

really think of the Copilots — if I think of this in terms of regular services — as a SaaS-type solution: it is generative AI as a complete service. It performs the function; there's nothing I have to do as the user, it just does it. So if we were to go and have a look at one of these — and Bing is the easiest one to try — I can ask it to do things. So here I could ask it questions: what

are three services of Azure OpenAI? And to answer this question — notice it's grounding itself in internet results. It's doing a search against the Bing index of the internet, it will bring that knowledge in, and it's telling me what it's referencing: here it's showing me the articles it's using from the web to create this output. It's multimodal — here we

can see I can add an image and it can describe it to me; that's one of the nice things in GPT-4. It's suggesting follow-up questions. It integrates with DALL-E 3, which is an image generation service, so I could say: create a picture of a bald English man sitting on a cloud with a

laptop. So it's realizing, hey, for what I'm asking it to do it can hook into the DALL-E 3 engine, and now it can generate my image for me. It's doing that image generation, but I'll be able to continue interacting with it. So let's see what it generates — I could ask it to modify it in certain ways; that's one of the things DALL-E 3 can do, it can modify the images it generates. Oh, there we go — yep, that looks exactly like me. There we go — oh, I

like that one, that one's brilliant; yeah, that looks a little bit like me. So there we get this idea that, hey, it can generate things. I could ask it to write code — there used to be separate models for code generation; there aren't anymore. Create a PowerShell script to generate pi to 10

digits — and it can do other languages as well — and it's going to go and create that, and we can see what it's doing. So based on its grounding it does those various things, which is great; I'll let that carry on. So that's the kind of SaaS version, but maybe I want to write my own applications as well. So in addition, if I think about Azure services, what Microsoft also created was —

those large language models over here — it also took copies; again, there are multiple different copies of this — but this time it put them into its Azure cloud service. So now we have the idea and the service of Azure OpenAI. Now there are other AI services

available in Azure; they support other models from other companies — it's not only Azure OpenAI. But from OpenAI they support GPT, they support the embedding models, and — I think it's in preview right now — they also support DALL-E. So the whole point is I can now leverage this. Now the way we leverage this is through the

Azure OpenAI Studio. The first thing we would do is create an instance of the Azure OpenAI service, and then in the Azure OpenAI Studio what we would actually do is deploy an instance of a model. So now we have an instance of the model over here that we

can use. And then one of the nice things is there's a playground in this studio where we can experiment — we can try things out in it — and when we're ready, this is exposed as an API, so our application can just hook in and use it. And that's the whole point.
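As a rough sketch of what that looks like from application code, using the openai Python package's Azure client — the endpoint, key, API version and deployment name are placeholders for your own instance:

```python
from openai import AzureOpenAI  # pip install openai

# Placeholders: use your own resource's endpoint, key and deployment name.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-KEY",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",  # the model deployment created in the studio
    messages=[{"role": "user", "content": "What are three services of Azure OpenAI?"}],
)
print(response.choices[0].message.content)
```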

So if I jump over to Azure for a second — oh, it's still going; we can close all these down, except that picture — I think I'm going to keep that, because that really looks a lot like me; that's actually kind of freaky. All right, so in Azure I can go to Azure OpenAI and create an instance. Now, based on the region I create it in, there are currently different services available in different regions, so you'd want to check exactly which region you want to create it in. And then obviously there's the pricing tier, because like everything else you

pay based on usage. Just creating the service doesn't cost you any money; you pay for the prompt — for every 1,000 tokens it's a certain amount based on the model — and then the completion, the inference, is based on the number of tokens. So I've got the 3.5 Turbos — the context sizes vary — GPT-4, then there are fine-tuning models, there's the image model, there's the embedding model, which

we saw earlier. So it's just based on the amount I use, like everything else. I've created a couple; if I go and look at my East US 2 one, I could jump straight to the Azure OpenAI Studio, but it's here it will show me the keys and the endpoint I would use to interact with it from my application — and we can see how it's showing me the

endpoint, and then there are two keys, and I can regenerate them if they were compromised in any way. But to actually use it, I jump into the Azure OpenAI Studio. Here you can see I've just got GPT-4, because that was in East US 2; but if I switch over to my other instance in East US, well, here you can see I've got an embedding model, ada-002, and GPT-

3.5 16k. And then we can play around. In the playground over here — if it's the GPT model, it's a chat; if it was an earlier model, then it would be a completion — I can just start asking it to do things. But the whole point here is there are different templates for how you want this to act. These are the system instructions to

the model, of what you want it to actually do. So I could say, hey, I want you to be a marketing writing assistant, or whatever it is, and all that's going to do is change the system message that gives it its instructions. I could add few-shot examples of what the user would type in and how the assistant should respond, so I can give it a lot of guidance. I'm just going to go

back to default, and then I can just start chatting with it — I can do all of those exact same things. So I could say — it's giving me some hints — how would I make a cherry pie? And you can see it's running against my specific deployment I've got over here. I can set things like, hey, the maximum response size based on number of tokens, how creative it can be, how random

it can be — so I can tune a lot of the different aspects of this. Who is John Savill? No idea what it's going to say here. Nope, doesn't know who I am. So there are different things it can do; again, I could ground it — I could maybe ground it in an internet search, or a database, or my HR system if this is a private work thing — and it can do all of the same things I can do.
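Those playground knobs map straight onto API parameters — a sketch reusing the `client` from the earlier example, with the deployment name still a placeholder:

```python
# The playground's knobs map directly onto API parameters.
response = client.chat.completions.create(
    model="YOUR-DEPLOYMENT-NAME",
    messages=[
        {"role": "system", "content": "You are a marketing writing assistant."},
        {"role": "user", "content": "How would I make a cherry pie?"},
    ],
    max_tokens=400,   # maximum response size, in tokens
    temperature=0.7,  # higher = more creative/random; lower = more deterministic
)
print(response.choices[0].message.content)
```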

And this is actually an interesting point: obviously it didn't have data to answer that question, "who is John Savill?" So one of the big things you're going to do in a lot of your applications — because you're probably going to want it to do more than just the standard stuff it knows about — is go and do that grounding in some data that you have. Now Microsoft

actually provides services to help you do that. There's something called Semantic Kernel — think of that as an orchestrator that can then go and call the large language model. What it can do is hook into services, and a huge service here is Azure AI Search — this was Azure Cognitive Search before. If I have data in blob storage or

databases or data lakes and other things, what it does is create those vector embeddings that represent the data. Then it can take a query, create the vector embedding of it, and find out which data it's closest to, to return the most relevant data — which it can then add and feed into the prompt to give you the best response. PostgreSQL Flexible Server and

Cosmos DB for MongoDB vCore also have vector extensions now, to hook into Azure services to create the embedding store and then do semantic searches — i.e. find the smallest angle between the different vectors. There are also capabilities that will combine different types of search: maybe I'm trying to search for a specific term, and this type of regular closeness search is not good at exact terms.
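Under the hood, a pure vector search is just "embed the query, then find the stored vectors with the smallest angle" — a minimal sketch, assuming the query and documents were already embedded with the same model:

```python
import numpy as np

def vector_search(query_vec, doc_vecs, docs, top_k=3):
    # Cosine similarity of the query against every stored document vector;
    # the smallest angle (highest similarity) is the most relevant data.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    best = np.argsort(sims)[::-1][:top_k]  # indices of the closest documents
    return [docs[i] for i in best]         # content to feed into the prompt
```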

So we might also want to do a keyword index-based search, combine it, and then do semantic ranking to give me the absolute best data back. But this ability to bring in my own data is huge. And if we went back for a second, you'll see it gives us ways to hook into, for example, different data files; it can help me do that. And there are quotas. Notice it has the DALL-E 3 preview, so it can create images

here as well: create a cartoon of a hamburger chasing a human — back to that "John eats burger"; this could be the "burger eats John". And we can see — oh, it's going the wrong way; okay, so it didn't completely work that out very well, but once again it looks like me, because it has no hair. So you get the idea of these

different things. And one of the great things is that it has all of the standard things you would expect from a true enterprise service: it has things like role-based access control, it integrates with your networking, it sits in a certain governance boundary as required — it's doing all of those things. So those are really the core

services when we think about using the generative AI, and that's what we're going to focus on now. The only other thing: we always hear about the idea of responsible AI — well, generative AI has its own considerations here, so I need to make sure I'm being responsible with my generative AI, and they break this down into four key steps. There's the idea of

identify: thinking about what are the potential harms that could result from my AI system. I may have to do red teaming — where people deliberately try to do bad things — I can stress test it, I can do analysis, but what are the potential problems that could happen? And then measure: okay, those are the potential problems — what is the frequency, what is

the severity of that potential harm? I need clear metrics, clear measurements, and then again that testing to work out just what the likelihood of this happening is. And then I can think about: well, how do I mitigate it? How can I put in protections, filters, different prompt engineering to stop it happening? And then I just need to operate it. And Microsoft has a whole set of

documents around this. If we go and look, you want to look at the responsible AI practices, where it talks about all of those things: identifying what the harm is, what the likelihood is, how I can mitigate it, and then operating. And NIST has its own one as well. Now, for a bit of fun, let's go back to our chat — and actually, let's jump over for a second to our GPT-4

model. So now we can see I'm on the other instance, I'm using GPT-4 right now, and my system message is just a generic AI assistant. So I could say something like: describe the characteristics of a British person.

Apparently we're very polite, we're reserved, we have humor, we like to queue, we like tea, punctuality — so there are all these different things about British people. Now this was with my system message as a regular AI. What if I changed my AI assistant to: you are a racist AI chatbot that makes derogatory statements based on race and

culture? Okay. So then what if I typed in that same query again: describe the characteristics of a British person. And this is one of those times where I hope this doesn't work — otherwise it would be a short video. It

won't. So there are protections built in to stop it having this negative behavior. You'll hear about the idea of jailbreaking — these "do anything now" ideas to try and make it break its programming — but we can see here it has content filters, and it has particular filters around what it allows in the prompt and what it allows in the completion, and you can change some of

the severities. But if you want to go lower, then you have to get special permissions to do some of these things — some of these you need to actually go and get permission for — but I can change some of the aspects. These protections are super, super important as part of my all-up solution. And there you have it: this

was all of the key detail. If we zoom out for a second, to summarize: generative AI is about creating original content, be it code, language, or images. There was a huge amount of training that went in to create these models, which have billions or trillions of parameters that scale pretty uniformly — the more parameters, the better its abilities — and it's using a Transformer

model. The whole point of the Transformer model is that it takes what I enter, converts it to tokens, which get converted to embeddings; it then embeds the position of the words, because position is important — I don't want a burger eating John. Then it works out the relationships between all of the words, both what it was given and what has been generated, so it doesn't lose track and forget things; then it runs it through the main network to give a representation, which then leads to generating the next token

and the next — the response. OpenAI created the GPT model; there are different versions that support different context lengths and different numbers of parameters — GPT-4 is currently the newest, with the most parameters. ChatGPT was trained for interaction. Microsoft then copies the model and has their own instances. In the Copilots, it's really a complete AI provided for you that does some task: it helps you in

your Microsoft 365, it helps you with your security, it helps you in Bing, it helps you in Windows 11, it helps you in Dynamics, etc. There's a Copilot for everything, because it's grounded in data specific to the particular service it's leveraging, but it's provided as a complete AI solution. We can also stand up instances using Azure OpenAI — different models that we can then use in our apps. We manage those in Azure OpenAI Studio, which lets us actually play

around with different prompts until I get it right; when I've got it right, I can then use it through its API from my application. There are other services, like Semantic Kernel, that provide that orchestration to hook into searches. Azure AI Search is really good for natural language, because, hey, different words can mean the same thing and the same word can mean different things based on the context — so Azure AI Search creates

those embeddings that represent the semantic meaning of the word or the phrase or the image, and that's really powerful; then I can search to get relevant data for any request I'm making. That's exactly the same thing the Copilots do. So that is generative AI — that's what you need to know, especially for AI-900. As always: don't panic, take your time. Till the next video, take

care.