Azure OpenAI Service Provisioned Throughput Units (PTU) Onboarding - Azure AI Services

Full Document

Readwise/Full Document Contents/Azure OpenAI Service Provisioned Throughput Units (PTU) Onboarding - Azure AI Services.md

Highlights

You should consider switching from standard deployments to provisioned deployments when you have well-defined, predictable throughput and latency requirements. Typically, this occurs when the application is ready for production or has already been deployed in production and there's an understanding of the expected traffic. This allows users to accurately forecast the required capacity and avoid unexpected billing. ([View Highlight](https://read.readwise.io/read/01jj83j49xgvssfczvm6vytkzh))

Sizing and estimation: provisioned deployments
Determining the right amount of provisioned throughput, or PTUs, you require for your workload is an essential step to optimizing performance and cost. If you aren't familiar with the different approaches available to estimate system-level throughput, review the system-level throughput estimation recommendations in our performance and latency documentation. This section describes how to use Azure OpenAI capacity calculators to estimate the number of PTUs required to support a given workload. ([View Highlight](https://read.readwise.io/read/01jj83j9rn23wjk8bb7zgc0r03))
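
Note to self: a rough back-of-the-envelope version of the sizing arithmetic, written as a minimal Python sketch. It is not the Azure capacity calculator. The function names and every number in the example (`tpm_per_ptu`, `min_ptus`, `increment`) are hypothetical placeholders, since the real values depend on model, version, and region and should be taken from the official calculator. The sketch also assumes prompt and generation tokens cost the same, which the real calculator may not.

```python
import math


def estimate_tpm(peak_calls_per_min: int, prompt_tokens: int, generation_tokens: int) -> int:
    """Peak tokens per minute, assuming every call has the same token shape."""
    return peak_calls_per_min * (prompt_tokens + generation_tokens)


def estimate_ptus(required_tpm: int, tpm_per_ptu: float, min_ptus: int, increment: int) -> int:
    """Round a raw PTU need up to a purchasable size.

    tpm_per_ptu, min_ptus, and increment are ASSUMED inputs (placeholders),
    not published Azure figures.
    """
    raw = math.ceil(required_tpm / tpm_per_ptu)
    if raw <= min_ptus:
        return min_ptus
    # Round the amount above the minimum up to the next whole increment.
    steps = math.ceil((raw - min_ptus) / increment)
    return min_ptus + steps * increment


# Hypothetical workload: 300 calls/min at peak, 1000 prompt + 200 generated tokens each.
tpm = estimate_tpm(300, 1000, 200)          # 360000 tokens per minute
ptus = estimate_ptus(tpm, tpm_per_ptu=2500, min_ptus=50, increment=50)  # 150 with these placeholders
print(tpm, ptus)
```

Sizing at peak rather than average traffic matches the point in the first highlight: provisioned deployments suit workloads whose traffic is already understood well enough to forecast.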