Microsoft Shifts Copilot to Usage-Based Pricing and Evaluates DeepSeek V4
June 17, 2026
The End of Unlimited AI Coding Assistance
Microsoft has moved GitHub Copilot Cowork, its generally available multi-model agentic coding platform, from flat-rate unlimited usage to usage-based billing. The change follows Microsoft's acknowledgment that heavy users were consuming compute worth far more than their subscription fees. Microsoft also confirmed that it is evaluating a hosted, fine-tuned version of DeepSeek V4 as a cheaper model tier within Copilot Cowork.
These two moves signal a fundamental shift in how developers will budget for AI coding assistance going forward.
How Usage-Based Pricing Works in Copilot Cowork
Under the new model, a Copilot Cowork subscription includes a monthly base allocation of credits, counted as "premium requests". Usage beyond that allocation is metered according to the model used and the complexity of the task. Key details:
Base tier ($19/month): Includes 300 "premium requests" using GPT-4.1 or Claude Sonnet. Standard completions using smaller models remain unlimited. Premium requests cover agentic tasks like multi-file edits, codebase-wide refactoring, and complex generation.
Overage pricing: Additional premium requests are billed at $0.04-0.15 per request depending on the model and context length consumed. Enterprise customers can negotiate volume discounts.
Model choice affects cost: Using GPT-4.1 costs more credits than using a smaller model. This creates a direct incentive to use the cheapest model that can handle each task — which is exactly where DeepSeek V4 enters the picture.
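To make the arithmetic concrete, here is a minimal sketch that turns the figures above into a monthly bill range. It assumes every overage request is billed at a single rate somewhere between the quoted minimum and maximum. The function and constant names are illustrative and are not part of any Copilot API.

```python
# Figures from the pricing described above, held in cents to avoid float rounding.
BASE_FEE_CENTS = 1900      # $19/month base tier
INCLUDED_REQUESTS = 300    # premium requests included in the base tier
OVERAGE_MIN_CENTS = 4      # $0.04 per additional premium request
OVERAGE_MAX_CENTS = 15     # $0.15 per additional premium request


def monthly_bill_range(premium_requests: int) -> tuple[int, int]:
    """Return the (lowest, highest) possible monthly bill in cents."""
    overage = max(0, premium_requests - INCLUDED_REQUESTS)
    return (BASE_FEE_CENTS + overage * OVERAGE_MIN_CENTS,
            BASE_FEE_CENTS + overage * OVERAGE_MAX_CENTS)


for requests in (100, 300, 1000):
    low, high = monthly_bill_range(requests)
    print(f"{requests:>5} requests: ${low / 100:.2f} to ${high / 100:.2f}")
```

Under these assumptions, 1,000 premium requests in a month would cost between $47 and $124, depending on which models and context lengths the overage requests use.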
DeepSeek V4: The Budget Option Microsoft Wants
DeepSeek V4 has emerged as a formidable coding model at a fraction of the cost of GPT-4.1 or Claude Opus. Microsoft's internal benchmarks reportedly show DeepSeek V4 achieving 92% of GPT-4.1's coding accuracy at roughly 20% of the compute cost. By hosting a fine-tuned version within Azure, Microsoft can offer it as a first-class option in Copilot without revenue sharing with external providers.
For developers, this means a practical choice: use DeepSeek V4 for routine tasks (boilerplate, test generation, documentation) and reserve premium models for complex architectural decisions or novel code generation. This tiered approach could reduce monthly AI coding costs by 40-60% compared to using frontier models for everything.
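A rough way to sanity-check the 40-60% figure is to model the blended cost of routing a share of requests to the cheaper model. The sketch below assumes the reported 20% compute-cost ratio carries over directly to what you pay, which the article does not confirm. The function name is illustrative.

```python
def blended_saving(cheap_share: float, cheap_cost_ratio: float = 0.20) -> float:
    """Fraction of spend saved when `cheap_share` of requests use the cheaper model.

    Assumes every request costs 1 unit on the premium model and
    `cheap_cost_ratio` units on the cheaper one (an assumption, not a quoted price).
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * (1.0 - cheap_cost_ratio)


for share in (0.50, 0.75):
    print(f"{share:.0%} routed to the cheaper model -> {blended_saving(share):.0%} saved")
```

Under that assumption, a 40-60% saving corresponds to sending roughly half to three quarters of requests to the cheaper model.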
Impact on Developer Budgets
The shift from predictable flat-rate pricing to usage-based billing creates both opportunities and risks:
Light users save money. Developers who use AI assistance occasionally — maybe 50-100 premium requests per month — will find the base tier more than sufficient. They were previously subsidizing heavy users under flat-rate pricing.
Heavy users face unpredictable bills. Power users running hundreds of agentic coding sessions daily could see monthly costs spike to $100-300+. Without careful model selection and usage monitoring, budgets can blow out quickly.
Teams need governance. Engineering managers now need to track and allocate AI coding budgets across team members, similar to how cloud compute is managed. Expect new tooling categories around AI usage monitoring and cost optimization.
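As one illustration of what such governance tooling might do, the sketch below extrapolates each developer's month-to-date premium requests to a month-end bill and flags anyone projected to exceed a cap. The team data, the linear extrapolation, and the choice to price overage at the top quoted rate are all assumptions made for the example.

```python
# Pricing figures from the article, in cents; everything else is hypothetical.
BASE_FEE_CENTS = 1900
INCLUDED_REQUESTS = 300
WORST_CASE_RATE_CENTS = 15  # top of the quoted $0.04-0.15 overage range


def projected_bill_cents(requests_so_far: int, day_of_month: int,
                         days_in_month: int = 30) -> int:
    """Extrapolate usage linearly to month end and price it at the worst-case rate."""
    projected_requests = requests_so_far * days_in_month // day_of_month
    overage = max(0, projected_requests - INCLUDED_REQUESTS)
    return BASE_FEE_CENTS + overage * WORST_CASE_RATE_CENTS


def developers_over_cap(usage: dict[str, int], day_of_month: int,
                        cap_cents: int) -> list[str]:
    """Return the names of developers whose projected bill exceeds the cap."""
    return sorted(name for name, used in usage.items()
                  if projected_bill_cents(used, day_of_month) > cap_cents)


# Made-up premium-request counts as of day 10 of the month.
team_usage = {"alice": 40, "bob": 450, "carol": 160}
flagged = developers_over_cap(team_usage, day_of_month=10, cap_cents=5000)
print(flagged)  # ['bob']
```

Pricing at the worst-case rate is deliberately conservative: it flags people early, at the cost of some false alarms for developers who mostly use cheaper models.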
Strategies to Control Costs
To manage spending under the new model, consider these approaches:
Default to cheaper models. Set DeepSeek V4 or equivalent as your default and only escalate to GPT-4.1/Claude for tasks where the cheaper model fails. Most completions and simple edits don't need frontier intelligence.
Batch agentic requests. Instead of running many small agentic sessions, combine related tasks into fewer, larger prompts. Each session initiation consumes credits regardless of output length.
Monitor and set alerts. Use Copilot's new usage dashboard to set spending thresholds. Engineering teams should establish per-developer monthly caps to prevent bill shock.
Evaluate alternatives. Compare total cost against Anthropic's Max plan ($200/month unlimited for Claude Opus) or direct API access via OpenRouter. For some usage patterns, flat-rate unlimited plans may still be cheaper despite the higher sticker price.
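For the comparison against a flat-rate plan, a break-even calculation shows where metered billing starts to cost more. This is a minimal sketch using only the prices quoted in this article and a single overage rate for the whole month. The names are illustrative.

```python
# Prices quoted in the article, in cents.
BASE_FEE_CENTS = 1900      # metered plan base fee
INCLUDED_REQUESTS = 300    # premium requests covered by the base fee
FLAT_RATE_CENTS = 20000    # $200/month flat-rate alternative


def break_even_requests(overage_rate_cents: int) -> int:
    """Smallest monthly request count at which metered billing costs strictly more."""
    # Metered cost = base + overage * rate, so find the first overage count
    # that pushes the total above the flat-rate price.
    overage_needed = (FLAT_RATE_CENTS - BASE_FEE_CENTS) // overage_rate_cents + 1
    return INCLUDED_REQUESTS + overage_needed


print(break_even_requests(15))  # 1507 requests at $0.15 each
print(break_even_requests(4))   # 4826 requests at $0.04 each
```

On these numbers, the flat-rate plan only wins for developers who make well over a thousand premium requests a month.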
What This Signals for the Industry
Microsoft's move confirms what many predicted: unlimited AI coding assistance at $19/month was never sustainable. As developers use AI more heavily and models become more capable (and expensive), providers must either raise prices or meter usage. Microsoft chose metering. Others will likely follow. The era of all-you-can-eat AI coding is ending — plan your budgets accordingly.
Frequently Asked Questions
How much will Copilot cost under usage-based pricing?
The base subscription remains $19/month with 300 premium requests included. Each additional request costs between $0.04 and $0.15, depending on the model used and the context length consumed.
Is DeepSeek V4 good enough for coding tasks?
Microsoft's internal benchmarks reportedly show it achieving 92% of GPT-4.1's coding accuracy at roughly 20% of the compute cost. It handles routine tasks well but may struggle with complex architectural decisions.
Should I switch to a flat-rate alternative like Claude Max?
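Only if your usage is very heavy. At the quoted overage rates, 500 premium requests cost at most $19 + 200 × $0.15 = $49 per month, far below a flat-rate plan like Claude Max ($200/month unlimited). The flat-rate plan only comes out ahead past roughly 1,500 premium requests per month at the $0.15 rate, or about 4,800 at the $0.04 rate. Calculate your typical monthly usage to compare.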
How can teams manage AI coding budgets?
Set per-developer monthly caps, default to cheaper models for routine tasks, use the Copilot usage dashboard to monitor spending, and establish escalation policies for when to use premium models.
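An escalation policy of the kind described here could be as simple as trying the cheapest tier first and moving up only when the result is rejected. The sketch below is hypothetical: the tier names are placeholders rather than real Copilot model identifiers, and `attempt` stands in for whatever acceptance check a team uses, such as running the test suite.

```python
from typing import Callable

# Hypothetical tier names, cheapest first; not real Copilot identifiers.
MODEL_TIERS = ["budget-model", "premium-model"]


def run_with_escalation(task: str,
                        attempt: Callable[[str, str], bool]) -> tuple[str, int]:
    """Try each tier in order and return (tier_that_succeeded, attempts_used).

    `attempt(model, task)` is supplied by the caller. It runs the task on the
    given model and reports whether the result was acceptable.
    """
    for attempts_used, model in enumerate(MODEL_TIERS, start=1):
        if attempt(model, task):
            return model, attempts_used
    raise RuntimeError(f"no tier could complete task: {task!r}")


# Stand-in acceptance check: pretend only refactors need the premium tier.
def fake_attempt(model: str, task: str) -> bool:
    return model == "premium-model" or "refactor" not in task


print(run_with_escalation("write unit tests", fake_attempt))
print(run_with_escalation("refactor auth module", fake_attempt))
```

A failed attempt on the cheaper tier still consumes a request, so this pattern saves money only when the cheaper model succeeds on most of the tasks routed to it.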