
AI · August 4, 2025 · 4 min read · Anup Marwadi

Why Streaming AI Responses Break Token Tracking (and How to Fix It in Semantic Kernel)

Streaming responses from LLMs don't come back with usage metrics or token counts. Here's how we worked around that in C# with Semantic Kernel.

Summary

If you’re building production-ready AI applications with Semantic Kernel and using providers like OpenAI, Gemini, or Claude, there’s a hidden problem when you enable streaming: token usage data disappears.

This is a major issue if you’re tracking consumption for billing, enforcing usage limits, or analyzing system performance.

At HyperTrends, we solved this problem in our C# implementation using Semantic Kernel by collecting streaming output manually and calculating token usage ourselves. Here’s exactly how we did it.

The Problem: Streaming Removes Token Metadata

When using InvokeStreamingAsync() in Semantic Kernel or other streaming methods, the response is delivered in parts for a smoother user experience. However, the tradeoff is that token metadata is not returned by most LLM APIs in this mode.

For example:

  • OpenAI and Azure OpenAI omit the usage field during streaming by default.
  • Anthropic Claude behaves the same. Token counts are not returned during a streamed response.
  • Google Gemini / PaLM likewise omits token usage in real time.

As a result, Semantic Kernel cannot populate FunctionResult.Metadata["Usage"], which is normally available for non-streaming calls.

If you are relying on that metadata for tracking, streaming mode breaks your pipeline.
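For comparison, in non-streaming mode the usage data is right there on the result. Here's an illustrative sketch: the dictionary below stands in for FunctionResult.Metadata, since the concrete value type behind the "Usage" key depends on the connector.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: non-streaming Semantic Kernel calls expose provider
// metadata via FunctionResult.Metadata. This mimics reading its "Usage" entry;
// the concrete value type varies by connector, so we only ToString() it here.
static string DescribeUsage(IReadOnlyDictionary<string, object?>? metadata)
{
    if (metadata is not null && metadata.TryGetValue("Usage", out var usage) && usage is not null)
        return usage.ToString()!;
    return "usage unavailable (e.g. streaming mode)";
}

var metadata = new Dictionary<string, object?> { ["Usage"] = "prompt=12, completion=34" };
Console.WriteLine(DescribeUsage(metadata)); // prompt=12, completion=34
Console.WriteLine(DescribeUsage(null));     // usage unavailable (e.g. streaming mode)
```

In streaming mode, that "Usage" entry simply never shows up, which is exactly the gap the rest of this post fills.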

✅ The Solution: Accumulate the Output and Count Tokens Manually

Since the metadata is not provided, the solution is to reassemble the full prompt and completion, and then count the tokens using model-specific tokenizers.

1. Accumulate the Output

During streaming, append the output fragments to a buffer:

var sb = new StringBuilder();
await foreach (var chunk in kernel.InvokeStreamingAsync(...)) // your function and arguments
{
    sb.Append(chunk); // each chunk's ToString() yields its text fragment
}
string fullOutput = sb.ToString();

2. Use the Right Tokenizer

Depending on the model, use a compatible tokenizer. For OpenAI models, you can use tiktoken or a C# port.

// Using a C# port of tiktoken (e.g. the Tiktoken NuGet package)
var encoding = Tiktoken.Encoding.ForModel("gpt-4");
int promptTokens = encoding.CountTokens(promptText);
int completionTokens = encoding.CountTokens(fullOutput);

Different models use different tokenization methods. Claude and Gemini require custom logic or their official APIs to estimate or calculate token counts accurately.
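Where no local tokenizer is available, a rough character-based estimate can serve as a stopgap. This is a heuristic of our own (about four characters per token for English text), not a substitute for the provider's official count:

```csharp
using System;

// Heuristic fallback: ~4 characters per English token. Use only as a stopgap
// estimate; prefer the provider's count-tokens API for anything billable.
static int ApproximateTokenCount(string text) =>
    (int)Math.Ceiling(text.Length / 4.0);

Console.WriteLine(ApproximateTokenCount("Hello, streaming world!")); // 23 chars -> 6
```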

3. Store and Use the Counts

Once you have the numbers, store them alongside the conversation, use them for billing, or log them for analytics.
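A simple shape for persisting those numbers might look like the record below. The type name and fields are our own, not a Semantic Kernel type; keeping the model id next to the counts lets you apply the right tokenizer and pricing later.

```csharp
using System;

var usage = new TokenUsageRecord("gpt-4", PromptTokens: 120, CompletionTokens: 340);
Console.WriteLine(usage.TotalTokens); // 460

// Hypothetical storage shape (not a Semantic Kernel type).
public record TokenUsageRecord(string ModelId, int PromptTokens, int CompletionTokens)
{
    public int TotalTokens => PromptTokens + CompletionTokens;
}
```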

Provider-Specific Considerations

Provider                 Token Usage in Streaming   Manual Counting Needed   Notes
OpenAI / Azure OpenAI    No                         Yes                      Use tiktoken
Anthropic Claude         No                         Yes                      Has a token-count API for prompts
Google Gemini / PaLM     No                         Yes                      Use the countTokens endpoint

Tips for Accuracy

  • Always perform token counting after the full stream completes.
  • Keep track of which model was used so you apply the correct tokenizer.
  • Consider hidden system content like function calls in prompts when calculating total usage.
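To illustrate the last tip: tokenize what the model actually receives, not just what the user typed. A hedged sketch (the helper name is ours) that assembles the full prompt text, including system content and tool definitions, before counting:

```csharp
using System;

// Illustrative: the text you count should include hidden content
// (system prompt, tool/function definitions), not only the user's message.
static string ComposeFullPrompt(string systemPrompt, string userMessage, params string[] toolDefinitions) =>
    string.Join("\n", systemPrompt, string.Join("\n", toolDefinitions), userMessage);

var full = ComposeFullPrompt(
    "You are a helpful assistant.",
    "What's the weather in Paris?",
    "tool: get_weather(city)");
Console.WriteLine(full);
```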

How Semantic Kernel Handles This

Call Type                 Token Usage Available?   Recommendation
InvokeAsync()             Yes (in .Metadata)       Use as-is for full metadata
InvokeStreamingAsync()    No                       Use custom counting logic

Semantic Kernel does a good job extracting token usage for non-streamed completions, but it cannot extract what the provider doesn’t return. That’s why manual token counting is needed when streaming.

FAQs

Why is token usage missing during streaming?
Because providers optimize for low latency, they skip metadata in streamed payloads. This makes the data fast to deliver but incomplete for logging and billing.

Can Semantic Kernel calculate token usage on its own?
Only in non-streamed mode. For streamed output, you must calculate it manually using a tokenizer.

Is manual counting accurate?
Yes, if you use the official tokenizer for the model. There may be minor differences due to system messages or encoding quirks, but it’s close enough for most billing and tracking scenarios.

Does Claude or Gemini offer token counting tools?
Claude has a token counting API for prompts. Gemini includes a countTokens endpoint to help with estimates. Neither provides real-time token usage during streaming.

Does this work with function calling and plugins?
Yes, but be careful. Plugins and function calls may inject additional hidden content into the prompt. That affects token count, so always tokenize the full content, not just what the user sees.

Conclusion

Streaming AI responses is essential for delivering great user experiences. However, it comes at a cost: token usage data is not available by default.

At HyperTrends, we solved this problem by building a manual counting system using Semantic Kernel, prompt tracking, and model-specific tokenizers. This gives us full visibility into usage across OpenAI, Claude, and Gemini, even when streaming is enabled. Want to build AI SaaS platforms or agents to help grow your business? Talk to us!

Until providers begin returning token metadata with streaming output, this is the most reliable method to keep your application accurate, auditable, and scalable.


Anup Marwadi

Anup Marwadi is a technology entrepreneur, an investor, and an avid learner of business skills. He is the CEO of HyperTrends Global Inc. and TicketBlox and is currently involved in numerous advisory positions with healthcare and manufacturing companies. Anup is on a mission to build technology products that disrupt industries and help businesses grow by using technology and software as their primary differentiator. Anup is an avid traveler, a speaker, and loves fitness and adventure. Anup is a board member at Entrepreneurs' Organization (EO) - San Diego.