IAsyncEnumerable performance benefits | Damir's Corner

2024-02-09

Asynchronous streams (i.e., the IAsyncEnumerable interface) were one of the new features in C# 8. However, they didn't get as much attention as some of the others. Even today, the feature is rarely used and not well known among C# developers, although it can make code faster and easier to understand in some cases.

Nevertheless, classes in the .NET base class library regularly expose the IAsyncEnumerable interface when it makes sense. The JsonSerializer class, for example, can return it when deserializing a JSON array. This can be particularly useful when you need to deserialize multiple arrays.

Without asynchronous streams, if you didn't want the consuming code to know how the data is loaded, you would load it from all the files into a list and return that:

    public static async Task<List<TimeValue>> LoadAsListAsync(string folder)
    {
        var files = Directory.EnumerateFiles(folder, "*.json");
        var allItems = new List<TimeValue>();
        foreach (var file in files)
        {
            using var stream = File.OpenRead(file);
            var jsonItems = await JsonSerializer.DeserializeAsync<List<TimeValue>>(
                stream,
                jsonSerializerOptions
            );
            if (jsonItems != null)
            {
                allItems.AddRange(jsonItems);
            }
        }
        return allItems;
    }

This approach isn't very efficient for large data sets, because every item from every file stays in memory until the method returns. That is especially wasteful if you're only interested in aggregated data and don't need to keep the individual items. The following method is an example of such a consumer, only calculating the daily sums:

    public static async Task<Dictionary<DateOnly, int>> AggregateFormListAsync(
        string folder
    )
    {
        var aggregates = new Dictionary<DateOnly, int>();
        var items = await JsonLoader.LoadAsListAsync(folder);
        foreach (var item in items)
        {
            var date = DateOnly.FromDateTime(item.DateTime);
            if (!aggregates.TryGetValue(date, out var aggregate))
            {
                aggregate = 0;
            }
            aggregates[date] = aggregate + item.Value;
        }
        return aggregates;
    }

Without using asynchronous streams, you could change the loading code to load only one file at a time, so that it never needs to hold all the data in memory:

    public static IEnumerable<Task<List<TimeValue>?>> LoadAsTaskList(string folder)
    {
        var files = Directory.EnumerateFiles(folder, "*.json");
        foreach (var file in files)
        {
            using var stream = File.OpenRead(file);
            yield return JsonSerializer
                .DeserializeAsync<List<TimeValue>>(stream, jsonSerializerOptions)
                .AsTask();
        }
    }

However, this requires the consuming code to be aware that the data is loaded in multiple parts. Because of that, two loops are required to process all the data:

    public static async Task<Dictionary<DateOnly, int>> AggregateFromTaskListAsync(
        string folder
    )
    {
        var aggregates = new Dictionary<DateOnly, int>();
        var tasks = JsonLoader.LoadAsTaskList(folder);
        foreach (var task in tasks)
        {
            var items = await task;
            if (items != null)
            {
                foreach (var item in items)
                {
                    var date = DateOnly.FromDateTime(item.DateTime);
                    if (!aggregates.TryGetValue(date, out va
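
For comparison, here is a minimal sketch of what an asynchronous-stream version of the loader and its consumer might look like. It is not the article's own code: the names JsonLoaderSketch, LoadAsAsyncEnumerable and AggregateFromAsyncEnumerableAsync are hypothetical, the shape of TimeValue (a DateTime and an int Value) is inferred from how the other snippets use it, and the serializer options are an assumption because the article's jsonSerializerOptions are not shown. It relies on JsonSerializer.DeserializeAsyncEnumerable, which is available from .NET 6 on.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Assumed item shape, inferred from the item.DateTime and item.Value usages.
public record TimeValue(DateTime DateTime, int Value);

public static class JsonLoaderSketch
{
    // Assumed options; the article's jsonSerializerOptions are not shown.
    private static readonly JsonSerializerOptions jsonSerializerOptions =
        new(JsonSerializerDefaults.Web);

    // Streams items from all files one by one, so that neither a whole file
    // nor the whole folder needs to be materialized as a list.
    public static async IAsyncEnumerable<TimeValue> LoadAsAsyncEnumerable(
        string folder,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        foreach (var file in Directory.EnumerateFiles(folder, "*.json"))
        {
            await using var stream = File.OpenRead(file);
            var items = JsonSerializer.DeserializeAsyncEnumerable<TimeValue>(
                stream, jsonSerializerOptions, cancellationToken);
            await foreach (var item in items)
            {
                // A JSON array may contain null elements; skip them.
                if (item != null)
                {
                    yield return item;
                }
            }
        }
    }

    // The consumer needs a single loop and does not know about the files.
    public static async Task<Dictionary<DateOnly, int>> AggregateFromAsyncEnumerableAsync(
        string folder)
    {
        var aggregates = new Dictionary<DateOnly, int>();
        await foreach (var item in LoadAsAsyncEnumerable(folder))
        {
            var date = DateOnly.FromDateTime(item.DateTime);
            if (!aggregates.TryGetValue(date, out var aggregate))
            {
                aggregate = 0;
            }
            aggregates[date] = aggregate + item.Value;
        }
        return aggregates;
    }
}
```

In this sketch the consumer is as simple as the list-based one, while the loader still avoids keeping all items in memory at once. Whether it is actually faster depends on the data and should be measured.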

Link [ https://www.damirscorner.com/blog/posts/20240209-IAsyncEnumerablePerformanceBenefits.html ]

Copyright © 2024 All rights reserved
