My input is a long list of files located on an Amazon S3 server. I'd like to download the metadata of the files, compute the hashes of the local files, and compare the metadata hash with the local files' hash.
Currently, I use a loop to start all the metadata downloads asynchronously, then as each completes, compute MD5 on the local file if needed and compare. Here's the code (just the relevant lines):
Dim s3client As New AmazonS3Client(KeyId.Text, keySecret.Text)
Dim responseTasks As New List(Of System.Tuple(Of ListViewItem, Task(Of GetObjectMetadataResponse)))
' Start every metadata request up front without awaiting it.
For Each lvi As ListViewItem In lvStatus.Items
    Dim gomr As New Amazon.S3.Model.GetObjectMetadataRequest
    gomr.BucketName = S3FileDialog.GetBucketName(lvi.SubItems(2).Text)
    gomr.Key = S3FileDialog.GetPrefix(lvi.SubItems(2).Text)
    responseTasks.Add(New System.Tuple(Of ListViewItem, Task(Of GetObjectMetadataResponse))(lvi, s3client.GetObjectMetadataAsync(gomr)))
Next
' Await the responses in the order the requests were started.
For Each t As System.Tuple(Of ListViewItem, Task(Of GetObjectMetadataResponse)) In responseTasks
    Dim lvi As ListViewItem = t.Item1
    Dim response As GetObjectMetadataResponse = Await t.Item2
    ' Strip the surrounding double quotes from the ETag before comparing.
    If response.ETag.Trim(""""c) = MD5CalcFile(lvi.SubItems(1).Text) Then
        lvi.SubItems(3).Text = "Match"
        UpdateLvi(lvi)
    End If
Next
I've got two problems:
1. I'm awaiting the responses in the order that I requested them. I'd rather process them in the order that they complete, so that I get results sooner.
2. The MD5 calculation is long and synchronous. I tried making it async, but the process locked up. I think the MD5 task was added to the end of .NET's task queue and didn't get to run until all the downloads had completed.
Ideally, I'd process the responses as they arrive, not in request order, and the MD5 calculation would be asynchronous but still get a chance to run.
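To make that goal concrete, here is a minimal sketch of the shape I'm after: one small async function per item that awaits its own metadata and then hashes off the calling thread, so no item waits on another. This is not my real code. `FakeGetETagAsync` is a hypothetical stand-in for `s3client.GetObjectMetadataAsync`, `Md5Hex` is a hypothetical stand-in for `MD5CalcFile` that hashes a string instead of a file, and it runs as a console program rather than in my form.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Security.Cryptography
Imports System.Text
Imports System.Threading.Tasks

Module PerItemPipelineSketch
    ' Hypothetical stand-in for MD5CalcFile: hashes a string instead of a file.
    Function Md5Hex(content As String) As String
        Using hasher As MD5 = MD5.Create()
            Dim hash As Byte() = hasher.ComputeHash(Encoding.UTF8.GetBytes(content))
            Return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant()
        End Using
    End Function

    ' Hypothetical stand-in for the S3 metadata call: waits, then returns a quoted ETag.
    Async Function FakeGetETagAsync(content As String, delayMs As Integer) As Task(Of String)
        Await Task.Delay(delayMs)
        Return """" & Md5Hex(content) & """"
    End Function

    ' One pipeline per item: await the metadata, then hash via Task.Run.
    Async Function CheckItemAsync(name As String, localContent As String, delayMs As Integer) As Task(Of String)
        Dim etag As String = Await FakeGetETagAsync(localContent, delayMs)
        Dim localHash As String = Await Task.Run(Function() Md5Hex(localContent))
        Return name & ": " & If(etag.Trim(""""c) = localHash, "Match", "Mismatch")
    End Function

    Sub Main()
        ' Start all pipelines; each one proceeds as soon as its own response arrives.
        Dim checks As New List(Of Task(Of String)) From {
            CheckItemAsync("a.txt", "alpha", 300),
            CheckItemAsync("b.txt", "beta", 100)
        }
        ' Blocking here is only for the console demo; a form would Await instead.
        Dim results As String() = Task.WhenAll(checks).GetAwaiter().GetResult()
        For Each line As String In results
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

In the real form, the `ListViewItem` update would sit after the two `Await`s inside the per-item function, which I'd expect to resume on the UI thread.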
Edit:
Incorporating WhenAny, it looks like this now:
Dim s3client As New Amazon.S3.AmazonS3Client(KeyId.Text, keySecret.Text)
Dim responseTasks As New Dictionary(Of Task(Of GetObjectMetadataResponse), ListViewItem)
For Each lvi As ListViewItem In lvStatus.Items
    Dim gomr As New Amazon.S3.Model.GetObjectMetadataRequest
    gomr.BucketName = S3FileDialog.GetBucketName(lvi.SubItems(2).Text)
    gomr.Key = S3FileDialog.GetPrefix(lvi.SubItems(2).Text)
    responseTasks.Add(s3client.GetObjectMetadataAsync(gomr), lvi)
Next
Dim startTime As DateTimeOffset = DateTimeOffset.Now
Do While responseTasks.Count > 0
    ' Take whichever request finishes next, regardless of start order.
    Dim currentTask As Task(Of GetObjectMetadataResponse) = Await Task.WhenAny(responseTasks.Keys)
    Dim lvi As ListViewItem = responseTasks(currentTask)
    ' Remove the finished task so WhenAny doesn't return it again.
    responseTasks.Remove(currentTask)
    Dim response As GetObjectMetadataResponse = Await currentTask
    If response.ETag.Trim(""""c) = MD5CalcFile(lvi.SubItems(1).Text) Then
        lvi.SubItems(3).Text = "Match"
        UpdateLvi(lvi)
    End If
Loop
MsgBox((DateTimeOffset.Now - startTime).ToString)
The UI locks up momentarily whenever MD5CalcFile runs. The whole loop takes about 45s, and the first file's MD5 result appears within 1s of starting.
If I change the line to:
If response.ETag.Trim(""""c) = Await Task.Run(Function () MD5CalcFile(lvi.SubItems(1).Text)) Then
The UI doesn't lock up while MD5CalcFile runs. The whole loop takes about 75s, up from 45s, and the first file's MD5 result appears only after about 40s of waiting.
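One experiment that might show whether the hashing work is stuck waiting in the thread-pool queue is to request a dedicated thread for it with `TaskCreationOptions.LongRunning`. I haven't confirmed that this is the cause. The sketch below only shows the call shape: `SlowHash` is a hypothetical stand-in for `MD5CalcFile` that just sleeps, and `HashOnOwnThreadAsync` is a helper name I made up.

```vbnet
Imports System
Imports System.Threading
Imports System.Threading.Tasks

Module LongRunningSketch
    ' Hypothetical stand-in for MD5CalcFile: burns some time synchronously.
    Function SlowHash(path As String) As String
        Thread.Sleep(200)
        Return "hash-of-" & path
    End Function

    ' LongRunning hints to the scheduler that this work may warrant its own thread
    ' instead of competing for a thread-pool worker.
    Function HashOnOwnThreadAsync(path As String) As Task(Of String)
        Return Task.Factory.StartNew(Function() SlowHash(path),
                                     CancellationToken.None,
                                     TaskCreationOptions.LongRunning,
                                     TaskScheduler.Default)
    End Function

    Sub Main()
        ' Blocking here is only for the console demo; the form would Await instead.
        Dim result As String = HashOnOwnThreadAsync("file.bin").GetAwaiter().GetResult()
        Console.WriteLine(result)
    End Sub
End Module
```

In the loop above, this would replace the `Task.Run(...)` call in the `If` line. If the first result then shows up early again, that would point at queueing rather than at the hashing itself.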