I'm going through the go tour and I feel like I have a pretty good understanding of the language except for concurrency.
On slide 71 there is an exercise that asks the reader to parallelize a web crawler (and to make it not cover repeats but I haven't gotten there yet.)
Here is what I have so far:
func Crawl(url string, depth int, fetcher Fetcher, ch chan string) {
if depth <= 0 {
return
}
body, urls, err := fetcher.Fetch(url)
if err != nil {
ch <- fmt.Sprintln(err)
return
}
ch <- fmt.Sprintf("found: %s %q\n", url, body)
for _, u := range urls {
go Crawl(u, depth-1, fetcher, ch)
}
}
func main() {
ch := make(chan string, 100)
go Crawl("http://golang.org/", 4, fetcher, ch)
for i := range ch {
fmt.Println(i)
}
}
The issue I have is where to put the close(ch) call. If I put a defer close(ch) somewhere in the Crawl method, then I end up writing to a closed channel in one of the spawned goroutines, since the method will finish execution before the spawned goroutines do.
If I omit the call to close(ch), as is shown in my example code, the program deadlocks after all the goroutines finish executing but the main thread is still waiting on the channel in the for loop since the channel was never closed.