Filtering generic collections in Go: Performance review

Using generics in a real Golang scenario while looking for equivalent functions to Stream filter(Predicate predicate) and Python list comprehension. Instead of relying on existing packages, I chose to write a filter function myself for learning purposes.

func filterStrings(collection []string, test func(string) bool) (f []string) {<!-- -->
 for _, s := range collection {<!-- -->
  if test(s) {<!-- -->
   f = append(f, s)
  }
 }

 return
}

However, this only works with strings. If I need to filter a collection of integers, then I need another function that is very similar. This seems like a perfect choice for a generic function.

func filter[T any](collection []T, test func(T) bool) (f []T) {<!-- -->
for _, e := range collection {<!-- -->
if test(e) {<!-- -->
f = append(f, e)
}
}

return
}

Let’s analyze the (few) differences between typed and generic versions:

Following the function name is the definition of a generic type T.
T is defined as any type.
The type of elements in the input slice changes from string to T
The input and output clice types have also changed from string to T

Without further ado, let’s write some unit tests. First, I need a generator for a random collection (in my case, strings).

func generateStringCollection(size, strLen int) []string {<!-- -->
var collection[]string
for i := 0; i < size; i + + {<!-- -->
collection = append(collection, strings.Repeat(fmt.Sprintf("%c", rune('A' + (i ))), strLen))
}

return collection
}

Now I can write a test case that determines that the output of the filterStrings function is the same as the output of my filter normalizer.

func TestFilter(t *testing.T) {<!-- -->
c := generateStringCollection(1000, 3)

t.Run("the output of the typed and generic functions is the same", func(t *testing.T) {<!-- -->
predicate := func(s string) bool {<!-- --> return s == "AAA" }

filteredStrings := filterStrings(c, predicate)
filteredElements := filter(c, predicate)

if !reflect.DeepEqual(filteredStrings, filteredElements) {<!-- -->
t.Errorf("the output of the two functions is not the same")
}
})
}
=== RUN TestFilter
=== RUN TestFilter/the_output_of_the_typed_and_generic_functions_is_the_same
--- PASS: TestFilter (0.00s)
    --- PASS: TestFilter/the_output_of_the_typed_and_generic_functions_is_the_same (0.00s)
PASS

Consider the performance issues of the new function when handling large slices. How can I make sure it performs well in this case as well?

The answer is: benchmarking. Writing benchmark tests in Go is very similar to unit tests.

const (
CollectionSize = 1000
ElementSize=3
)

func BenchmarkFilter_Typed_Copying(b *testing.B) {<!-- -->
c := generateStringCollection(CollectionSize, ElementSize)

b.Run("Equals to AAA", func(b *testing.B) {<!-- -->
for i := 0; i < b.N; i + + {<!-- -->
filterStrings(c, func(s string) bool {<!-- --> return s == "AAA" })
}
})
}

func BenchmarkFilter_Generics_Copying(b *testing.B) {<!-- -->
c := generateStringCollection(CollectionSize, ElementSize)

b.Run("Equals to AAA", func(b *testing.B) {<!-- -->
for i := 0; i < b.N; i + + {<!-- -->
filter(c, func(s string) bool {<!-- --> return s == "AAA" })
}
})
}
go test -bench=. -count=10 -benchmem
goos: darwin
goarch: arm64
pkg: github.com/timliudream/go-test/generic_test
BenchmarkFilter_Typed_Copying/Equals_to_AAA-8 718408 1641 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Typed_Copying/Equals_to_AAA-8 718148 1640 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Typed_Copying/Equals_to_AAA-8 732939 1655 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Typed_Copying/Equals_to_AAA-8 723036 1639 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Typed_Copying/Equals_to_AAA-8 699075 1639 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Typed_Copying/Equals_to_AAA-8 707232 1643 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Typed_Copying/Equals_to_AAA-8 616422 1652 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Typed_Copying/Equals_to_AAA-8 730702 1649 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Typed_Copying/Equals_to_AAA-8 691488 1700 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Typed_Copying/Equals_to_AAA-8 717043 1646 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Copying/Equals_to_AAA-8 428851 2754 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Copying/Equals_to_AAA-8 428437 2762 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Copying/Equals_to_AAA-8 430444 2800 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Copying/Equals_to_AAA-8 429314 2757 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Copying/Equals_to_AAA-8 430938 2754 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Copying/Equals_to_AAA-8 429795 2754 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Copying/Equals_to_AAA-8 426714 2755 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Copying/Equals_to_AAA-8 418152 2755 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Copying/Equals_to_AAA-8 431739 2761 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Copying/Equals_to_AAA-8 412221 2755 ns/op 4080 B/op 8 allocs/op
PASS
ok github.com/timliudream/go-test/generic_test 25.005s

I’m not happy with the result. It looks like I’m trading readability for performance.
Also, I’m a little worried about the amount allocated. Did you notice the _Copying suffix in my test name? That’s because both of my filter functions copy the filtered items from the input collection into the output collection. Why do I have to occupy memory for such a simple task?
At the end of the day, what I need to do is filter the raw collection. I decided to solve this problem first.

func filterInPlace[T any](collection []T, test func(T) bool) []T {<!-- -->
var position, size = 0, len(collection)
for i := 0; i < size; i + + {<!-- -->
if test(collection[i]) {<!-- -->
collection[position] = collection[i]
position++
}
}

return collection[:position]
}

Instead of writing the filtered results to a new collection and then back, I write the results back to the original collection and keep an extra index to “slice” out one on the filtered number of items Fragments.

My unit tests still pass, after changing the following line.

filteredStrings := filterStrings(c, predicate)
//filteredElements := filter(c, predicate)
  
filteredElements := filterInPlace(c, predicate) // new memory-savvy function

Add another bench method:

func BenchmarkFilter_Generics_InPlace(b *testing.B) {<!-- -->
 c := generateStringCollection(CollectionSize, 3)

 b.Run("Equals to AAA", func(b *testing.B) {<!-- -->
  for i := 0; i < b.N; i + + {<!-- -->
   filterInPlace(c, func(s string) bool {<!-- --> return s == "AAA" })
  }
 })
}

The results are outstanding.

go test -bench=. -benchmem
goos: darwin
goarch: arm64
pkg: github.com/timliudream/go-test/generic_test
BenchmarkFilter_Typed_Copying/Equals_to_AAA-8 713928 1642 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Copying/Equals_to_AAA-8 426055 2787 ns/op 4080 B/op 8 allocs/op
BenchmarkFilter_Generics_Inplace/Equals_to_AAA-8 483994 2467 ns/op 0 B/op 0 allocs/op
PASS
ok github.com/timliudream/go-test/generic_test 4.925s

Not only are memory allocations zeroed out, but performance is also significantly improved.