John Potocny
2015-03-21 18:45:20 UTC
Hi everyone,
I spent some time this week refactoring a piece of code to extract part of
a string from a larger block, and in the process I discovered that the
strings.Trim() function was actually fairly slow, and was allocating memory
when I called it. This was fairly surprising to me, since I had always
figured the implementation of Trim would be pretty efficient.
I took a look at the implementation to figure this out - here it is below
as a reference:
func makeCutsetFunc(cutset string) func(rune) bool {
	return func(r rune) bool { return IndexRune(cutset, r) >= 0 }
}

// Trim returns a slice of the string s with all leading and
// trailing Unicode code points contained in cutset removed.
func Trim(s string, cutset string) string {
	if s == "" || cutset == "" {
		return s
	}
	return TrimFunc(s, makeCutsetFunc(cutset))
}
Now it makes perfect sense to me - the plain strings.Trim() functions work
by calling their corresponding strings.TrimFunc() implementation, with the
provided cutset argument captured in a closure that is passed as the
predicate f(c). This is of course where the allocation comes from: the
closure has to carry the cutset with it, so a new one is created on every
call. It seems pretty inefficient to me, and indeed a simple benchmark shows
that calling strings.TrimFunc() directly with a fixed predicate, in place of
strings.Trim(), gives a fairly substantial performance boost (if you know
the cutset ahead of time).
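
A minimal sketch of the kind of benchmark described above, not the one actually run for this post. The cutset (space and tab), the input string, and the names isSpaceOrTab, trimCutset and trimPredicate are all assumptions made for illustration. The numbers will depend on the Go release in use, since the implementation of strings.Trim() may differ from the one quoted here.

```go
package main

import (
	"strings"
	"testing"
)

// isSpaceOrTab is a hypothetical predicate for a cutset known ahead of time.
func isSpaceOrTab(r rune) bool { return r == ' ' || r == '\t' }

// trimCutset trims via strings.Trim, which builds its predicate from the cutset.
func trimCutset(s string) string { return strings.Trim(s, " \t") }

// trimPredicate trims via strings.TrimFunc with the fixed predicate above.
func trimPredicate(s string) string { return strings.TrimFunc(s, isSpaceOrTab) }

// sink keeps the compiler from discarding the benchmarked calls.
var sink string

func BenchmarkTrimCutset(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = trimCutset("  \thello world\t  ")
	}
}

func BenchmarkTrimPredicate(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = trimPredicate("  \thello world\t  ")
	}
}
```

Placed in a _test.go file, these can be run with "go test -bench=Trim -benchmem"; the allocs/op column shows whether either variant allocates.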
I played around with the implementation of strings.Trim() and was able to
mock up a version that avoids allocation fairly easily - it basically
replaces the body of Trim with a copy of the strings.TrimFunc() logic, with
the cutset check hardcoded in place of the call to f(c). This duplicates a
bit of code of course, but it also provides a decent speedup (about 20%,
based on some simple benchmarks I did).
I'm curious whether anyone would find this useful - do people have any use
for a faster Trim() implementation, or is everyone who needs performance
already just using TrimFunc()? If people think that the performance boost
might be worth the code duplication involved, I'm happy to polish my
implementation and submit a CL. Let me know what you guys think!
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.