Discussion:
[go-nuts] Performance of strings.Trim() vs strings.TrimFunc()
John Potocny
2015-03-21 18:45:20 UTC
Hi everyone,

I spent some time this week refactoring a piece of code to extract part of
a string from a larger block, and in the process I discovered that the
strings.Trim() function was actually fairly slow, and was allocating memory
when I called it. This was fairly surprising to me, since I had always
figured the implementation of Trim would be pretty efficient.

I took a look at the implementation to figure this out - here it is below
as a reference:


func makeCutsetFunc(cutset string) func(rune) bool {
	return func(r rune) bool { return IndexRune(cutset, r) >= 0 }
}

// Trim returns a slice of the string s with all leading and
// trailing Unicode code points contained in cutset removed.
func Trim(s string, cutset string) string {
	if s == "" || cutset == "" {
		return s
	}
	return TrimFunc(s, makeCutsetFunc(cutset))
}



Now it makes perfect sense to me: the plain strings.Trim() functions work
by calling their corresponding strings.TrimFunc() implementation, wrapping
the provided cutset in a closure that is passed as the predicate f. That
closure is of course where the allocation comes from, but it seems pretty
inefficient to me, and indeed a simple benchmark shows that calling
strings.TrimFunc() directly in place of strings.Trim() gives a fairly
substantial performance boost (if you know the cutset ahead of time and can
hardcode the predicate).
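
For anyone who wants to reproduce the comparison, here is a minimal sketch of
such a benchmark. It is not the benchmark from this post: the sample input, the
cutset " -" and the isSpaceOrDash predicate are made up for illustration, and
the numbers will vary with the Go version and the machine.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// input and the cutset " -" are made-up sample data, not from the original post.
const input = "  -- some text to trim --  "

// sink keeps the compiler from discarding the results inside the loops.
var sink string

// isSpaceOrDash is a hypothetical hardcoded predicate matching the cutset " -".
func isSpaceOrDash(r rune) bool { return r == ' ' || r == '-' }

// benchTrim measures the cutset-based call.
func benchTrim(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = strings.Trim(input, " -")
	}
}

// benchTrimFunc measures the predicate-based call with no closure involved.
func benchTrimFunc(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = strings.TrimFunc(input, isSpaceOrDash)
	}
}

func main() {
	// testing.Benchmark lets a plain program run a benchmark function.
	for _, bm := range []struct {
		name string
		fn   func(*testing.B)
	}{{"Trim", benchTrim}, {"TrimFunc", benchTrimFunc}} {
		r := testing.Benchmark(bm.fn)
		fmt.Printf("%-9s %s %s\n", bm.name, r.String(), r.MemString())
	}
}
```

The allocs/op column is the interesting one here: it shows directly whether a
given Go release allocates for the cutset closure.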

I played around with the implementation of strings.Trim() and was able to
mock up a version that avoids allocation fairly easily - it basically
substitutes the current implementation of Trim with that of
strings.TrimFunc(), using a hardcoded f(c). This duplicates a bit of code
of course, but it also provides a decent speedup (about 20%, based on some
simple benchmarks I did).
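
To make the idea concrete, here is a rough sketch of what a closure-free trim
could look like. It is not the actual mock-up mentioned above: trimCutset is a
hypothetical name, and the loops are a simplified rewrite of the scan that
TrimFunc performs, with the cutset test written inline instead of being passed
in as a function value.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// trimCutset is a hypothetical, closure-free variant of strings.Trim.
// The cutset membership test is hardcoded in the loops, so no function
// value has to be created or passed around.
func trimCutset(s, cutset string) string {
	if s == "" || cutset == "" {
		return s
	}
	// Drop leading runes that appear in cutset.
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		if strings.IndexRune(cutset, r) < 0 {
			break
		}
		s = s[size:]
	}
	// Drop trailing runes that appear in cutset.
	for len(s) > 0 {
		r, size := utf8.DecodeLastRuneInString(s)
		if strings.IndexRune(cutset, r) < 0 {
			break
		}
		s = s[:len(s)-size]
	}
	return s
}

func main() {
	fmt.Printf("%q\n", trimCutset("--hello, world--", "-")) // prints "hello, world"
}
```

The result is still a slice of the original string, so the function itself
never needs to copy s.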

I'm curious whether anyone would find this useful - do people have any use
for a faster Trim() implementation, or is everyone who needs performance
already just using TrimFunc()? If people think that the performance boost
might be worth the code duplication involved, I'm happy to polish my
implementation and submit a CL. Let me know what you guys think!
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
David DENG
2015-03-21 21:47:25 UTC
I guess (untested, just from common sense) that the following code could
avoid the heap allocation:

func Trim(s string, cutset string) string {
	if s == "" || cutset == "" {
		return s
	}
	return TrimFunc(s, func(r rune) bool { return IndexRune(cutset, r) >= 0 })
}

It only inlines makeCutsetFunc into Trim. I think the compiler's escape
analysis can see that the second parameter of TrimFunc doesn't escape, and
can therefore allocate the closure on the stack.
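
One way to check this guess is to count allocations for both shapes with
testing.AllocsPerRun. The sketch below is only an illustration: trimInline and
trimViaHelper are hypothetical names that copy the two variants outside the
strings package, and the counts depend on the compiler version, so no expected
output is shown.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// makeCutsetFunc copies the helper quoted earlier in the thread.
func makeCutsetFunc(cutset string) func(rune) bool {
	return func(r rune) bool { return strings.IndexRune(cutset, r) >= 0 }
}

// trimViaHelper (hypothetical name) builds the closure in a separate function.
func trimViaHelper(s, cutset string) string {
	if s == "" || cutset == "" {
		return s
	}
	return strings.TrimFunc(s, makeCutsetFunc(cutset))
}

// trimInline (hypothetical name) writes the closure literal at the call site.
func trimInline(s, cutset string) string {
	if s == "" || cutset == "" {
		return s
	}
	return strings.TrimFunc(s, func(r rune) bool { return strings.IndexRune(cutset, r) >= 0 })
}

// sink keeps the calls from being optimized away.
var sink string

func main() {
	helper := testing.AllocsPerRun(1000, func() { sink = trimViaHelper("xxhixx", "x") })
	inline := testing.AllocsPerRun(1000, func() { sink = trimInline("xxhixx", "x") })
	fmt.Printf("via helper: %.0f allocs/run\n", helper)
	fmt.Printf("inline:     %.0f allocs/run\n", inline)
}
```

Building with go build -gcflags=-m additionally prints the compiler's escape
analysis decisions, which shows whether each closure was placed on the heap.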
John Potocny
2015-03-21 21:56:17 UTC
Tested it, totally works! My benchmarks show that the version I wrote is a
little faster, although that is probably because my version makes fewer
nested function calls :)
Brad Fitzpatrick
2015-03-22 00:51:02 UTC
Send in a change + additional benchmark!