How to truncate a string in Elixir?

Question

I'm working with slugs in Elixir. The idea is: I have a string of [a-zA-Z0-9] words separated by hyphens, like:

string = "another-long-string-to-be-truncated-and-much-text-here"

I want to ensure that the maximum string length is 30, but I also want to be sure that words aren't cut in half when the maximum length is reached. The first 30 symbols of the string are another-long-string-to-be-trun, but I want to get another-long-string-to-be, with the truncated word removed completely. How can I do that?

Yuri Golobokov · Accepted Answer · 2018-12-22 10:01:35Z

First of all, if you don't care about performance at all, you can delegate all the work to a regex:

~r/\A(.{0,30})(?:-|\Z)/

I assume it is the shortest solution, but it is not efficient:

iex(28)> string
"another-long-string-to-be-truncated-and-much-text-here"
iex(29)> string2
"another-long-string-to-be-cool-about-that"

iex(30)> Regex.run(~r/\A(.{0,30})(?:-|\Z)/, string) |> List.last() 
"another-long-string-to-be"

iex(31)> Regex.run(~r/\A(.{0,30})(?:-|\Z)/, string2) |> List.last()
"another-long-string-to-be-cool"
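
The two calls above could be wrapped in a small helper. This is only a sketch: the module name RegexSlug and the function truncate/2 are hypothetical, and it assumes that an empty string is an acceptable result when nothing matches. Regex.run/2 returns nil in that case (for example, for a single word longer than the limit), and List.last/1 would then raise.

```elixir
defmodule RegexSlug do
  # Hypothetical helper, not part of the original answer.
  def truncate(string, max \\ 30) do
    # Same pattern as above, with the limit interpolated.
    regex = Regex.compile!("\\A(.{0,#{max}})(?:-|\\Z)")

    case Regex.run(regex, string) do
      # First element is the full match, second is the captured slug.
      [_full, slug] -> slug
      # Assumption: fall back to "" when the regex does not match.
      nil -> ""
    end
  end
end

IO.inspect(RegexSlug.truncate("another-long-string-to-be-truncated-and-much-text-here"))
IO.inspect(RegexSlug.truncate("another-long-string-to-be-cool-about-that"))
```

Compiling the regex on every call adds to the cost, so this variant is likely even slower than the literal ~r sigil used above.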

Efficient solution

But if you do care about performance and memory, then I suggest this:

defmodule CoolSlugHelper do
  def slug(input, length \\ 30) do
    length_minus_1 = length - 1

    case input do
      # if the substring ends with "-"
      # e.g. "abc-def-ghi", 8 or "abc-def-", 8 -> "abc-def"
      <<result::binary-size(length_minus_1), "-", _::binary>> -> result

      # if the next char after the substring is "-"
      # e.g. "abc-def-ghi", 7 or "abc-def-", 7 -> "abc-def"
      <<result::binary-size(length), "-", _::binary>> -> result

      # if it is the exact string, e.g. "abc-def", 7 -> "abc-def"
      <<_::binary-size(length)>> -> input

      # return an empty string if we reached the beginning of the string
      _ when length <= 1 -> ""

      # otherwise look into shorter substring
      _ -> slug(input, length_minus_1)
    end
  end
end

It does not collect the resulting string char-by-char. Instead, it looks for the correct substring starting from the desired length down to 1. That's how it becomes efficient in terms of memory and speed.

We need the length_minus_1 variable because we cannot use an expression such as length - 1 as the size inside binary-size(...) in a binary pattern.
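
The matching step can be tried in isolation. The snippet below is a minimal sketch rather than part of the answer; the values of input and length are made up for illustration. It also shows that binary-size counts bytes, not graphemes, which is fine for [a-zA-Z0-9-] slugs but would matter for multi-byte characters.

```elixir
input = "abc-def-ghi"
length = 8

# Following the answer: bind the size to a variable before using it in the pattern.
length_minus_1 = length - 1

# Succeeds only if the byte right after the first 7 bytes is a hyphen.
<<prefix::binary-size(length_minus_1), "-", _rest::binary>> = input
IO.inspect(prefix)
# prints "abc-def"

# binary-size works on bytes; a non-ASCII character may take more than one.
IO.inspect(byte_size("\u00E9"))      # prints 2
IO.inspect(String.length("\u00E9"))  # prints 1
```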

Here is the benchmark of all the proposed solutions as of Dec 22nd, 2018:

(Simple Regex is the ~r/\A(.{0,30})(?:-|\Z)/ regex above)

Name                     ips        average  deviation         median         99th %
CoolSlugHelper      352.14 K        2.84 μs  ±1184.93%           2 μs           8 μs
SlugHelper           70.98 K       14.09 μs   ±170.20%          10 μs          87 μs
Simple Regex         33.14 K       30.17 μs   ±942.90%          21 μs         126 μs
Truncation           11.56 K       86.51 μs    ±84.81%          62 μs         299 μs

Comparison: 
CoolSlugHelper      352.14 K
SlugHelper           70.98 K - 4.96x slower
Simple Regex         33.14 K - 10.63x slower
Truncation           11.56 K - 30.46x slower

Memory usage statistics:

Name              Memory usage
CoolSlugHelper         2.30 KB
SlugHelper            12.94 KB - 5.61x memory usage
Simple Regex          20.16 KB - 8.75x memory usage
Truncation            35.36 KB - 15.34x memory usage

Indeed, I have re-linked the “better answer reference” from the most upvoted answer to this one. Also, I am surprised and frustrated that binary-size is so much faster than char-by-char recursion.

Aleksei Matiushkin · Accepted Answer · 2018-12-22 17:12:15Z

8

UPD 12/2018: Yuri Golobokov posted a better solution here; I’d suggest using it instead of the code below.

The simplest approach would be:

"another-long-string-to-be-truncated-and-much-text-here"
|> String.slice(0..29) 
|> String.replace(~r{-[^-]*$}, "")
#⇒ "another-long-string-to-be"

There is one glitch with it: if the hyphen is exactly at position 31, the last term will still be removed. To avoid this, one might explicitly check for that condition:

str = "another-long-string-to-be-truncated-and-much-text-here"
case str |> String.at(30) do                                      
  "-" -> str |> String.slice(0..29)                                  
  _   -> str |> String.slice(0..29) |> String.replace(~r{-[^-]*$}, "")
end                                                               
#⇒ "another-long-string-to-be"

or:

orig = "another-long-string-to-be-cool-cated-and-much-text-here"
str = orig |> String.slice(0..29) 
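# rebinding str inside unless would not be visible afterwards, so bind the result of if instead
str = if String.at(orig, 30) == "-", do: str, else: String.replace(str, ~r{-[^-]*$}, "")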
str
#⇒ "another-long-string-to-be-cool"
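
For reuse, the checks above could be folded into one function. This is a sketch under one extra assumption: a string that already fits within the limit is returned untouched. The snippets above do not cover that case, because String.at/2 returns nil past the end of the string and the last word is then stripped. The module name SliceSlug and the function truncate/2 are hypothetical.

```elixir
defmodule SliceSlug do
  # Hypothetical helper combining the slice-and-strip steps shown above.
  def truncate(str, max \\ 30) do
    cond do
      # Assumption: strings that already fit are returned as-is.
      String.length(str) <= max ->
        str

      # The character right after the cut is a hyphen: the slice ends on a word boundary.
      String.at(str, max) == "-" ->
        String.slice(str, 0, max)

      # Otherwise drop the last, partially cut word (if there is a hyphen at all).
      true ->
        str |> String.slice(0, max) |> String.replace(~r{-[^-]*$}, "")
    end
  end
end

IO.inspect(SliceSlug.truncate("another-long-string-to-be-truncated-and-much-text-here"))
IO.inspect(SliceSlug.truncate("another-long-string-to-be-cool-cated-and-much-text-here"))
```

A single word longer than the limit is still cut to the first max characters, which matches the behaviour discussed in the comments below.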

edited Dec 22, 2018 at 17:12

answered Sep 8, 2016 at 15:37

Aleksei Matiushkin


3 Comments

Patrick Oscity Over a year ago

There is a subtle bug here: if the original string contains a single word that is longer than 30 characters, the original word is returned. This results in a string longer than 30 chars, I suppose it should return an empty string instead.

Aleksei Matiushkin Over a year ago

@PatrickOscity it currently returns this word truncated to 30 codepoints. Returning an empty string as a slug sounds like an ...uhm poor design to me.

Vitalii Elenhaupt Over a year ago

@mudasobwa Your code returns "another-long-string-to-be-tru-" for this input "another-long-string-to-be-tru-n". I guess the regular expression has to be the following: [-[^-]*]$.

Mark Reed · Accepted Answer · 2016-09-08 18:19:07Z

2

You could do it recursively:

defmodule Truncation do
  def truncate_words_to(str, max) do
    length = String.length(str)
    words? = Regex.match?(~r{-}, str)
    cond do
      length <= max -> str
      words?        -> truncate_words_to(String.replace(str, ~r{-[^-]*$}, ""),
                                         max)
      true          -> String.slice(str, 0..(max-1))
    end
  end
end

edited Sep 8, 2016 at 18:19

answered Sep 8, 2016 at 15:58

Mark Reed


5 Comments

Aleksei Matiushkin Over a year ago

cond makes no sense since true condition is perfectly applicable when length <= max. One might use if there.

Mark Reed Over a year ago

It's true that String.slice would work if the string is already shorter than max, but we still need the separate length <= max check to avoid chopping a word off unnecessarily. So we still have three different cases: string is already short enough, string is too long and contains a hyphen, or string is too long and doesn't contain a hyphen.

Aleksei Matiushkin Over a year ago

Ah, yes, got it, sorry. The question now is what if the maxth symbol is a hyphen?

Mark Reed Over a year ago

Good point. Changed the quantifier on the regex so that a final hyphen gets chopped off. (Or a final series of hyphens, albeit at the cost of a separate recursion per hyphen.)

Aleksei Matiushkin Over a year ago

Upvoted, because it’s the most natural way to solve this kind of task, even though it looks a bit overdesigned when we have elixir String helpers.

Alex Antonov · Accepted Answer · 2016-09-09 02:54:13Z

1

My answer is based on @mudasobwa's answer, but I decided to simplify it a lot:

"another-long-string-to-be-truncated-and-much-text-here"
|> String.slice(0..29)
|> String.split("-")
|> Enum.drop(-1)
|> Enum.join("-")

That's it!

edited Sep 9, 2016 at 2:54

answered Sep 8, 2016 at 16:47

Alex Antonov


3 Comments

Aleksei Matiushkin Over a year ago

This is nonsense. It a) returns a list, b) fails on the case when 31st codepoint in the string is a hyphen and c) implies redundant coercion to/from a list.

Alex Antonov Over a year ago

@mudasobwa a) you're correct, I forgot about join; b) I have not experienced a failure; c) but it's easier to read this way

Aleksei Matiushkin Over a year ago

b) you had, but you were likely unable to recognize it. Try it on "another-long-string-to-be-cool-cated-and-much-text-here". Your code produces "another-long-string-to-be", while the correct result is "another-long-string-to-be-cool". I have described this corner case in the last paragraph of my answer.

Aleksei Matiushkin · Accepted Answer · 2018-12-22 06:21:50Z

1

Since this question still gets hits from search engines, I’d post the proper, fast, elixirish solution to accomplish this task.

defmodule SlugHelper do
  def slug(input, length \\ 30, acc \\ {"", ""})
  def slug("", _, {_, result}), do: result
  def slug(_, 0, {_, result}), do: result
  def slug(<<"-", _::binary>>, 1, {acc, result}), do: result
  def slug(<<"-", rest::binary>>, length, {acc, ""}), do:
    slug(rest, length - 1, {"", acc})
  def slug(<<"-", rest::binary>>, length, {acc, result}), do:
    slug(rest, length - 1, {"", result <> "-" <> acc})
  def slug(<<chr::binary-size(1), rest::binary>>, length, {acc, result}),
    do: slug(rest, length - 1, {acc <> chr, result})
end

string = "another-long-string-to-be-truncated-and-much-text-here"
Enum.each(20..30, & string |> SlugHelper.slug(&1) |> IO.puts())

#⇒ another-long
#  another-long-string
#  another-long-string
#  another-long-string
#  another-long-string-to
#  another-long-string-to
#  another-long-string-to
#  another-long-string-to-be
#  another-long-string-to-be
#  another-long-string-to-be
#  another-long-string-to-be

answered Dec 22, 2018 at 6:21

Aleksei Matiushkin


1 Comment

Yuri Golobokov Over a year ago

I think there is a problem. E.g. SlugHelper.slug("abc-def", 7) => "abc" and SlugHelper.slug("abc-def", 4) => "". Please have a look at my solution as well.
