
I'm working with slugs in Elixir. The idea is this: I have a string of [a-zA-Z0-9] words separated by hyphens, like:

string = "another-long-string-to-be-truncated-and-much-text-here"

I want to ensure that the string is at most 30 characters long, but I also want to make sure that no word is cut in half when the limit is reached. The first 30 characters of the string are another-long-string-to-be-trun, but I want another-long-string-to-be, with the truncated word removed completely. How can I do that?

5 Answers

10

First of all, if you don't care about performance at all, you can delegate all the work to a regex:

~r/\A(.{0,30})(?:-|\Z)/

I assume it is the shortest solution, but it is not efficient:

iex(28)> string
"another-long-string-to-be-truncated-and-much-text-here"
iex(29)> string2
"another-long-string-to-be-cool-about-that"

iex(30)> Regex.run(~r/\A(.{0,30})(?:-|\Z)/, string) |> List.last() 
"another-long-string-to-be"

iex(31)> Regex.run(~r/\A(.{0,30})(?:-|\Z)/, string2) |> List.last()
"another-long-string-to-be-cool"

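One thing the one-liner above does not cover: Regex.run returns nil when nothing matches (for example, a single word longer than the limit with no hyphen), and List.last(nil) then raises. A minimal sketch of a wrapper that handles this, assuming an empty string is an acceptable fallback; the module and function names are hypothetical:

```elixir
defmodule RegexSlug do
  # Hypothetical wrapper; the names and the "" fallback are assumptions,
  # not part of the original answer.
  def truncate(string, max \\ 30) do
    # Same pattern as above, with the limit interpolated.
    regex = Regex.compile!("\\A(.{0,#{max}})(?:-|\\Z)")

    case Regex.run(regex, string) do
      [_whole, slug] -> slug
      # No hyphen or end of string within the first max + 1 characters.
      nil -> ""
    end
  end
end

RegexSlug.truncate("another-long-string-to-be-truncated-and-much-text-here")
# "another-long-string-to-be"
RegexSlug.truncate("another-long-string-to-be-cool-about-that")
# "another-long-string-to-be-cool"
```
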
Efficient solution

But if you do care about performance and memory, then I suggest this:

defmodule CoolSlugHelper do
  def slug(input, length \\ 30) do
    length_minus_1 = length - 1

    case input do
      # if the substring ends with "-"
      # i. e. "abc-def-ghi", 8 or "abc-def-", 8 -> "abc-def"
      <<result::binary-size(length_minus_1), "-", _::binary>> -> result

      # if the next char after the substring is "-"
      # i. e. "abc-def-ghi", 7 or "abc-def-", 7 -> "abc-def"
      <<result::binary-size(length), "-", _::binary>> -> result

      # if it is the exact string. i. e. "abc-def", 7 -> "abc-def"
      <<_::binary-size(length)>> -> input

      # return an empty string if we reached the beginning of the string
      _ when length <= 1 -> ""

      # otherwise look into shorter substring
      _ -> slug(input, length_minus_1)
    end
  end
end

It does not collect the resulting string char-by-char. Instead, it looks for the correct substring starting from the desired length down to 1. That's how it becomes efficient in terms of memory and speed.

We need the length_minus_1 variable because an expression such as length - 1 cannot be used inside binary-size in a pattern match.
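To illustrate the building block in isolation, here is a small sketch of a binary-size match with a size taken from a variable. The module and function names are hypothetical. Note that binary-size counts bytes, not graphemes, which is fine for slugs made of [a-zA-Z0-9] and hyphens:

```elixir
defmodule BinarySizeDemo do
  # Hypothetical helper: split a binary after `len` bytes without copying
  # it character by character.
  def split_at(input, len) do
    case input do
      # `len` is already bound, so it can be used as the size here.
      <<prefix::binary-size(len), rest::binary>> -> {prefix, rest}
      # The input is shorter than `len` bytes.
      _ -> :too_short
    end
  end
end

BinarySizeDemo.split_at("abc-def-ghi", 7)
# {"abc-def", "-ghi"}
BinarySizeDemo.split_at("abc", 7)
# :too_short
```
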

Here is the benchmark of all the proposed solutions as of Dec 22nd, 2018:

(Simple Regex is the ~r/\A(.{0,30})(?:-|\Z)/ regex above)

Name                     ips        average  deviation         median         99th %
CoolSlugHelper      352.14 K        2.84 μs  ±1184.93%           2 μs           8 μs
SlugHelper           70.98 K       14.09 μs   ±170.20%          10 μs          87 μs
Simple Regex         33.14 K       30.17 μs   ±942.90%          21 μs         126 μs
Truncation           11.56 K       86.51 μs    ±84.81%          62 μs         299 μs

Comparison: 
CoolSlugHelper      352.14 K
SlugHelper           70.98 K - 4.96x slower
Simple Regex         33.14 K - 10.63x slower
Truncation           11.56 K - 30.46x slower

Memory usage statistics:

Name              Memory usage
CoolSlugHelper         2.30 KB
SlugHelper            12.94 KB - 5.61x memory usage
Simple Regex          20.16 KB - 8.75x memory usage
Truncation            35.36 KB - 15.34x memory usage

1 Comment

Indeed, I have re-linked the “better answer” reference from the most upvoted answer to this one. Also, I am surprised and frustrated that binary-size is so much faster than char-by-char recursion.
8

UPD 12/2018: Yuri Golobokov posted a better solution here; I’d suggest using it instead of the one below.


The simplest approach would be:

"another-long-string-to-be-truncated-and-much-text-here"
|> String.slice(0..29) 
|> String.replace(~r{-[^-]*$}, "")
#⇒ "another-long-string-to-be"

There is one glitch with it: if the hyphen is exactly at position 31, the last term will be removed anyway. To avoid this, one might explicitly check for that condition:

str = "another-long-string-to-be-truncated-and-much-text-here"
case str |> String.at(30) do                                      
  "-" -> str |> String.slice(0..29)                                  
  _   -> str |> String.slice(0..29) |> String.replace(~r{-[^-]*$}, "")
end                                                               
#⇒ "another-long-string-to-be"

or:

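orig = "another-long-string-to-be-cool-cated-and-much-text-here"
str = orig |> String.slice(0..29)
# rebinding str inside `unless ..., do:` would not leak out of the block,
# so bind the result of the whole `if` expression instead
str = if String.at(orig, 30) == "-", do: str, else: str |> String.replace(~r{-[^-]*$}, "")
str
#⇒ "another-long-string-to-be-cool"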
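The snippets above can be wrapped into a reusable function. The sketch below is one possible arrangement, not the answer's own code; the module name is hypothetical. It also adds a clause the snippets do not have: String.at returns nil when the input is no longer than the limit, and in that case the string is returned unchanged instead of losing its last word.

```elixir
defmodule SliceSlug do
  # Hypothetical wrapper around the slice + regex approach.
  def truncate(str, max \\ 30) do
    head = String.slice(str, 0, max)

    case String.at(str, max) do
      # The cut falls exactly on a word boundary: keep everything.
      "-" -> head
      # The string is not longer than max (assumed behaviour: return as is).
      nil -> head
      # The cut falls inside a word: drop the partial last word.
      _ -> String.replace(head, ~r{-[^-]*$}, "")
    end
  end
end

SliceSlug.truncate("another-long-string-to-be-truncated-and-much-text-here")
# "another-long-string-to-be"
SliceSlug.truncate("abc-def")
# "abc-def"
```

As in the original snippets, a single word longer than the limit comes back cut to the limit rather than as an empty string.
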

3 Comments

There is a subtle bug here: if the original string contains a single word that is longer than 30 characters, the original word is returned. This results in a string longer than 30 chars, I suppose it should return an empty string instead.
@PatrickOscity it currently returns this word truncated to 30 codepoints. Returning an empty string as a slug sounds like an ...uhm poor design to me.
@mudasobwa Your code returns "another-long-string-to-be-tru-" for this input "another-long-string-to-be-tru-n". I guess the regular expression has to be the following: [-[^-]*]$.
2

You could do it recursively..

defmodule Truncation do
  def truncate_words_to(str, max) do
    length = String.length(str)
    words? = Regex.match?(~r{-}, str)
    cond do
      length <= max -> str
      words?        -> truncate_words_to(String.replace(str, ~r{-[^-]*$}, ""),
                                         max)
      true          -> String.slice(str, 0..(max-1))
    end
  end
end
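To see what each recursive step does, the same regex replacement can be applied repeatedly outside the module. This is only an illustration of the intermediate values, using Stream.iterate in place of the recursion:

```elixir
# Each step strips the last hyphen and everything after it.
steps =
  "another-long-string-to-be-truncated-and-much-text-here"
  |> Stream.iterate(&String.replace(&1, ~r{-[^-]*$}, ""))
  |> Enum.take(6)

Enum.map(steps, &String.length/1)
# [54, 49, 44, 39, 35, 25]

# The recursion stops at the first value that fits into 30 characters.
List.last(steps)
# "another-long-string-to-be"
```
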

5 Comments

cond makes no sense since true condition is perfectly applicable when length <= max. One might use if there.
It's true that String.slice would work if the string is already shorter than max, but we still need the separate length <= max check to avoid chopping a word off unnecessarily. So we still have three different cases: string is already short enough, string is too long and contains a hyphen, or string is too long and doesn't contain a hyphen.
Ah, yes, got it, sorry. The question now is what if the maxth symbol is a hyphen?
Good point. Changed the quantifier on the regex so that a final hyphen gets chopped off. (Or a final series of hyphens, albeit at the cost of a separate recursion per hyphen.)
Upvoted, because it’s the most natural way to solve this kind of task, even though it looks a bit overdesigned when we have elixir String helpers.
1

My answer is based on @mudasobwa's answer, but I decided to simplify it a lot:

"another-long-string-to-be-truncated-and-much-text-here"
|> String.slice(0..29)
|> String.split("-")
|> Enum.slice(0..-2)
|> Enum.join("-")

That's it!
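If the split/join style is preferred, a variant that keeps whole words while they still fit avoids the corner case where the character right after the limit is a hyphen. This is a different technique from the pipeline above (it accumulates words instead of slicing first), and the module name is hypothetical:

```elixir
defmodule WordSlug do
  # Hypothetical helper: keep leading words while the joined result
  # still fits into `max` characters.
  def truncate(str, max \\ 30) do
    str
    |> String.split("-")
    |> Enum.reduce_while({[], 0}, fn word, {words, len} ->
      # One extra character for the joining hyphen, except before the first word.
      sep = if words == [], do: 0, else: 1
      new_len = len + sep + String.length(word)

      if new_len <= max do
        {:cont, {[word | words], new_len}}
      else
        {:halt, {words, len}}
      end
    end)
    |> elem(0)
    |> Enum.reverse()
    |> Enum.join("-")
  end
end

WordSlug.truncate("another-long-string-to-be-cool-cated-and-much-text-here")
# "another-long-string-to-be-cool"
```

Unlike the slice-based versions, this sketch returns an empty string when the first word alone exceeds the limit.
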

3 Comments

This is nonsense. It a) returns a list, b) fails on the case when 31st codepoint in the string is a hyphen and c) implies redundant coercion to/from a list.
@mudasobwa a) you're correct, I forgot about join; b) I have not experienced a failure; c) but it's easier to read instead.
b) You have, but you did not recognize it. Try it on "another-long-string-to-be-cool-cated-and-much-text-here". Your code produces "another-long-string-to-be", while the correct result is "another-long-string-to-be-cool". I have described this corner case in the last para of my answer.
1

Since this question still gets hits from search engines, I’d post the proper, fast, elixirish solution to accomplish this task.

defmodule SlugHelper do
  def slug(input, length \\ 30, acc \\ {"", ""})
  def slug("", _, {_, result}), do: result
  def slug(_, 0, {_, result}), do: result
  def slug(<<"-", _::binary>>, 1, {_, result}), do: result
  def slug(<<"-", rest::binary>>, length, {acc, ""}), do:
    slug(rest, length - 1, {"", acc})
  def slug(<<"-", rest::binary>>, length, {acc, result}), do:
    slug(rest, length - 1, {"", result <> "-" <> acc})
  def slug(<<chr::binary-size(1), rest::binary>>, length, {acc, result}),
    do: slug(rest, length - 1, {acc <> chr, result})
end

string = "another-long-string-to-be-truncated-and-much-text-here"
Enum.each(20..30, & string |> SlugHelper.slug(&1) |> IO.puts())

#⇒ another-long
#  another-long-string
#  another-long-string
#  another-long-string
#  another-long-string-to
#  another-long-string-to
#  another-long-string-to
#  another-long-string-to-be
#  another-long-string-to-be
#  another-long-string-to-be
#  another-long-string-to-be

1 Comment

I think there is a problem. For example, SlugHelper.slug("abc-def", 7) => "abc" and SlugHelper.slug("abc-def", 4) => "". Please have a look at my solution as well.
