I have these strings:

'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'
'da_report/GY4LFDN6/2017_11/activily_time2017_11/index.html'

From these two strings, I want to extract these two file names:

'2017_11/view_mission_join_player_count2017_11'
'2017_11/activily_time2017_11'

I wrote some regular expressions, but they seem wrong.

str = 'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'
str[/([^\/index.html]+)/, 1] # => "a_r"
  • Does the string always start with da_report/GY4LFDN6/? Commented Dec 25, 2017 at 7:43
  • What are the rules? Commented Dec 25, 2017 at 7:44
  • Yes, it always starts with da_report/GY4LFDN6/. Commented Dec 25, 2017 at 7:50
  • @CodaChang Even so, that doesn't mean it would be good practice to hard-code these values, unless you only want to target these particular paths. Commented Dec 25, 2017 at 7:53

5 Answers

1

A regular expression is overkill here, and it is prone to errors.

input = [
  "da_report/GY4LFDN6/" \
  "2017_11/view_mission_join_player_count2017_11" \
  "/index.html",
  "da_report/GY4LFDN6/" \
  "2017_11/activily_time2017_11" \
  "/index.html"
]  

input.map { |str| str.split('/')[2..3].join('/') }
#⇒ [
#   [0] "2017_11/view_mission_join_player_count2017_11",
#   [1] "2017_11/activily_time2017_11"
# ]

Or, more elegantly:

input.map { |str| str.split('/').grep(/2017_/).join('/') }
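
Both snippets above depend on details of the sample data: the first on there being exactly two wanted components, the second on the literal text 2017_. As a minimal sketch, assuming every path has two leading components to drop and one trailing file name, the same split approach can be written without either dependency. The method name report_name is hypothetical and not part of the original answer.

```ruby
# Hypothetical helper: drop the first two path components and the
# trailing file name, then re-join whatever is left in between.
def report_name(path)
  path.split('/')[2..-2].join('/')
end

paths = [
  'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html',
  'da_report/GY4LFDN6/2017_11/activily_time2017_11/index.html'
]

names = paths.map { |path| report_name(path) }
# names[0] is "2017_11/view_mission_join_player_count2017_11"
# names[1] is "2017_11/activily_time2017_11"
```

The range 2..-2 counts from both ends of the array, so the result does not change when the year in the path does.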

5 Comments

Oh, I know this way, and it's a great answer. But I just want to try the regex way.
str[%r|(?<=\Ada_report/GY4LFDN6/)\w+/\w+|] (“I just want to use a regexp” sounds a bit silly to me; a regexp is not the way to go here.)
Thanks, I will change my way to achieve that.
@mudasobwa I disagree with your notion of using regex here. If the OP wanted to filter paths using names, then a string join approach would not be workable. Regex is not prone to error if the person using it knows regex.
@TimBiegeleisen if that was a different task, the different ways to solve it would probably be better. Also, I never said “regexp is prone to errors,” I said “here it’s prone to error.” Also, it’s noticeably slower.
0

Use /(?<=GY4LFDN6\/)(.*)(?=\/index\.html)/ (note the escaped dot, so that it only matches a literal "."):

str = 'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'
str[/(?<=GY4LFDN6\/)(.*)(?=\/index\.html)/]
# => "2017_11/view_mission_join_player_count2017_11"

live demo: http://rubular.com/r/Ued6UOXWDf
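
For context on why the attempt in the question returned "a_r": square brackets form a character class, so [^\/index.html]+ matches a run of characters that are none of / i n d e x . h t m l, rather than "anything before /index.html". The sketch below contrasts that with a lookaround version; the variable names wrong and right are only for illustration.

```ruby
str = 'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'

# Negated character class: the leading "d" is excluded, so the match
# starts at "a" and stops at the next excluded character, "e".
wrong = str[/[^\/index.html]+/]
# wrong is "a_r"

# Lookbehind and lookahead assert the literal prefix and suffix
# without including them in the match.
right = str[/(?<=GY4LFDN6\/).*(?=\/index\.html)/]
# right is "2017_11/view_mission_join_player_count2017_11"
```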

0

This answer assumes that you want to capture beginning with the third component of the path, up to and including the last component of the path before the filename. If so, then we can use the following regex pattern:

(?:[^/]*/){2}(.*)/.*

The quantity in parentheses is the capture group, i.e. what you want to extract from the entire path.

str = 'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'
puts str[/(?:[^\/]*\/){2}(.*)\/.*/, 1]
# prints 2017_11/view_mission_join_player_count2017_11

0

If the values you want always have the form string/string and are followed by /filename.extension at the end of the string, you could use a positive lookahead for the file name.

\w+\/\w+(?=\/\w+\.\w+$)
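
In Ruby, this pattern could be applied with String#[] as in the sketch below. The method name extract_parent is hypothetical. Note that \w only covers letters, digits and underscores, so this assumes the directory and file names contain no hyphens or extra dots.

```ruby
# Hypothetical helper: return the two path components that sit directly
# before a trailing "name.extension" file, or nil if nothing matches.
def extract_parent(path)
  path[/\w+\/\w+(?=\/\w+\.\w+$)/]
end

first  = extract_parent('da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html')
second = extract_parent('da_report/GY4LFDN6/2017_11/activily_time2017_11/index.html')
# first  is "2017_11/view_mission_join_player_count2017_11"
# second is "2017_11/activily_time2017_11"
```

Because the lookahead is tied to the end of the string, the match does not depend on the da_report/GY4LFDN6/ prefix.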

0

Based on your examples, you may be able to use a very simple regex.

def extract(str)
  str[/\d{4}_\d{2}.+\d{4}_\d{2}/]
end

extract 'da_report/GY4LFDN6/2017_11/view_mission_join_player_count2017_11/index.html'
  #=> "2017_11/view_mission_join_player_count2017_11"
extract 'da_report/GY4LFDN6/2017_11/activily_time2017_11/index.html'
  #=> "2017_11/activily_time2017_11"
