
My string may look like this:

@ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?

In fact, it is a dirty CSV string containing the names of JPG images.

I need to remove any non-alphanumeric characters from both ends of the string,
then, inside the resulting string, remove the same characters except commas and dots,
then collapse duplicate commas and dots, if any, into single ones.

So the final result should be:
lorem.jpg,ipsum.jpg,dolor.jpg

I first tried to remove all whitespace, anywhere:

$str = str_replace(" ", "", $str);

Then I used various forms of the trim functions, but that is tedious and takes a lot of code.

An additional problem is that duplicate commas and dots may appear one or more times, for example .. or ,,,,

Is there a way to solve this with a regex, please?

  • Is this helpful: stackoverflow.com/questions/659025/… Commented Jan 17, 2023 at 10:16
  • Once you have removed the spaces, the regular expression (\w+\.\w+) should be enough to extract all the file names using preg_match_all. You can then use implode to join those results with a comma between them. Commented Jan 17, 2023 at 10:16
  • @CBroe, interesting, thanks, I will try. But I suppose duplicate commas and dots are still a problem. Commented Jan 17, 2023 at 10:20
  • Can you try this: $result = preg_replace("/[^A-Za-z0-9,.]/", '', $str); Commented Jan 17, 2023 at 10:22
  • @SelVazi, it works except for the last comma, but I can remove that with rtrim. It does not remove duplicate commas and dots, though. Commented Jan 17, 2023 at 10:28
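The preg_match_all + implode idea from the comments can be sketched as below. This is a minimal sketch, assuming each file name consists only of word characters (\w, which also allows underscores) around a single dot. Runs of dots are collapsed first, because \w+\.\w+ cannot match "dolor..jpg"; repeated commas need no special handling, since implode supplies the separators.

```php
<?php
// Sketch of the extract-and-join approach suggested in the comments.
// Assumption: every file name looks like word-chars, one dot, word-chars.
$str = " @ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?";

// Remove all whitespace so "ip sum.jpg" becomes "ipsum.jpg".
$compact = preg_replace('/\s+/', '', $str);

// Collapse runs of dots; otherwise "dolor..jpg" would not match \w+\.\w+.
$compact = preg_replace('/\.{2,}/', '.', $compact);

// Extract every name.ext token and join the matches with single commas.
preg_match_all('/\w+\.\w+/', $compact, $matches);
$result = implode(',', $matches[0]);

echo $result, "\n";
```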

3 Answers


Here are the steps, modeled on your wording:

Step 1

  • "remove any non-alphanum chars from both sides of the string"

  • translated: remove leading and trailing runs of consecutive [^a-zA-Z0-9] characters

  • regex: replace ^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$ with $1
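To see why Step 1 uses a lazy group, here is a small comparison sketch (variable names are illustrative only). With a greedy (.*) the group swallows the junk at the end itself, and the trailing [^a-zA-Z0-9]* is left to match an empty string, so nothing is trimmed on the right.

```php
<?php
// Step 1 only: compare a lazy and a greedy middle group.
$subject = " @ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?";

// Lazy (.*?) stops as early as possible, letting [^a-zA-Z0-9]*$ consume ",-/ ?".
$lazy = preg_replace('/^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$/', '$1', $subject);

// Greedy (.*) runs to the end of the string, so the tail stays in the group.
$greedy = preg_replace('/^[^a-zA-Z0-9]*(.*)[^a-zA-Z0-9]*$/', '$1', $subject);

echo "[$lazy]\n";
echo "[$greedy]\n";
```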

Step 2

  • "inside the resulting string - remove the same - except commas and dots"
  • translated: remove any [^a-zA-Z0-9.,]
  • regex: replace [^a-zA-Z0-9.,] with empty string
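A sketch of Step 2 run on the Step 1 output, next to the same replacement applied directly to the raw string. Without Step 1, the comma from the trailing ",-/ ?" survives, which is consistent with the leftover comma reported in the comments on the question.

```php
<?php
// Step 2 on its own: drop everything except letters, digits, commas and dots.
$afterStep1 = "lorem.jpg,,, ip sum.jpg,dolor ..jpg";
$raw        = " @ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?";

$step2 = preg_replace('/[^a-zA-Z0-9.,]/', '', $afterStep1);

// Skipping Step 1: the comma before "-/ ?" is allowed, so it remains at the end.
$step2Only = preg_replace('/[^a-zA-Z0-9.,]/', '', $raw);

echo $step2, "\n";
echo $step2Only, "\n";
```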

Step 3

  • "remove duplicates commas and dots - if any - replace them with single ones"
  • translated: replace each run of consecutive dots or commas with a single instance
  • regex: replace (\.{2,}) with .
  • regex: replace (,{2,}) with ,
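Step 3 can be written as two passes, as listed above, or, as an alternative that is not part of this answer, as a single pattern with a backreference: ([.,])\1+ matches a dot or comma followed by more copies of the same character and replaces the run with one copy. Like the two-pass version, it leaves a mixed run such as ",." untouched. A sketch comparing both:

```php
<?php
// Input assumed to be the output of Step 2.
$afterStep2 = "lorem.jpg,,,ipsum.jpg,dolor..jpg";

// Two passes: collapse runs of dots, then runs of commas.
$twoPass = preg_replace('/,{2,}/', ',', preg_replace('/\.{2,}/', '.', $afterStep2));

// One pass: \1 refers back to whichever of "." or "," was captured.
$onePass = preg_replace('/([.,])\1+/', '$1', $afterStep2);

echo $twoPass, "\n";
echo $onePass, "\n";
```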

PHP Demo:

https://onlinephp.io/c/512e1

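<?php

$subject = " @ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?";

// Step 1: strip non-alphanumeric runs from both ends (the lazy group leaves the tail outside)
$firstStep = preg_replace('/^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$/', '$1', $subject);
// Step 2: keep only letters, digits, commas and dots
$secondStep = preg_replace('/[^a-zA-Z0-9.,]/', '', $firstStep);
// Step 3: collapse repeated dots, then repeated commas
$thirdStepA = preg_replace('/\.{2,}/', '.', $secondStep);
$thirdStepB = preg_replace('/,{2,}/', ',', $thirdStepA);

echo $thirdStepB; //lorem.jpg,ipsum.jpg,dolor.jpg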
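Because preg_replace() accepts arrays of patterns and replacements and applies them in order, the four calls can also be folded into one. A sketch, where the function name cleanCsvNames is hypothetical and the second input is made up for illustration:

```php
<?php
// Hypothetical helper wrapping the same three steps in a single preg_replace call.
function cleanCsvNames(string $subject): string
{
    return preg_replace(
        [
            '/^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$/', // step 1: trim non-alphanumerics at both ends
            '/[^a-zA-Z0-9.,]/',                    // step 2: keep letters, digits, commas, dots
            '/\.{2,}/',                            // step 3a: collapse repeated dots
            '/,{2,}/',                             // step 3b: collapse repeated commas
        ],
        ['$1', '', '.', ','],
        $subject
    ) ?? ''; // preg_replace returns null on a regex error
}

echo cleanCsvNames(" @ *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?"), "\n";
echo cleanCsvNames(",,a.jpg ,, b .png..."), "\n";
```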

2 Comments

Thanks a lot, especially for your explanations.
I like to take care of those details; it also helps me improve. Glad it helped, and thanks for pointing out the "decoration" aspect.

Look at

https://www.php.net/manual/en/function.preg-replace.php

It replaces anything inside a string that matches a pattern. \s represents any whitespace character, but take care with NBSP (non-breaking space); \h matches it.

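Example #4 on that page strips whitespace. Note that \s\s+ only matches runs of two or more whitespace characters, so a single space such as the one in "ip sum" would survive; to remove every space, use \s+:

$str = preg_replace('/\s+/', '', $str);

It would be something like that.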
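A sketch of the NBSP caveat, assuming the input is UTF-8: with the u modifier, \h treats U+00A0 as a single character, so putting it in one class with \s removes ordinary whitespace and non-breaking spaces alike. The sample string is made up for illustration.

```php
<?php
// "ip" + NO-BREAK SPACE (U+00A0) + "sum.jpg," + two ordinary spaces + "dolor.jpg"
$str = "ip\u{00A0}sum.jpg,  dolor.jpg";

// u: treat pattern and subject as UTF-8; [\s\h]+ covers ordinary whitespace and NBSP
$noSpaces = preg_replace('/[\s\h]+/u', '', $str);

echo $noSpaces, "\n";
```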



Can you try this:

$string = ' @ *lorem.jpg,,,,  ip sum.jpg,dolor .jpg,-/ ?';
// this will leave only alphanumerics, commas and dots
$result = preg_replace("/[^A-Za-z0-9,.]/", '', $string);

// this will collapse duplicated commas and dots
$result = preg_replace('/,+/', ',', $result);
$result = preg_replace('/\.+/', '.', $result);

// this will remove commas, semicolons, dots and spaces from the end
$result = preg_replace("/[ ,;.]*$/", '', $result);
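The last call above only looks at the end of the string, so a stray separator at the start would survive. A sketch with a made-up input that starts with commas, comparing that end-anchored regex with PHP's trim() given a character list, which strips commas and dots from both ends:

```php
<?php
// Hypothetical input with leading commas, cleaned with the same first three calls
$string = ',,lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?';
$result = preg_replace("/[^A-Za-z0-9,.]/", '', $string);
$result = preg_replace('/,+/', ',', $result);
$result = preg_replace('/\.+/', '.', $result);

// End-anchored regex: the leading comma stays
$endOnly = preg_replace("/[ ,;.]*$/", '', $result);

// trim() with a character list removes commas and dots at both ends
$bothEnds = trim($result, ',.');

echo $endOnly, "\n";
echo $bothEnds, "\n";
```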

2 Comments

Tried it; it works, except for the duplicate commas and dots.
I made a small update to remove duplicated commas and dots. Can you try it?
