1

I have an issue where extracting data from database it sometimes (quite often) adds spaces in between strings of texts that should not be there.

What I'm trying to do is create a small script that will look at these strings and remove the spaces.

The problem is that the spaces can be in any position in the string, and the string is a variable that changes.

Example:

"StaffID": "0000 25" <- The space in the number should not be there. Is there a way to have the script look at this particular line, and if it finds spaces, to remove them.
Or:"DateOfBirth": "23-10-199 0" <-It would also need to look at these spaces and remove them.

The problem is that the same data also has lines such as:
"Address": " 91 Broad street" <- The spaces should be here obviously.

I've tried using TRIM, but that only removes spaces from start/end.

Worth mentioning that the data extracted is in json format and is then imported using API into the new system.

4
  • for the props that you KNOW you want to remove all spaces, you can use -replace '\s{1,}' to replace 1-or-more spaces with nothing. Commented Jun 7, 2019 at 15:46
  • Possible duplicate of Removing more than one white space in powershell Commented Jun 7, 2019 at 17:29
  • 2
    This looks like JSON. A good question, I think, to ask is - where is this 'bad' data coming from? Is it in the original database like this 23-10-199 0? If so, you've got data issues that are VERY complicated to overcome. This may be a "fix your data" answer first. Otherwise, is the extract process causing the problem? Commented Jun 7, 2019 at 17:29
  • It is a JSON. I've gone over the database and its recorder properly in the database i.e. no spaces. Obviously something to do with how its extracted and then formatted, but I cant seem to find the issue, so thought it might be easier to do a script to remove the spaces. Looks like I was wrong.. Commented Jun 8, 2019 at 9:54

2 Answers 2

2

You should think about the logic of what you want to do, and whether or not it's programmatically possible to determine if you can teach your script where it is or is not appropriate to put spaces. As it is, this is one of the biggest problems facing AI research right now, so unfortunately you're probably going to have to do this by hand.

If it were me, I'd specify the kind of data format that I expect from each column, and try my best to attempt to parse those strings. For example, if you know that StaffID doesn't contain spaces, you can have a rule that just deletes them:

$staffid = $staffid.replace("\s+",'')

There are some more complicated things that you can do with forced formatting (.replace) that have already been covered in this answer, but again, that requires some expectation of exactly what data is going to come out of what column.

You might want to look more closely at where those spaces are coming from, rather than process the output like this. Is the retrieval script doing it? Maybe you can optimize the database that you're drawing from?

Sign up to request clarification or add additional context in comments.

Comments

1

Nowadays, you use:

# "some spaces here" -replace (' ')
somespaceshere

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.