I scrape sites for a database with a Chrome extension and need assistance with a JavaScript clean-up function.

For example, given this URL:

https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p

My target output is:

_60789694386.html

Everything after .html needs to be removed, but since that part is different in each URL, I'm lost.

The output is a .csv file, on which I run a JavaScript script to clean up the data.

   this.values[8] = this.values[8].replace("https://www.alibaba.com/product-detail/","");

this.values[8] is how I target the column in the script (column 8 holds the URL).

7 Answers

3

Well, you can use split.

var final = this.values[8].split('.html')[0] + '.html'

split gives you an array of the pieces of the string around a separator, in your case '.html'. You take the first piece and append '.html' again, because split drops the separator itself.
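
As a rough illustration of what split returns here, the following sketch runs it on the sample URL from the question. The variable names (url, parts, withoutQuery, fileName) are only for this example, and it assumes '.html' occurs exactly once in the URL.

```javascript
// Sample URL taken from the question.
var url = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";

// split returns every piece of the string around the separator.
var parts = url.split(".html");
console.log(parts.length); // 2: the part before ".html" and the query string after it

// parts[0] no longer contains ".html", so the suffix is added back.
var withoutQuery = parts[0] + ".html";

// Drop the fixed prefix, as the replace call in the question does.
var fileName = withoutQuery.replace("https://www.alibaba.com/product-detail/", "");
console.log(fileName); // _60789694386.html
```

If a URL could contain '.html' more than once, splitting on '?' is probably the safer choice.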


1

Consider using substr:

this.values[8] = this.values[8].substr(0,this.values[8].indexOf('?'))
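
One thing to watch with the indexOf approach: when a URL has no '?', indexOf returns -1 and the line above yields an empty string. A minimal sketch of a guarded version follows. stripQuery is a hypothetical helper name, not something from the original script, and slice is used in place of substr.

```javascript
// stripQuery is a hypothetical helper, introduced only for this example.
function stripQuery(url) {
  var queryStart = url.indexOf("?");
  // indexOf returns -1 when there is no "?", so return the URL unchanged.
  return queryStart === -1 ? url : url.slice(0, queryStart);
}

// Sample URL taken from the question.
var sample = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";

console.log(stripQuery(sample));
// https://www.alibaba.com/product-detail/_60789694386.html

console.log(stripQuery("https://www.alibaba.com/product-detail/_60789694386.html"));
// https://www.alibaba.com/product-detail/_60789694386.html (unchanged)
```

In the CSV script this would presumably be applied as this.values[8] = stripQuery(this.values[8]), before or after the existing replace call.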

0

You can use the split method to cut the text at the ?, as in this example.

var link = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p"
var result = link.split('?')[0].replace("https://www.alibaba.com/product-detail/","");
console.log(result);

0

Not sure I understood your problem, but try this:

var s = 'https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p'
s = s.substring(0, s.indexOf('?'));
console.log( s );

0

For when you don't care about readability...

this.values[8] = new URL(this.values[8]).pathname.split("/").pop();
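
For anyone who does care about readability, the same URL-based idea can be unpacked into steps. This is only a sketch: fileNameFromUrl is a hypothetical helper name, it keeps the .html suffix to match the target output in the question, and it assumes the input is always an absolute URL, since new URL throws a TypeError on strings it cannot parse.

```javascript
// fileNameFromUrl is a hypothetical helper, introduced only for this example.
function fileNameFromUrl(rawUrl) {
  // The URL constructor parses the string; the query string is kept
  // separately in .search, so .pathname never contains it.
  var parsed = new URL(rawUrl);

  // For the sample below, pathname is "/product-detail/_60789694386.html".
  var segments = parsed.pathname.split("/");

  // The last path segment is the file name.
  return segments.pop();
}

// Sample URL taken from the question.
var sample = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";
console.log(fileNameFromUrl(sample)); // _60789694386.html
```

Unlike the string-replace versions, this does not depend on the exact host or path prefix.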

0

An alternative, without using split:

var link = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p"
var result = link.replace('https://www.alibaba.com/product-detail/', '').replace(/\?.*$/, '');
console.log(result);

0

You can use a regex to get it done. As far as I know, you can do something like:

    var v = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";
    // Take everything after the last "/" (this assumes the query string contains no "/").
    var result = v.match(/[^\/]+$/)[0];
    // Cut off the query string (this assumes a "?" is present).
    result = result.substring(0, result.indexOf('?'));
    console.log(result);    // logs _60789694386.html
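
The two steps above could also be folded into a single pattern that tolerates a missing query string. The following is a sketch under that assumption. lastSegment is a hypothetical helper name, and the regex is one possible way to write it, not the only one.

```javascript
// lastSegment is a hypothetical helper, introduced only for this example.
function lastSegment(url) {
  // "/" followed by a run of characters that are neither "/" nor "?"
  // (captured), then an optional "?..." query, then the end of the string.
  var match = url.match(/\/([^\/?]+)(?:\?.*)?$/);
  // match is null when the URL has no such segment, so fall back to "".
  return match ? match[1] : "";
}

// Sample URL taken from the question.
var v = "https://www.alibaba.com/product-detail/_60789694386.html?spm=a2700.galleryofferlist.normalList.1.5be41470uWBNGm&s=p";
console.log(lastSegment(v)); // _60789694386.html
```

Returning an empty string on no match is just a choice for this sketch; in the CSV script it might be preferable to leave the cell untouched instead.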
