I have the following two examples of HTML:

<a href="http://foo.com">User</a>: <a style="color:#333" href="http://foo.com/word"></a> blue elephant  &middot;

<a href="http://foo.com">User</a>: <a style="color:#333" href="http://foo.com/word">@<b>word</b></a> blue elephant  &middot;

I am trying to parse this with C# and write the result to a CSV file. It works to an extent; however, when the HTML contains the '@' symbol, the CSV cell is either left blank or is missing the word that follows the '@'. The main part I am trying to get is @word blue elephant, but the second example brings back a blank cell, whereas the first HTML example brings back blue elephant as desired.

I am using the following technique to do this:

string[] comm = System.Text.RegularExpressions.Regex.Split(content[1], "<a");

How can I alter this to work for the second html example?

1 Answer

You want to use a proper HTML parser, such as the one in the HTML Agility Pack, in this situation (and save yourself from invoking the wrath of Cthulhu).

Some examples of how to use it
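As a rough illustration of the parser-based approach, here is a minimal sketch using the HtmlAgilityPack NuGet package. The helper name ExtractComment is hypothetical, and the sketch assumes that the first link in each fragment is always the user link and that everything after it is the text you want. It is only checked against the two samples in the question, not real-world input.

```csharp
using System;
using System.Text;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

class Program
{
    // Hypothetical helper: returns the text that follows the "User" link,
    // including any @mention wrapped in its own <a> element.
    static string ExtractComment(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: the first <a> in the fragment is the user link.
        HtmlNode userLink = doc.DocumentNode.SelectSingleNode("//a");
        if (userLink == null)
        {
            return string.Empty;
        }

        // Concatenate the text of every sibling after the user link.
        // InnerText drops the markup, so "@<b>word</b>" contributes "@word"
        // and an empty <a></a> contributes nothing.
        var sb = new StringBuilder();
        for (HtmlNode n = userLink.NextSibling; n != null; n = n.NextSibling)
        {
            sb.Append(n.InnerText);
        }

        // Decode entities such as &middot;, then strip the leading ": "
        // and the trailing middle dot (U+00B7) separator.
        string text = HtmlEntity.DeEntitize(sb.ToString());
        return text.TrimStart(':', ' ').TrimEnd(' ', '\u00B7');
    }

    static void Main()
    {
        string first = "<a href=\"http://foo.com\">User</a>: <a style=\"color:#333\" href=\"http://foo.com/word\"></a> blue elephant  &middot;";
        string second = "<a href=\"http://foo.com\">User</a>: <a style=\"color:#333\" href=\"http://foo.com/word\">@<b>word</b></a> blue elephant  &middot;";

        Console.WriteLine(ExtractComment(first));
        Console.WriteLine(ExtractComment(second));
    }
}
```

Because the text is read from the parsed node tree rather than by splitting on "<a", the second sample should keep its @word prefix instead of losing it to the split. If a value can contain commas or quotes, it still needs CSV escaping before it is written to the file.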

3 Comments

OK, thanks for the input. I presume my question would not be overly complex when using a tool like this?
No, it's pretty easy to use and understand if you're familiar with the structure of HTML documents. If you're not, you soon will be :)
I have marked your answer as useful, but I will give full credit once I get my head around the Agility Pack. Thank you.
