1

Anyone know how I would find & replace text in a string? Basically I have two strings:

string firstS = "/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDABQODxIPDRQSERIXFhQYHzMhHxwcHz8tLyUzSkFOTUlBSEZSXHZkUldvWEZIZoxob3p9hIWET2ORm4+AmnaBhH//2wBDARYXFx8bHzwhITx/VEhUf39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f3//";

string secondS = "abcdefg2wBDABQODxIPDRQSERIXFh/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/abcdefg";

I want to search firstS to see if it contains any sequence of characters that's in secondS and then replace it. It also needs to be replaced with the number of replaced characters in squared brackets:

[NUMBER-OF-CHARACTERS-REPLACED]

For example, because firstS and secondS both contain "2wBDABQODxIPDRQSERIXFh" and "/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/f39/" they would need to be replaced. So then firstS becomes:

string firstS = "/9j/4AAQSkZJRgABAQEAYABgAAD/[22]QYHzMhHxwcHz8tLyUzSkFOTUlBSEZSXHZkUldvWEZIZoxob3p9hIWET2ORm4+AmnaBhH//2wBDARYXFx8bHzwhITx/VEhUf39[61]f3//";

Hope that makes sense. I think I could do this with Regex, but I don't like the inefficiency of it. Does anyone know of another, faster way?

1

4 Answers 4

3

Does anyone know of another, faster way?

Yes, this problem actually has a proper name. It is called the Longest Common Substring, and it has a reasonably fast solution.

Here is an implementation on ideone. It finds and replaces all common substrings of ten characters or longer.

// This comes straight from Wikipedia article linked above:
private static string FindLcs(string s, string t) {
    var L = new int[s.Length, t.Length];
    var z = 0;
    var ret = new StringBuilder();
    for (var i = 0 ; i != s.Length ; i++) {
        for (var j = 0 ; j != t.Length ; j++) {
            if (s[i] == t[j]) {
                if (i == 0 || j == 0) {
                    L[i,j] = 1;
                } else {
                    L[i,j] = L[i-1,j-1] + 1;
                }
                if (L[i,j] > z) {
                    z = L[i,j];
                    ret = new StringBuilder();
                }
                if (L[i,j] == z) {
                    ret.Append(s.Substring( i-z+1, z));
                }
            } else {
                L[i,j]=0;
            }
        }
    }
    return ret.ToString();
}
// With the LCS in hand, building the answer is easy
public static string CutLcs(string s, string t) {
    for (;;) {
        var lcs = FindLcs(s, t);
        if (lcs.Length < 10) break;
        s = s.Replace(lcs, string.Format("[{0}]", lcs.Length));
    }
    return s;
}
Sign up to request clarification or add additional context in comments.

Comments

1

You need to be very careful between "Longest common substring and "longest common subsequence"

For Substring: http://en.wikipedia.org/wiki/Longest_common_substring_problem

For SubSequence: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

I would suggest you to also see few videos on youtube on these two topics http://www.youtube.com/results?search_query=longest+common+substring&oq=longest+common+substring&gs_l=youtube.3..0.3834.10362.0.10546.28.17.2.9.9.2.225.1425.11j3j3.17.0...0.0...1ac.lSrzx8rr1kQ

http://www.youtube.com/results?search_query=longest+common+subsequence&oq=longest+common+s&gs_l=youtube.3.0.0l6.2968.7905.0.9132.20.14.2.4.4.0.224.2038.5j2j7.14.0...0.0...1ac.4CYZ1x50zpc

you can find c# implementation of longest common subsequence here:

http://www.alexandre-gomes.com/?p=177

http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_subsequence

Comments

0

I have a similar issue, but for word occurrences! so, I hope this can help. I used SortedDictionary and a binary search tree

/* Application counts the number of occurrences of each word in a string
   and stores them in a generic sorted dictionary. */
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;

public class SortedDictionaryTest
{
   public static void Main( string[] args )
   {
      // create sorted dictionary
      SortedDictionary< string, int > dictionary = CollectWords();

      // display sorted dictionary content
      DisplayDictionary( dictionary );
   } 

   // create sorted dictionary 
   private static SortedDictionary< string, int > CollectWords()
   {
      // create a new sorted dictionary
      SortedDictionary< string, int > dictionary =
         new SortedDictionary< string, int >();

      Console.WriteLine( "Enter a string: " ); // prompt for user input
      string input = Console.ReadLine(); 

      // split input text into tokens
      string[] words = Regex.Split( input, @"\s+" );

      // processing input words
      foreach ( var word in words )
      {
         string wordKey = word.ToLower(); // get word in lowercase

         // if the dictionary contains the word
         if ( dictionary.ContainsKey( wordKey ) )
         {
            ++dictionary[ wordKey ];
         } 
         else
            // add new word with a count of 1 to the dictionary
            dictionary.Add( wordKey, 1 );
      } 

      return dictionary;
   } 

   // display dictionary content
   private static void DisplayDictionary< K, V >(
      SortedDictionary< K, V > dictionary )
   {
      Console.WriteLine( "\nSorted dictionary contains:\n{0,-12}{1,-12}",
         "Key:", "Value:" );

      /* generate output for each key in the sorted dictionary
        by iterating through the Keys property with a foreach statement*/
      foreach ( K key in dictionary.Keys )
         Console.WriteLine( "{0,- 12}{1,-12}", key, dictionary[ key ] );

      Console.WriteLine( "\nsize: {0}", dictionary.Count );
   } 
} 

Comments

0

This is probably dog slow, but if you're willing to incur some technical debt and need something now for prototyping, you could use LINQ.

string firstS = "123abc";
string secondS = "456cdeabc123";
int minLength = 3;

var result = 
    from subStrCount in Enumerable.Range(0, firstS.Length)
    where firstS.Length - subStrCount >= 3
    let subStr = firstS.Substring(subStrCount, 3)
    where secondS.Contains(subStr)
    select secondS.Replace(subStr, "[" + subStr.Length + "]");

Results in

 456cdeabc[3] 
 456cde[3]123 

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.