7

Consider you have several patterns of dates P1 - Pn.

Some of them are simple like P1 - all Mondays, P2 - all Tuesdays; others are more complex like P4 - all working days etc.

For custom array of dates (V1, V2) I have to create the shortest result string, as it is shown on the picture:

Multi Pattern Matching

For any array we have to create string which will represent dates in array. The simplest method is to create string like 1.5.2013, 2.5.2013, 3.5.2013 ... But the result string will be very long.

Using several predefined patterns we can create shorter result string.

For result string I use following rules:

Single date format: DD.MM.YYYY (10 characters)
Enumeration (dates and patterns): comma and space (2 characters)
Interval of dates: DD.MM.YYYY-DD.MM.YYYY (21 characters)
Interval of pattern names: Px-Py (5 characters)
Special words: except (6 characters)

Examples of result strings:

  • V1 using P4 pattern:

    P4 except 01.05.2013-03.05.2013, 09.05.2013, 10.05.2013, 16.05.2013, 17.05.2013 (80 characters)

  • V1 using Pn pattern:

    Pn 06.05.2013-08.05.2013, 13.05.2013-15.05.2013, 20.05.2013-24.05.2013, 27.05.2013-31.05.2013 (94 characters)

  • V1 using best patterns match:

    P1-P3 01.05.2013-19.05.2013, P4 20.05.2013-31.05.2013 (54 characters)

The main goal is to create the shortest result string. As I understand we can achieve this by finding the best matching pattern/patterns.

Currently I'm trying to adapt knapsack problem and longest common subsequence problem, but I'm not sure if it is the right direction.

I would appreciate any ideas.


updated

Thanks to Jan Dvorak for his extra short description of my problem:

The goal is to describe V using a predefined dictionary (P1..Pn and all intervals and single dates) where intersection, union and subtraction are all allowed and each operation and atom have a predefined cost (number of characters in result string).


13
  • 4
    so, your goal is to describe V using a predefined dictionary (P1..Pn and all intervals and single dates) where intersection, union and subtraction are all allowed and each operation and atom have a predefined cost? Commented May 26, 2013 at 11:45
  • 1
    Exhibit one: P1-P3 from 1.5.2013 to 19.5.2013, P4 from 20.5.2013 to 31.5.2013 (64 characters); Exhibit two: P4 except 9.5.2013, 10.5.2013, 16.5.2013 and 17.5.2013 (54 characters). Is the latter allowed? If not, why? Commented May 26, 2013 at 11:51
  • 1
    to Jan Dvorak: Your definition is correct.About Exhibit two: I've made a mistake in post in second example, I've used P4, but ment Pn. V1 using P4 pattern is: P4 except 1.5.2013-3.5.2013, 9.5.2013, 10.5.2013, 16.5.2013 and 17.5.2013 (74 chars) which is longer then P1-P3 from 1.5.2013 to 19.5.2013, P4 from 20.5.2013 to 31.5.2013 Commented May 26, 2013 at 12:06
  • 2
    Looking at your date format, your date literals don't have a fixed cost. They can be from 8 to 10 characters depending on whether the day and month are less than 10 or not. Also your date ranges are represented differently for an except ("date1-date2") vs an inclusion ("from date1 to date2"). And while the former combines date ranges with a comma, the latter combines date ranges with " and ". The former also supports a single date, while the latter doesn't appear to do so. A grammar showing exactly what string formats are permitted would make this a whole lot clearer. Commented May 27, 2013 at 11:41
  • 3
    I can't speak for anyone else, but personally I find it difficult to come up with a solution without knowing the full requirements of the problem. Perhaps the exact details wouldn't matter in the end as far as the general approach is concerned, but I find it difficult to work on a hypothetical problem with unknown parameters. Maybe that's just me though. Commented May 27, 2013 at 20:43

2 Answers 2

1

After long time of searching we finally found solution which is very close to what we want.

http://www.fpz.unizg.hr/traffic/index.php/PROMTT/article/view/1287

Thanks for all how participated.

Sign up to request clarification or add additional context in comments.

Comments

0

This is just a suggestion but if you want a really short string that represent the arrays of dates, you could solve this problem in a totally different way, this way is very simply and efficient.

Let 1 represent a day "seleceted" and let 0 represent a day "unselected", then you can construct a binary number that represent the custom date arrays in a month, for example, for the V1 case you can generate this binary number:

V1 = 0000011100001110000111110011111

So the first 0 represent that the date 1.5.2013 is "unselected", the next 0 represent that the date 2.5.2013 is "unselected", etc. If you separate this number in 8 bits groups (dividing the binary number in bytes) then you can create this byte array:

V1(starting in May 1, 2013) = 00000111 - 00001110 - 00011111 - 00111110  (4 bytes)

With this method you are representing the V1 using just 4 bytes, this is the only info you need if you know that V1 start on the date 1.5.2013, so you need to store the initial date as well, so you can represent the month and year using just 3 bytes, so for instance the May 2013 date can be represented in this way:

May = 5th month so 5 in binary is 101

2013 in binary is 11111011101 So using 3 bytes you can represent May 2013 as this:

0000101 00000111 11011101
[  5  ] [     2013      ]

So you can represent V1 as this

V1= 0000101 - 00000111 - 11011101  00000111 - 00001110 - 00011111 - 00111110
    [Month]   [     Year        ]  [      V1 custom array of dates         ]

So V1 can be totally represented using just 7 bytes!!

If you need a String instead of a byte array, then you can convert this byte array into a Base64 String so V1 can be represented as the string

V1 in Base64 is Cg+6Dhw+Pg== (using just 12 characters!!)

In the case of V2:

V2 = 0000101 - 00000111 - 11011101  11111111 - 11111111 - 11111111 - 11101110
     [Month]   [     Year        ]  [      V2 custom array of dates         ]

V2 in Base64 is Cg+7////bg== (using just 12 characters again!!)

With this method you know that a month custom array of dates info can be represented in 7 bytes (or 12 characters if you use the base 64 String).

To store the custom array info in a year you just need: 3 bytes for the start month and year, plus 365/8 = 45.625 (rounded to 46 bytes), that is 49 bytes!! for the whole year, that in base 64 has a maximum length of 69 characters!!!

This is simple to implement, easy to maintain in code, better than a complex pattern matching algorithm, this smell like a good solution to me. I hope that this recommendation fits your requirement.

1 Comment

Thank you, we use very similar way to store data.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.