I have this Dataset:

    NAME  VALUE1 VALUE2
0  Alpha     100     A1
1  Alpha     100     A1
2  Alpha     200     A2

I want to run a script that finds which patterns exist in the dataset. For example, in this particular dataset the rules it should find are:

1) IF NAME = ALPHA & VALUE1 = 100, THEN VALUE2 = A1

2) IF NAME = ALPHA & VALUE1 = 200, THEN VALUE2 = A2

I know that the values in each combination of columns will have to be compared row by row, like so:

ALPHA 100
ALPHA 100
ALPHA 200

ALPHA A1 
ALPHA A1
ALPHA A2

100 A1
100 A1
200 A2

ALPHA 100 A1
ALPHA 100 A1
ALPHA 200 A2 

"ALPHA 100" can't be a rule because "ALPHA 200" also exists; the same goes for "ALPHA A1", since "ALPHA A2" exists.

"100 A1" and "200 A2" are correct, but "ALPHA 100 A1" and "ALPHA 200 A2" are stronger variations, so those are the ones that should be printed out.

How could I go about this?

1 Answer

Okay, this is a clustering task over the rows. But I would also like to find some kind of non-stochastic solution for it. First, you can start from the hypothesis that every possible relation holds within each row: if Alpha and 100 then A1, if Alpha and A1 then 100, and so on. As a condition, you can take an arbitrary number of fields in the row.
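
As a rough illustration of that first step, here is a minimal sketch that enumerates every candidate rule for a single row. The function name `candidate_rules` and the representation of a rule as a (condition, conclusion) pair are assumptions made for this example, not something from the question.

```python
from itertools import combinations

def candidate_rules(row):
    """Yield (condition, conclusion) pairs for a single row.

    condition is a tuple of (column, value) pairs taken from the other
    fields; conclusion is a single (column, value) pair.
    """
    items = list(row.items())
    for target in items:
        others = [item for item in items if item != target]
        # Any non-empty subset of the remaining fields can act as a condition.
        for size in range(1, len(others) + 1):
            for condition in combinations(others, size):
                yield condition, target

row = {"NAME": "Alpha", "VALUE1": 100, "VALUE2": "A1"}
rules = list(candidate_rules(row))
for condition, conclusion in rules:
    print(condition, "->", conclusion)
```

For a row with three fields this gives three hypotheses per target column, so the number of candidates grows quickly with the number of columns.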

Then, as you read the next row, you update the rules. If you find a contradicting entry such as Alpha, 300 -> A1, you apply your generalization function. The result might be Alpha, 100 or 300 -> A1, or alternatively Alpha, interval (100 .. 300) -> A1. There is no generally known approach for this, which is what makes it interesting. If you tell me the exact task you are working on, I would be interested in solving it.
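
A minimal sketch of that update step, under some assumptions: the condition columns are fixed to NAME and VALUE1, the conclusion column is VALUE2, and the row Alpha, 300 -> A1 is the hypothetical extra entry from the paragraph above. The helper names `update` and `as_interval` are made up for this example.

```python
def update(rules, row, condition_cols, target_col):
    """Record the condition values of `row` under its conclusion value."""
    condition = tuple(row[c] for c in condition_cols)
    rules.setdefault(row[target_col], set()).add(condition)

def as_interval(conditions, position):
    """Generalize the values at one condition position to a (min, max) pair."""
    values = [c[position] for c in conditions]
    return min(values), max(values)

rows = [
    {"NAME": "Alpha", "VALUE1": 100, "VALUE2": "A1"},
    {"NAME": "Alpha", "VALUE1": 100, "VALUE2": "A1"},
    {"NAME": "Alpha", "VALUE1": 200, "VALUE2": "A2"},
    {"NAME": "Alpha", "VALUE1": 300, "VALUE2": "A1"},  # hypothetical extra row
]

rules = {}
for row in rows:
    update(rules, row, ("NAME", "VALUE1"), "VALUE2")

print(rules["A1"])                  # the "100 or 300" form, as a set of conditions
a1_interval = as_interval(rules["A1"], 1)
print(a1_interval)                  # the interval form
```

Note that with these rows the interval form would also cover 200, which maps to A2, so an interval generalization has to be checked against the other rules before it is accepted.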


3 Comments

I am trying to find the relationships in any relational dataset, so the rules that are found will be specific to the dataset the script ingested. What you say makes sense: if a contradicting entry is found, the rule is updated. Maybe we can discuss this further over email so I can give you a better look.
Email: [email protected]. You could set up a Git repository or something similar. Finding relationships in any dataset is too broad to begin with. Remember, we are searching for a method, so it would be better to narrow it down to one specific problem. What I have figured out so far I call the n-tuple agreement method, with generalization using a commonsense ontology.
I sent the email