Nothing scares me more than the Python Class concept;
This is not actually a Python-specific concept - classes exist in most object-oriented languages.
and recently I have been trying (...) to understand their purpose, structure and features, etc. However, I am not clear about the concept of class
Before we talk about classes, you have to understand objects. An object is a way to group together a state (a set of data) and a behavior (a set of functions acting on the state or according to the state). Now this is a bit of an abstract definition so let's see how it works with a simple example - a geometric point in a 2d space.
For the state part, a 2d point is defined by its x and y coordinates. You can represent this with a dict:
my_point = {"x": 0, "y": 0}
Ok, fine but not very explicit and a bit error prone. We can start with a function that is responsible for creating a new point:
def new_point(x=0, y=0):
    return {"x": x, "y": y}
p1 = new_point()
p2 = new_point(42, 84)
Now we can build points without having to worry about the gory details. Ok, now let's add a bit of behavior... A first useful function would be one that checks whether two points are equal (let's say they are equal if they have the same coordinates):
def points_are_equal(p1, p2):
    return p1["x"] == p2["x"] and p1["y"] == p2["y"]
You can see that this behavior depends on both points' states.
We might also want to move a point along the horizontal axis:
def move_x(p, distance):
    p["x"] += distance
or along the vertical axis:
def move_y(p, distance):
    p["y"] += distance
or both at the same time:
def move_by(p, x_distance, y_distance):
    move_x(p, x_distance)
    move_y(p, y_distance)
Notice that here, the behavior is to change the point's state.
And of course we want to have a way to get the point's x or y coordinates:
def get_x(p):
    return p["x"]
def get_y(p):
    return p["y"]
What we've built here is what is known as an "abstract data type": instead of manually building a dict, manually comparing two dicts, manually updating our dict and manually checking its state, we have defined a set of functions to do all this, more or less hiding the internal representation.
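As a minimal sketch of how calling code would use this abstract data type, the snippet below repeats the functions defined above (so that it runs on its own, with move_by written out directly) and then works with points only through those functions, never through the dict keys:

```python
def new_point(x=0, y=0):
    # build the internal representation (a dict) in one place
    return {"x": x, "y": y}

def points_are_equal(p1, p2):
    return p1["x"] == p2["x"] and p1["y"] == p2["y"]

def move_by(p, x_distance, y_distance):
    # changes the state of the point in place
    p["x"] += x_distance
    p["y"] += y_distance

def get_x(p):
    return p["x"]

def get_y(p):
    return p["y"]

# calling code never mentions "x" or "y" as dict keys
a = new_point(2, 5)
b = new_point(5, 6)
move_by(a, 3, 1)
print(get_x(a), get_y(a))      # 5 6
print(points_are_equal(a, b))  # True
```

If the internal representation later changed (say, to a list), only these functions would need to be updated, not the code that calls them.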
and how to create them.
A class is, mostly, another way to do the same thing, but with a lot of other goodness. Let's rewrite our "point" datatype as a Python class:
class Point(object):
    # this is the function that creates a new point
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    # equality test:
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

    # move
    def move_x(self, distance):
        self.x += distance

    def move_y(self, distance):
        self.y += distance

    def move_by(self, x_distance, y_distance):
        self.move_x(x_distance)
        self.move_y(y_distance)
And we don't actually need to write get_x() or get_y(), since we can directly access x and y:
p = Point(2, 5)
print(p.x)
print(p.y)
p.move_by(3, 1)
print(p.x)
print(p.y)
p2 = Point(p.x, p.y)
print(p == p2) # => True
p2.move_x(3)
print(p == p2) # => False
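To illustrate what the __eq__ method buys us, here is a small sketch comparing two classes. PlainPoint is a hypothetical name used only for this illustration: it is the same class without __eq__, so == falls back to the default behavior inherited from object, which only checks whether both sides are the very same object.

```python
class PlainPoint(object):
    # hypothetical variant without __eq__: == compares identity only
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    # == on two Point instances calls this method
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

print(PlainPoint(1, 2) == PlainPoint(1, 2))  # False: two distinct objects
print(Point(1, 2) == Point(1, 2))            # True: same coordinates
```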
Actually, under the hood, our p object is a dict:
print(p.__dict__)
Other OOPLs might use other ways to store an object's state (structs for C-like languages for example), but in Python an object is actually mainly a dict. Well, a dict plus a class:
print(p.__class__)
and a set of "attribute lookup rules" (provided by the base class object) that will first look up attributes in the object's __dict__, then in the object's class (which is how p.move_x(42) is actually interpreted as Point.move_x(p, 42)).
Classes and objects provide a lot of other goodies (inheritance etc), but basically they are just this: a dict (which stores the state) and a class (which stores the behavior).
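The "dict plus class" picture can be checked directly. The following is a small sketch using a trimmed-down Point (only __init__ and move_x are kept, to stay short): the state lives in the instance's __dict__, the method lives in the class's __dict__, and calling the method on the instance is equivalent to calling it on the class with the instance as first argument.

```python
class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def move_x(self, distance):
        self.x += distance

p = Point(2, 5)

# the state is stored on the instance
print(p.__dict__)                  # {'x': 2, 'y': 5}

# the behavior is stored on the class, not on the instance
print("move_x" in p.__dict__)      # False
print("move_x" in Point.__dict__)  # True

# these two calls do the same thing
p.move_x(3)
Point.move_x(p, 3)
print(p.x)                         # 8
```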
Now for your example:
my understanding is that when I initialize an object based on my Prob class, the file_contents are available for the class to use internally
file_contents is available on the instance - and the class's functions can access it on the current instance, which is the self parameter. IOW, your prob_build function should use self.file_contents:
def prob_build(self):
    self.problem, self.aux_vars = build_problem(self.file_contents)
Then you can access the problem and aux_vars on your instance:
first_object = Prob(some_file)
first_object.prob_build()
print(first_object.problem)
print(first_object.aux_vars)
Just note that the problem and aux_vars attributes only exist after you have called prob_build. This is considered bad practice, since accessing them earlier raises an AttributeError:
first_object = Prob(some_file)
# Doesn't work !!!
print(first_object.problem)
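Here is a runnable sketch of that failure. The read_file and build_problem functions below are hypothetical stand-ins (the real ones are not shown in the question), and the Prob class is reduced to the parts discussed so far:

```python
def read_file(filename):
    # hypothetical stub: pretend to read the file
    return "contents of " + filename

def build_problem(file_contents):
    # hypothetical stub: returns a (problem, aux_vars) pair
    return "problem built from " + file_contents, []

class Prob(object):
    def __init__(self, filename):
        self.file_contents = read_file(filename)

    def prob_build(self):
        self.problem, self.aux_vars = build_problem(self.file_contents)

first_object = Prob("some_file")

try:
    print(first_object.problem)
except AttributeError as exc:
    # the attribute has not been created yet
    print("before prob_build():", exc)

first_object.prob_build()
print(first_object.problem)  # now the attribute exists
```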
A first step to fix this would be to initialize those attributes in the __init__ method (yes, that's why it's called "init"):
class Prob(object):
    def __init__(self, filename):
        self.file_contents = read_file(filename)
        self.problem = None
        self.aux_vars = None

    def prob_build(self):
        self.problem, self.aux_vars = build_problem(self.file_contents)
but that's hardly better - you still need to call yourobj.prob_build() to have a usable state. The obvious fix here is to do all the initialization in the initializer and get rid of prob_build:
class Prob(object):
    def __init__(self, filename):
        self.file_contents = read_file(filename)
        self.problem, self.aux_vars = build_problem(self.file_contents)
but then you can ask yourself: what's the point of this class if it has no behavior, and all you do is:
prob = Prob("path/to/file.csv")
prob, aux_vars = prob.problem, prob.aux_vars
result = do_something_with(prob, aux_vars)
You could as well replace it with a simple function:
def build_problem_from_file(path):
    return build_problem(read_file(path))
prob, aux_vars = build_problem_from_file(...)
result = do_something_with(prob, aux_vars)
As a general rule, if your class has either no state or no behavior, chances are you don't need a class. There are exceptions to this rule of course, but it is still a good guideline. In your case, the hypothetical do_something_with(prob, aux_vars) might be a method too:
class Prob(object):
    def __init__(self, filename):
        self.file_contents = read_file(filename)
        self.problem, self.aux_vars = build_problem(self.file_contents)

    def do_something(self):
        # some computations here using self.problem and self.aux_vars
        return result
prob = Prob("path/to/file.csv")
result = prob.do_something()
but if that's the only behavior, you still don't need a class:
def build_problem_from_file(path):
    return build_problem(read_file(path))
def resolve_problem_from_file(path):
    prob, aux_vars = build_problem_from_file(path)
    return do_something_with(prob, aux_vars)
result = resolve_problem_from_file(...)
So to make a long story short: ask yourself if and why you want a class. OOP is a good solution for some problems but is not the solution to all problems.
[...] self be the first argument so that the method can access attributes of the instance. In your example, you would need to do build_problem(self.file_contents). This makes sense: you define self.file_contents, so you must access self.file_contents. Your problem is arising because prob_build does not return anything, so it implicitly returns None.
build_problem(self.file_contents) gives me an invalid syntax error in Spyder, whereas build_problem(self) produces an error message, NameError: global name 'file_contents' is not defined
[...] file_contents that hasn't been given a value. Are you doing file_contents = ... somewhere outside your class?
[...] file_contents outside of the class, first_object.prob_build() works because it takes the value of file_contents from the global namespace, I think. Also, I added a return (self.problem, self.aux_vars) to the 'prob_build' method, so the TypeError does not appear anymore.