Ownership for Fortran pointers

PierU · May 27, 2024, 12:27pm

xxx, dimension(:), pointer :: a, b, c`
xxx, dimension(:), allocatable, target :: d

allocate( a(n) )

An internal table is created, the key entry being the address range that is allocated for a(1:n), and the elements of the tables are the pointers that point to this range (full or partial range). So at first the table contains [a]

b => a

b is added to the table, which is now [a,b]

c => b(1:n:2)`

c is added to the table, which is now [a,b,c]

a definitely goes out of scope (whatever the reason)

the table is now [b,c]

b => d

b no longer points to the range, the table is now [c]

c definitely goes out of scope

the table is now empty [], so the allocated range can be freed.

This looks backward compatible to me… The management of the tables has a runtime cost, though, so this behavior could be enabled with a new attribute:

xxx, dimension(:), pointer, safe :: a, b, c

certik · May 27, 2024, 5:56pm

@hkvzjal I see. So the difference is that the allocatable attribute selects the “owner” at compile time, while you need a feature where the owner gets selected at runtime, correct?

How would this be done in C++? That would help me understand better the use case. I am familiar with std::unique_ptr (similar to our allocatable), and std::shared_ptr (reference counted pointer), as well as various other smart pointers. But none of them quite seem to fit this use case.

@PierU, I would have to see more of your code, but I would probably always treat pointer as non-owning. If you receive data, just put it there. If you have your own data, I would use allocatable elsewhere and load my own data, but then just take a non-owning pointer to it and process it using your existing infrastructure that would all assume a non-owning pointer.

PierU · May 27, 2024, 7:10pm

I think that the C++ shared_ptr would do. With them, it’s actually beyond deciding which pointer is the owner, as the pointed memory is deallocated once inaccessible.

It would indeed be another solution.

certik · May 27, 2024, 7:56pm

Ok, so a feature for a reference counted pointer is indeed something to consider. I am surprised I didn’t find it in our proposals repository yet, so I just created an issue for it: Reference counted pointer · Issue #334 · j3-fortran/fortran_proposals · GitHub.

hkvzjal · May 27, 2024, 7:58pm

In a sense yes, practically speaking, I’ll be declaring one instance of the type which is owner of the data, and another instance in which I point to pre-existing data in external modules. Both cases would be known at compile-time, but it could happen that after clearing the object, it could be reused.

certik · May 27, 2024, 9:37pm

@hkvzjal I see. It seems it’s a little bit like this thread: Rank a compile time concept vs a run time concept?. In general as a guiding principle, we want to add strong compile time features, not “interpreted” runtime features. In this case it seems to me you want a special kind of smart pointer, that in C++ would decide at runtime if it is std::unique_ptr or just a raw pointer (or a weak pointer). So in C++ I would implement a custom smart pointer for that. I am still not sure this is a good design though.

Another way to look at this is from performance perspective: if this kind of code doesn’t need to be the absolute best performance, then even using the reference counted pointers would be ok.

RonShepard · May 28, 2024, 1:43am

For optimal performance in fortran, one usually doesn’t want to reference the pointers directly in expressions. In this case, the compiler must account for possible aliases within the expression or between expressions (like pointers in C). Instead, you can use the pointers as actual arguments, associated with nonpointer dummy arguments, and then the compiler can optimize expressions with those dummy arguments. In the newer fortran versions, the programmer might be able to do the same thing with associate blocks, and this would avoid the overhead associated with the actual subroutine call.

For this “ownership” issue, here is another possible approach. The programmer could define the derived type that has an allocatable component a(:) and a pointer component pa(:). For the instances where the variable x owns the data, it would allocate through x%a(:) and assign the pointer x%pa=>x%a. For the instances where the variable does not own the data, x%a(:) would remain unallocated and x%pa would point to wherever the data is. Then when the variable is used, x%pa would always point to the data, regardless of where it actually is.

hkvzjal · May 28, 2024, 5:25am

There is no performance issue here. The few extra click-counts needed for managing the pointer(s) and the few extra bytes for the DT just pail next to the fact that each pointer will be holding KBs up to GBs worth of data. And the crunching procedures afterwards are designed using plain old arrays. The DT is needed to managing passing around the data in a compact and clean manner during large scopes of execution and not just local scopes.

The one thing I would like to see is not having to manually add this boolean owner flag to the DT but that the language enables robust data destruction when the pointer (or DT holding it) goes out of scope whether it is just pointing or holding the actual data.

RonShepard · May 28, 2024, 7:26am

Yes, this is what I was trying to describe, dummy arguments that are regular arrays where the actual floating point operations are done.

There may be several ways to do this within the language already. My suggestion above with x%a(:) and x%pa(:) is one. No extra boolean flag is needed because the compiler knows the “owner” is the one with the allocated x%a(:). It is deallocated automatically when the variable goes out of scope. There are other possible ways to address this too, for example with finalizers of the derived type. Another way is to allocate the actual data in a separate data structure, say a linked list, or some other data storage structure appropriate for your problem, then then have the x%pa(:) pointers point to the arrays within the linked list as appropriate. Deallocation of the linked list can be done with a single statement (the compiler will handle the recursion itself), or it will be automatically deallocated when the list goes out of scope. There are lots of possibilities within the existing language.

everythingfunctional · May 29, 2024, 8:03pm

This discussion seems to be conflating different problems and solutions.

There are a couple of dimensions to the issue. Who is allowed to modify and/or view the data? When is the data accessible? Where does the data live (stack or heap)?

Variables without the target attribute solve this problem by: the data lives on the stack, it’s only accessible during execution of the block it’s declared in, and there’s only one variable allowed to modify or view the data.

Allocatable variables without the target attribute are the same, but the data lives on the heap instead. C++ unique_ptr has the same properties.

The above situations do not require an “ownership” model, as there is only one way to access the data so the “owner” is clear.

Variables with the target attribute still have a clear “owner” and “lifetime”, but there are no restrictions on who’s allowed to modify or view the data. The part that makes this situation tricky is that any pointers don’t really know that the thing they’re pointing at definitely has ownership and a defined lifetime that may be shorter than the pointer. C++ raw pointers have this same problem, and there are not restrictions on taking a reference to something in C++ (everything implicitly has the target attribute).

So what are some existing solutions to solving this problem of shared data?

Nobody “owns” the data, we just clean it up once nobody is using it anymore through garbage collection or reference counting. There are a lot of modern languages that take the former approach, and C++ shared_ptr takes the latter. The Fortran language doesn’t have either of these options currently, but there is a library (referenced earlier in the thread) that aims to provide a reference counting approach.

There is one owner, and we enforce some properties about that at compile time. Rust takes this approach. By construction there is only ever one owner of the data. Readonly references to the data can be shared, but the owner cannot modify the data while these are active, and ownership can be transferred, but only when there are no other references to the data. If we leave the scope of some data’s owner, the data is cleaned up, and there cannot be any shared references to the data that are not in that same scope. All of this is enforced at compile time through some pretty clever analysis.

My recommendation would be to try and think about your use case in terms of one of those two models.

PierU · May 30, 2024, 8:18am

Here is what I wrote on Reference counted pointer · Issue #334 · j3-fortran/fortran_proposals · GitHub :

I was rather thinking about something inspired by shared_ptr, but a bit different:

There would be a variant of the current pointers, say a “safe pointer” (e.g. integer, pointer, safe :: a(:))
A safe pointer can be associated to any pointer (safe or not) or target object
A non-safe pointer can not be associated to a safe pointer
A safe pointer associated to a non-safe pointer or to a target object behaves exactly like a non-safe pointer
If a safe pointer is being allocated:
- if this pointer or any of the associated safe pointers is nullified, or goes out of scope, or is associated to another object:
  - if no other safe pointer was associated to it, the memory is released
  - if other safe pointer(s) was(were) associated to it, the memory is not released
  - (this is classical reference counting)
- if this pointer or any of the associated safe pointers is explicitly deallocated, the memory is released and all the associated safe pointers are nullified
  - (I don’t know how difficult it would be for compilers to implement that)

#define ASS associated
integer, allocatable, target :: a(:)
integer, pointer, safe :: s1(:), s2(:), s3(:)


a = [42, 0, -42]
s1 => a
s2 => s1
s3 => s1(1:n:2)
print*, ASS(s1),  ASS(s2),  ASS(s3), s1(1)  ! should be "T T T 42"

deallocate( a )
print*, ASS(s1),  ASS(s1),  ASS(s3)  ! undefined


allocate( s1(3), source=[42, 0, -42] )
s2 => s1
s3 => s1(1:3:2)
print*, ASS(s1),  ASS(s2),  ASS(s3), s3(1)  ! should be "T T T 42"

s1 => null()
print*, ASS(s1),  ASS(s2),  ASS(s3), s3(1)  ! should be "F T T 42"

s2 => null()
print*, ASS(s1),  ASS(s2),  ASS(s3), s3(1)  ! should be "F F T 42"

s3 => null()   
! memory released at this point, as it is now inaccessible          
print*, ASS(s1),  ASS(s2),  ASS(s3)  ! should be "F F F"


allocate( s1(3), source=[42, 0, -42] )
s2 => s1
s3 => s1(1:3:2)
print*, ASS(s1),  ASS(s2),  ASS(s3), s3(1)  ! should be "T T T 42"

deallocate( s2) ! could be s1, but not s3
                ! the memory is released
print*, ASS(s1),  ASS(s2),  ASS(s3)  ! should be "F F F"

everythingfunctional · May 30, 2024, 2:43pm

This proposal would help alleviate memory leaks, but doesn’t help avoid use after free errors.

PierU · May 31, 2024, 6:31am

What do you mean?

everythingfunctional · May 31, 2024, 2:29pm

For example:

integer, pointer, safe :: p
call allocate_it
print *, p
call associate_it
print *, p ! invalid to reference p now
contains
subroutine allocate_it
allocate(p)
p = 42
end subroutine
subroutine associate_it
integer, target :: t
t = 42
p => t ! the previously allocated p will be deallocated?
end subroutine
end

As described, your proposal does not ensure this error gets detected at compile time, but at run time the reference to p in the second print statement is still invalid. And actually, now that I look at it, a small change to the above example actually reintroduces the memory leak.

integer, pointer, safe :: p
call allocate_it
print *, p
call associate_it
print *, p ! invalid to reference p now
contains
subroutine allocate_it
integer, pointer :: p2
allocate(p2)
p2 = 42
p => p2
end subroutine
subroutine associate_it
integer, target :: t
t = 42
p => t ! p wasn't allocated as "safe", so memory leak still here
end subroutine
end

PierU · May 31, 2024, 3:15pm

That’s correct, because in associate_it the safe pointer p is associated to something else than another safe pointer, and I wrote “A safe pointer associated to a non-safe pointer or to a target object behaves exactly like a non-safe pointer”.

This is “by design”

The underlying idea was to have the same properties as the C++ shared_ptr, but still allowing pointing to something else than safe pointers. The use case being the initial need expressed in the thread.

real, pointer, safe :: a(:)

if (already_allocated_memory) then
   a => some_already_allocated_memory
   ! the memory management is the responsability   of the code that
   ! allocated the memory
else
   allocate( a(n) )
   ! no need to bother deallocating and keeping track of a `is_owner` flag,
   ! the memory will be deallocated once `a` is out of scope anyway
end if

But I admit that this is confusing to have a safe attribute that hides a “sometimes not safe” behavior, and that it’s probably not a good idea.

FortranFan · June 1, 2024, 1:47pm

“safe” in this context (and many other semantic and technical contexts that I won’t get into for the sake of brevity) is utterly a weasel word and one can only hope a mere working contractor such as J3 for the ISO working group (WG5) for Fortran will always gain the sense from now on to stay away from such words for the benefit of this language and its practitioners.

Based on many communications by different Fortranners on various platforms (including comp.lang.fortran) and discussions (including at J3 working meetings) across many years and including this thread initiated by @hkvzjal , it appears the practitioners of Fortran can really benefit from a language standard revision which introduces a notion of pointer-spec that can be envisioned as somewhat analogous, at least syntactically, to intent-spec (INTENT(IN), INTENT(IN OUT), etc.).

Thus, with a pointer-spec which may take certain labels, let’s just call them here for the sake of keeping the discourse open as thing one, thing two, thing three

Rnnn pointer-spec is thing_one
                  or thing_two
                  or thing_three

Now, if I had my wish, I would term thing_one as ALIAS; thing_two as CONST_DATA; and thing_three as CONST_REF where:

ALIAS implies the object with the pointer attribute in an alias to another object (one that is a target), thus it is not the “owner” of the data. Thus the object with ALIAS spec cannot appear in ALLOCATE and DEALLOCATE statements. Note this is in effect close to or same as the desired facility by OP in this thread,
CONST_REF implies the pointer object cannot appear in any pointer association context other than initialization. Note this is what is in effect with INTENT(IN) and POINTER in the current standard but which has proved both confusing and inadequate for many Fortranners.
CONST_DATA implies the pointer object cannot appear in any variable definition context other than initialization.

The standard may then specify suitable semantic requirements, preferably in the form of numbered constraints, connected with pointer-spec that enable Fortranners to consume the POINTER attribute with significantly reduced vulnerabilities i.e., make the use of pointers effectively more “safe”.

So with a feature set along such lines, a Fortranner may author code such as

  ..
  real, target :: some_data(2,3)
  real, pointer( alias, const_ref ) :: pdata(:,:) => some_data  !<-- now pdata shall not appear in any pointer association context including `ALLOCATE` and `DEALLOCATE` statements
  ..

which then restricts the object pdata from being allocated or deallocated and from being associated with other targets.

And in case like with OP here who seeks to work with data “containers” (what is referred to as mytype in the original post, the data “container” may possibly be defined as

   type :: data_container_t
      ..
      real, allocatable :: data(:,:)
      real, pointer( alias ) :: pdata(:,:)
      ..

which will then allow the pdata component to point to targets that may be data sources elsewhere (possibly remote). pdata may even point to the data component itself. In effect, pdata shall serve as how pointers were intended in Fortran which is primarily as aliases. But the point to note is the language directly allows an indication pdata is not the “owner” of the data and it is clear to the author any finalization of such a container need not ponder over the question as to whether to deallocate the memory holding the data.

But Fortranners have also expressed a need in certain contexts to use aliases to immutable data. This appears another use case that the language should try to serve. This can also possibly allow pointers to associate with named constants that are targets. This will be useful for Fortranners, from what I have come across from feedback, with safely working with constant data, a key aspect in certain coding instructions present in scientific and technical computing e.g.,

   integer, parameter, target :: foo(*) = [ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, .. ]
   ..
   integer, pointer ( alias, const_data, const_ref ) :: bar(:) => foo(1:10)
   ..

TL;DR:

Do not use weasel words like “safe”, “concurrent”, “REAL64” etc. with Fortran features. To borrow from Wikipedia, these create “an impression that something specific and meaningful has been” added to the language, “when in fact only a vague, ambiguous, or irrelevant feature has been” introduced
Within the context of this thread with POINTERs in Fortran, Fortranners would benefit if the Community can think more broadly and develop something that introduces a set of facilities which make the use of pointers less vulnerable than at present in some specific and stated issues, like I explain above.

certik · June 5, 2024, 4:29pm

@hkvzjal and I discussed this issue over video. I understand the use case a lot better now. Here is what we figured out so far:

In Fortran you can use allocatable arrays for owning pointers (and use allocate to allocate memory, and it gets freed automatically), or pointer arrays for non-owning pointers. Both are consistent, and with good compiler Debug time checks for dangling pointers, both are great building blocks, that are safe (in Debug or ReleaseSafe modes) and can’t leak memory.
But you can also use pointer and allocate memory in it with allocate. This feature is not consistent/clean, because it makes the pointer owning, but there is no facility to automatically free the memory. Here are several approaches how to make this feature “clean”:
- Disallow using allocate on pointers. This is a consistent choice, but there was a reason this feature was added to the language: it adds flexibility. So let’s explore the other approaches as well.
- Automatically deallocate the memory in pointer when it goes out of scope, if it was allocated using allocate. It effectively makes the pointer an allocatable. That is the proposal in this thread. There are many ways how this can work, I just list some, but there are more:
  - an optional argument to allocate to turn on automatic deallocation; encourage to use this in new codes, and eventually deprecate the non-automatic (default) pointer behavior
  - always deallocate (this is not backwards compatible and would break some existing code, so probably we don’t want that)
  - extra attribute at declaration that turns on automatic deallocation for pointer, so any allocate to it will make it “owning” and automatically deallocate;
- Use reference counted pointers (Reference counted pointer · Issue #334 · j3-fortran/fortran_proposals · GitHub), say with some attribute at declaration. Encourage to use reference counted pointers instead of pointer and allocate.

I don’t have an opinion which approach is the best. But I think that pointers can (and should) be improved along the lines above.

PierU · June 6, 2024, 9:00am

I think that the problem described by @hkvzjal is the best solved by using an additional allocatable object, as suggested by several contributors along the line:

module test

type :: mytype
    real, pointer :: x(:) => null()
    private
    real, allocatable :: y(:)
contains
    procedure :: clear => destructor
end type

interface mytype
    module procedure :: constructor
end interface

contains

type(mytype) function constructor(sze,rbuff) result(p)
    integer, intent(in) :: sze
    real, intent(in), target, optional :: rbuff(:)

    if(present(rbuff))then
        p%x => rbuff(1:sze)
    else
        allocate(p%y(sze), source=0.0)
        p%x => p%y
    end if
end function

subroutine destructor(self)
    class(mytype) :: self

    if(allocated(self%y)) deallocate(self%y)
    self%x => null()

end subroutine

end module

program main
use test

type(mytype) :: A, B
real, allocatable :: y(:)

y = [(real(i), i =1 , 6)]
A = mytype(6,y) !> A%y is not allocated

print *, A%x(:)
A%x(3) = A%x(3)**2
print *, y(:)
call A%clear !> A%x is nullified 
print *, y(:)

B = mytype(6) !> B%y is allocated
print *, B%x(:)
call B%clear() !> B%y is deallocated and B%x is nullified

end program

Compared to the solution with %imowner, the %y component benefits from the automatic deallocation, so there’s no need to write a FINAL procedure.

I think that proposals with pointers that can sometimes own the data, but not always, would be confusing at the end.

That said, having reference counted pointers could be useful, but maybe more as an extension of the current allocatable objects.

mEm · June 6, 2024, 9:42am

Good solution, but I think that in general a workaround like this one, as well as the one originally suggested, defeats the way a user would normally use pointers/allocatables, i.e. being able to declare a variable and directly index it, without having to type that extra %x[...] each time using it.
I think the only sustainable way to achieve such “safety” is by means of compiler assistance, where the user simply “tweaks” something so that it might turn on/off some behaviour, such as deallocating pointer variables. Indeed, no tweaking would keep things “as is” ensuring backward compatibility.

FortranFan · June 6, 2024, 1:40pm

This solution does not conform to the standard; the reference to A%x(:) in the snipped PRINT instruction is nonconforming as A%x is undefined once the so-called constructor function completes execution.

The issue here originates from the missing TARGET attribute of object y.

And here the standard has effectively left the onus entirely on the practitioner to conform; truly speaking, no assistance is forthcoming from the processor in this regard, particularly given the so-called automatic targetting facility in the standard (readers may refer to Modern Fortran Explained by Metcalf et al. for details).

This is the TRAGEDY of the “modern” Fortran standard that I repeatedly highlight for the readers - gaping holes have been left to great detriment for practitioners. Readers such as @aradi have raised so many threads online inquiring about this and very lengthy discussions have taken place at least for a decade going back to comp.lang.fortran in this regard but little has been done to address this in the standard - sad.

At this stage, really the only solution for the long-term from a standard perspective is the pointer-spec option I mentioned upthread. With this, suitable semantics can indeed be developed whose manifestation in practice can be a Fortranner authoring a constructor like so:

..
type(mytype) function constructor(sze,rbuff) result(p)
    integer, intent(in) :: sze
    real, intent(in), pointer( const_ref ) :: rbuff(:)  !<-- note the `pointer-spec`
    ..
    p%x => rbuff(1:sze)
   ..
end function
..

for which - with the help of a new CONST_REF type of pointer-spec, the consumer can be guided, hopefully with a numbered constraint in the standard and thus gain processor detection and reporting, to provide an actual argument with certain attributes - in the simplest instance it will be ALLOCATABLE, TARGET e.g.,

..
real, allocatable, target :: y(:)  !<-- note the TARGET attribute here

y = [(real(i), i =1 , 6)]
A = mytype(6,y) !> A%y is not allocated
..

Topic		Replies	Views
Understanding Fortran pointers Help	13	11942	May 3, 2021
Trying to understand why my pointer is 'losing' its value Help	20	813	November 30, 2023
Deallocation of arrays passed as c_ptr Help	23	490	January 23, 2024
Dummy pointers to actual non-pointers allocatable targets Help	16	597	February 19, 2023
Traits/interfaces in Fortran?	21	1649	September 14, 2023

Ownership for Fortran pointers

Related topics