I have the following code for writing to a binary file:

CALL system_clock(Time1, rate)
OPEN( 1, FILE='Test.bin', STATUS='UNKNOWN', ACCESS='STREAM')

DO 275 I=1,NDOF
  DO 274 J=1,UBW
    IF (S(I,J).NE.0) THEN
      WRITE (1) I
      WRITE (1) J+I-1
      WRITE (1) (S(I,J))
    ENDIF
  274 CONTINUE
275 CONTINUE

CLOSE(1)
CALL system_clock(Time2)
print *, "elapsed time: ", real(Time2-Time1) / real(rate)

I know that using fewer WRITE statements makes it faster. So inside the loop I now use the following code, and it is faster:

IF (S(I,J).NE.0) THEN 
WRITE (1) I, J+I-1,  (S(I,J)) 
ENDIF

Is there any way to get rid of the loop (since it is time-consuming), or any other change that would make the code more efficient?

Please note that each record must be written in the order I, J+I-1, S(I,J), and only for nonzero values. Also, since I am using a C++ program to read the binary file, I have to use stream access.

Any suggestions are greatly appreciated.

  • If you're trying to implement a kind of sparse format, it might be more efficient to write e.g. 'J' and the number of nonzero rows, and then only write 'I' and 'S(I,J)' for those rows (or use some special zero field). Commented Apr 12, 2013 at 7:36

3 Answers

One thing you can do is flip the order in which the array is processed: in your DO statements, simply switch I with J. The array S(I,J) is two-dimensional, and the way it is stored in memory matters a great deal for access speed. The storage order depends on the programming language: Fortran (unlike C, but like Matlab) stores arrays in column-major order. The most efficient way to access the memory is therefore to traverse each column of the array in succession, starting with the element in the first row.
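
A minimal sketch of that loop order is shown here; the array sizes and the fill pattern are made up for illustration and are not from the question. The inner loop runs over the first index, so consecutive iterations touch adjacent memory. Note that this also changes the order in which the (I, J+I-1, S(I,J)) records reach the file, which may or may not matter to the program reading it.

```fortran
program column_major_order
  implicit none
  integer, parameter :: ndof = 4, ubw = 3   ! illustrative sizes, not from the question
  real :: s(ndof, ubw)
  integer :: i, j, u

  ! Fill with a made-up pattern: nonzero only where i+j is even
  do j = 1, ubw
    do i = 1, ndof
      s(i, j) = merge(real(i + j), 0.0, mod(i + j, 2) == 0)
    end do
  end do

  open(newunit=u, file='Test.bin', status='replace', access='stream')
  do j = 1, ubw          ! outer loop over columns
    do i = 1, ndof       ! inner loop over rows: s(1,j), s(2,j), ... are contiguous
      if (s(i, j) /= 0) write (u) i, j + i - 1, s(i, j)
    end do
  end do
  close(u)
end program column_major_order
```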


What I do in my production code is first fill a buffer and then write it with a single WRITE statement. This is much faster than writing column by column, and far faster than writing individual values. Something along the lines of:

CALL system_clock(Time1, rate)

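! Assumes buffer is an allocatable rank-1 array of one-byte integers, and that
! int_size and real_size hold the sizes in bytes of I and of S(I,J).
allocate(buffer(NDOF*UBW*(2*int_size + real_size)))  ! worst case: all entries nonzero
offset = 1   ! Fortran arrays are 1-based by default
buffer = 0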

DO I=1,NDOF
  DO J=1,UBW
    IF (S(I,J) /= 0) THEN
      buffer(offset: offset + int_size-1) = transfer(I,buffer)
      offset = offset + int_size
      buffer(offset: offset + int_size-1) = transfer(J+I-1,buffer)
      offset = offset + int_size
      buffer(offset: offset + real_size-1) = transfer(S(I,J),buffer)
      offset = offset + real_size
    ENDIF
  end do
end do

OPEN( 1, FILE='Test.bin', STATUS='UNKNOWN', ACCESS='STREAM')

write (1) buffer(1:offset-1)

CLOSE(1)
CALL system_clock(Time2)
print *, "elapsed time: ", real(Time2-Time1) / real(rate)

Really, the order in which the array is traversed does not matter that much. The I/O operations are what slow you down enormously.
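
A variation on the same idea, sketched under assumptions rather than taken from the code above: instead of packing bytes with transfer, collect the records in an array of a derived type and write that array with one WRITE. The type name entry_t, its component names, and the sample values are hypothetical. The sketch also assumes S is single precision and that c_int and c_float are 4 bytes each, so a record occupies 12 bytes with no padding and should match a plain C/C++ struct of two int and one float.

```fortran
program record_buffer_write
  use iso_c_binding, only: c_int, c_float
  implicit none

  ! Hypothetical record type: two integers and one real, interoperable with C
  type, bind(c) :: entry_t
    integer(c_int) :: row, col
    real(c_float)  :: val
  end type entry_t

  integer, parameter :: ndof = 4, ubw = 3   ! illustrative sizes, not from the question
  real(c_float) :: s(ndof, ubw)
  type(entry_t), allocatable :: buffer(:)
  integer :: i, j, n, u

  ! Made-up data with three nonzero entries
  s = 0.0
  s(1, 1) = 1.5
  s(2, 3) = -2.0
  s(4, 2) = 7.25

  allocate(buffer(count(s /= 0)))   ! exactly one record per nonzero entry
  n = 0
  do i = 1, ndof
    do j = 1, ubw
      if (s(i, j) /= 0) then
        n = n + 1
        buffer(n) = entry_t(i, j + i - 1, s(i, j))
      end if
    end do
  end do

  open(newunit=u, file='Test.bin', status='replace', access='stream')
  write (u) buffer(1:n)             ! a single WRITE for all records
  close(u)

  print *, 'records written:', n
end program record_buffer_write
```

If S is double precision, the component would become real(c_double), and the record layout should be checked against the struct used on the C++ side.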

P.S. Please do not end your loops with continue in Fortran 2003, it hurts.

5 Comments

It's unfortunate that we should need to do this. WRITE should simply be transferring the data to an OS buffer; why should copying the data twice be beneficial?
Because usually there is the whole stack of I/O routines invoked with every item in the I/O list. See for example ibm.com/developerworks/mydeveloperworks/blogs/…
You may try -assume buffered_io for ifort or equivalent for others and see if it helps, but doing this yourself is not THAT difficult.
Thanks, I hadn't thought about locking the unit. Anyway, I'm not concerned about difficulty; the concern is that if your compiler actually does a good job of optimising I/O, you could hurt performance both by copying data twice and by using extra memory.
It is definitely not the case for compilers I have (gfortran, Intel, Oracle) and also not for IBM as you can see in the reference.

Another thing you could do is what I would call manual loop unfolding (more commonly known as loop unrolling).

CALL system_clock(Time1, rate)
OPEN( 1, FILE='Test.bin', STATUS='UNKNOWN', ACCESS='STREAM')

DO 275 I=1,NDOF
  DO 274 J=1,UBW,2
    IF (S(I,J).NE.0) THEN
      WRITE (1) I, J+I-1, S(I,J), I, J+I, S(I,J+1)
    ENDIF
  274 CONTINUE
275 CONTINUE

CLOSE(1)
CALL system_clock(Time2)
print *, "elapsed time: ", real(Time2-Time1) / real(rate)

Changing the order of the loops could also be considered (in Fortran the leftmost index should vary fastest because of the column-major storage of matrices), but I think the speedup would be negligible compared with the time needed for I/O.

EDIT: If UBW is odd, you should make sure that J+1 never exceeds UBW, to avoid an array-out-of-bounds error.

...
IF (S(I,J) .NE. 0) THEN
  IF (J .LT. UBW) THEN
    ! Note: S(I,J+1) is written here without its own nonzero test
    WRITE (1) I, J+I-1, S(I,J), I, J+I, S(I,J+1)
  ELSE
    WRITE (1) I, J+I-1, S(I,J)
  ENDIF
ENDIF
...
