I need help reading a text file in PHP. The first part of the file defines the variables, and the second part holds the data as multiple rows; each row is limited to 79 characters.

I want to read the data and store it in a MySQL database.

The file is an EpiData .rec file.

The file structure is as follows:

44 1
_COUNTRY       1   3  30  57   3   3   3 112 COUNTRY         Country ............................... 
_IDCODE        1   4  30  57   4   3  20 112 IDCODE          EPID No................................ 
#HOTCASE       1   5  30  57   5   0   1 112 HOTCASE         Hot case............................... 
_DISTRICT      1   6  30  57   6   3  20 112 DISTRICT        District............................... 
_PROVINCE      1   7  30  57   7   3  20 112 PROVINCE        Province............................... 
_DOB           1   8  30  57   8  11  10 112 DOB             Date of birth.......................... 
#AGE           1   9  30  57   9   0   3 112 AGE             Age (in months)........................ 
#SEX           1  10  30  57  10   0   1 112 SEX             Sex.................................... 
_DONSET        1  11  30  57  11  11  10 112 DONSET          Date of onset of paralysis............. 
_DNOT          1  12  30  57  12  11  10 112 DNOT            Date of notification................... 
_DOI           1  13  30  57  13  11  10 112 DOI             Date of case investigation............. 
_DSTCOLL1      1  14  30  57  14  11  10 112 DSTCOLL1        Date stool collected:1................. 
_DSTCOLL2      1  15  30  57  15  11  10 112 DSTCOLL2        Date stool collected:2 ................ 
#DOSESR        1  16  30  57  16   0   1 112 DOSESR          Routine doses of OPV................... 
#DOSESN        1  17  30  57  17   0   2 112 DOSESN          Doses of OPV during NID/SIA............ 
#DOSES         1  18  30  57  18   0   2 112 DOSES           Total polio doses...................... 
_DLOPV         1  19  30  57  19  11  10 112 DLOPV           Date of last OPV....................... 
#FEVER         1  20  30  57  20   0   1 112 FEVER           Fever.................................. 
#PROGRESS      1  21  30  57  21   0   1 112 PROGRESS        Progression............................ 
#ASYM          1  22  30  57  22   0   1 112 ASYM            Asymmetric paralysis................... 
_DFUP          1  23  30  57  23  11  10 112 DFUP            Date of follow-up...................... 
#FUP           1  24  30  57  24   0   1 112 FUP             Findings at follow-up.................. 
_DSTLAB        1  25  30  57  25  11  10 112 DSTLAB          Date stool(s) received in lab.......... 
_DTRES         1  26  30  57  26  11  10 112 DTRES           Date prelim results received by EPI.... 
_DIRES         1  27  30  57  27  11  10 112 DIRES           Date ITD results received by EPI....... 
#STCOND        1  28  30  57  28   0   1 112 STCOND          Stool condition... .................... 
#L20B          1  29  30  57  29   0   1 112 L20B            L20B isolated.......................... 
#P1            1  30  30  57  30   0   1 112 P1              P1 (lab results)....................... 
#P2            1  31  30  57  31   0   1 112 P2              P2 (lab results)....................... 
#P3            1  32  30  57  32   0   1 112 P3              P3 (lab results)....................... 
#ENTERO        1  33  30  57  33   0   1 112 ENTERO          Entero (lab results)................... 
#CLASS         1  34  30  57  34   0   1 112 CLASS           Classification......................... 
#FDIAG         1  35  30  57  35   0   1 112 FDIAG           Final diagnosis... .................... 
_OTHDIAG       1  36  30  57  36   3   6 112 OTHDIAG         Diagnosis (if FDIAG=Other)............. 
_SDIAG         1  37  30  57  37   1  40 112 SDIAG           Specify diagnosis (if OTHDIAG=Other)... 
#CONTACT       1  38  30  57  38   0   1 112 CONTACT         Number of contacts..................... 
#ELIGCONT      1  39  30  57  39   0   1 112 ELIGCONT        AFP case eligible for contacts......... 
#INADAFP       1  40  30  57  40   0   1 112 INADAFP         Reason for contact - inadequate........ 
#HOTAFP        1  41  30  57  41   0   1 112 HOTAFP          Reason for contact - hot AFP........... 
#HARDAREA      1  42  30  57  42   0   1 112 HARDAREA        Reason for contact - area.............. 
#OTHREAS       1  43  30  57  43   0   1 112 OTHREAS         Reason for contact - other............. 
_SOTHREAS      1  44  30  57  44   1  30 112 SOTHREAS        Other reason, specify.................. 
#WILDCONT      1  45  30  57  45   0   1 112 WILDCONT        Wild poliovirus from contacts.......... 
#VDPVCONT      1  46  30  57  46   0   1 112 VDPVCONT        VDPV isolated from contacts............ 
AFGAFG/06/06/725       2KABUL               KABUL                          242!
18/09/2006                    22/09/200623/09/20060  9918/09/2006             !
 26/09/2006                    12444134                                       !
                                             !
AFGAFG/05/11/370       2MUSAYI              KABUL                          602!
22/11/2014                    25/11/201427/11/20140  9927/10/2014             !
 29/11/201411/12/2014          12444234                                       !
                                             !
AFGAFG/05/07/101       2BAMYAN              BAMYAN                        9001!
                                                  0  9905/08/2007             !
                                                                              !
                                             !
AFGAFG/05/17/005       2SAYDABAD            WARDAK                         541!
02/01/201704/01/201704/01/201704/01/201705/01/20175131818/10/2016111          !
 10/01/201721/01/2017          12444235G04                                    !
                                             !
AFGAFG/05/17/007       1WARAS               BAMYAN                          61!
01/01/201704/01/201704/01/201704/01/201705/01/20174 3 718/10/2016111          !
 10/01/201721/01/2017          12444235B34                                    !
                                             !
AFGAFG/05/17/002       2KABUL               KABUL                          181!
01/01/201701/01/201701/01/201702/01/201707/01/20175 61117/10/2016111          !
 07/01/201719/01/2017          12444235G81                                    !
                                             !
AFGAFG/05/17/003       2SHEKHALI            PARWAN                         441!
01/01/201703/01/201703/01/201703/01/201705/01/20174141816/10/2016112          !
 07/01/201719/01/2017          12444235E87.6                                  !
                                             !
AFGAFG/05/17/008       2NERKH               WARDAK                         482!
03/01/201704/01/201704/01/201705/01/201706/01/20175121718/10/2016111          !
 10/01/201721/01/2017          12444235B34                                    !
                                             !
AFGAFG/05/17/001       2KHENJ (HES-E- AWAL) PANJSHER                       142!
01/01/201702/01/201702/01/201702/01/201703/01/20175 4 917/10/2016111          !
 05/01/201716/01/2017          12444235B34                                    !
                                             !
AFGAFG/05/17/004       2KABUL               KABUL                          362!
01/01/201702/01/201702/01/201702/01/201704/01/20175 71214/12/2016111          !
 06/01/201717/01/2017          12444235B34                                    !
                                             !

The first line of the file gives the number of variables (44). In each variable line, the eighth column (the one just before the 112 column) is the length of that variable. I managed to read the variables into an array, but I can't work out how to read the data for each variable.

Here is what I have so far:

<?php

$file_name = 'CTRYAFP10.rec';

if (file_exists($file_name)) {
    $file = fopen($file_name, "r");

    $first_row = fgets($file);
    //I used fgets() to read the file row by row

    $first_row_array = explode(" ", trim($first_row));

    $numberOfVariables = intval($first_row_array[0]);

    $total_length_for_all_varibles = 0;
    $number_of_data_rows = 0;

    $j = 0;
    $heads = array();
    $last_end = null;

    for ($i = 0; $i < $numberOfVariables; $i++) {
        $result = fgets($file);
        $variable_name = strtolower(trim(substr($result, 1, 11)));
        $current_item_length = intval(trim(substr($result, 36, 4)));
        $total_length_for_all_varibles += $current_item_length;
        $last_end = $current_item_length + intval($last_end);

        if ($current_item_length > 0) {
            if ($i === 0) {
                //first loop
                $heads[$i]['start'] = 0;//variable starts at position 0 of the row
            } else {
                $prev_start = $heads[$i - 1]['start'];
                $prev_item_length = $heads[$i - 1]['field_length'];

                $x = $prev_start + $prev_item_length;
                $data_row_limit = 79 - $x;//the limits of each data row is 79
                if($data_row_limit > $current_item_length){
                    $heads[$i]['start'] = $x;
                } else {
                    $heads[$i]['start'] = 0;
                }
            }

            $heads[$i]['field_length'] = $current_item_length;
            $heads[$i]['variable_name'] = strtolower(trim($variable_name));

        }//end if length > 0
    }//end for loop for getting the variables

    $number_of_data_rows = ceil($total_length_for_all_varibles/79);// in this case 4
    $total_length_for_all_varibles +=  $number_of_data_rows;//in this case its 283

    //while (!feof($file)){
        $data = '';
        $insert_data = array();
        for($i = 1; $i <= $number_of_data_rows; $i++){
            $data .= fgets($file);
        }
/*
The output for the previous loop which represents (the data/empty value) for the 44 variables
AFGAFG/06/06/725       2KABUL               KABUL                          242!
18/09/2006                    22/09/200623/09/20060  9918/09/2006             !
 26/09/2006                    12444134                                       !
                                             !
*/
        foreach ($heads as $key => $val){
            $item_val = trim(substr($data, $val['start'], $val['field_length']));
            $insert_data[$val['variable_name']] = $item_val;
        }

    var_dump($insert_data);exit();
    //}

}

The output of the code above is shown below. It reads the correct values up to the ["sex"] variable, but it does not continue onto the second row after character position 79:

array(44) {                                           
  ["country"]=>                                       
  string(3) "AFG"                                     
  ["idcode"]=>                                        
  string(13) "AFG/06/06/725"                          
  ["hotcase"]=>                                       
  string(1) "2"                                       
  ["district"]=>                                      
  string(5) "KABUL"                                   
  ["province"]=>                                      
  string(5) "KABUL"                                   
  ["dob"]=>                                           
  string(0) ""                                        
  ["age"]=>                                           
  string(2) "24"                                      
  ["sex"]=>                                           
  string(1) "2"                                       
  ["donset"]=>                                        
  string(10) "AFGAFG/06/"                             
  ["dnot"]=>                                          
  string(6) "06/725"                                  
  ["doi"]=>                                           
  string(6) "2KABUL"                                  
  ["dstcoll1"]=>                                      
  string(0) ""                                        
  ["dstcoll2"]=>                                      
  string(5) "KABUL"                                   
  ["dosesr"]=>                                        
  string(0) ""                                        
  ["dosesn"]=>                                        
  string(0) ""                                        
  ["doses"]=>                                         
  string(0) ""                                        
  ["dlopv"]=>                                         
  string(0) ""                                        
  ["fever"]=>                                         
  string(0) ""                                        
  ["progress"]=>                                      
  string(0) ""                                        
  ["asym"]=>                                          
  string(0) ""                                        
  ["dfup"]=>                                          
  string(3) "242"                                     
  ["fup"]=>                                           
  string(1) "A"                                       
  ["dstlab"]=>                                        
  string(10) "FGAFG/06/0"                             
  ["dtres"]=>                                         
  string(5) "6/725"                                   
  ["dires"]=>                                         
  string(6) "2KABUL"                                  
  ["stcond"]=>                                        
  string(0) ""                                        
  ["l20b"]=>                                          
  string(0) ""                                        
  ["p1"]=>                                            
  string(0) ""                                        
  ["p2"]=>                                            
  string(0) ""                                        
  ["p3"]=>                                            
  string(0) ""                                        
  ["entero"]=>                                        
  string(0) ""                                        
  ["class"]=>                                         
  string(0) ""                                        
  ["fdiag"]=>                                         
  string(0) ""                                        
  ["othdiag"]=>                                       
  string(1) "K"                                       
  ["sdiag"]=>                                         
  string(29) "AFGAFG/06/06/725       2KABUL"          
  ["contact"]=>                                       
  string(0) ""                                        
  ["eligcont"]=>                                      
  string(0) ""                                        
  ["inadafp"]=>                                       
  string(0) ""                                        
  ["hotafp"]=>                                        
  string(0) ""                                        
  ["hardarea"]=>                                      
  string(1) "K"                                       
  ["othreas"]=>                                       
  string(1) "A"                                       
  ["sothreas"]=>                                      
  string(30) "BUL                          2"         
  ["wildcont"]=>                                      
  string(1) "4"                                       
  ["vdpvcont"]=>                                      
  string(1) "2"                                       
}                                                     
  • Can you split the first part and the second part into two files? It looks like the first part is whitespace-delimited, so I'd use fgetcsv for that. Commented Feb 6, 2021 at 15:06
  • No, the files will be uploaded by different users, and the software dumps the variables and the data into the same .rec file. This is just a sample; the actual files consist of thousands of rows. Commented Feb 6, 2021 at 15:11

1 Answer

I think the problem is that you only consider the start and length of each field relative to a single line of data, in...

$data_row_limit = 79 - $x;

Whenever a field does not fit in what is left of the line, you reset its start to position 0, which is why the data is then extracted from the start of the string again.
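To make this concrete, here is a small sketch that applies both rules to the first nine field lengths from the header in the question (COUNTRY through DONSET). The array names `$resetStarts` and `$runningStarts` are made up for this illustration.

```php
<?php
// Lengths of the first nine fields, copied from the header in the question:
// COUNTRY, IDCODE, HOTCASE, DISTRICT, PROVINCE, DOB, AGE, SEX, DONSET.
$lengths = [3, 20, 1, 20, 20, 10, 3, 1, 10];

$resetStarts = [];   // offsets produced by the "79 - $x" check (illustrative name)
$runningStarts = []; // offsets produced by simply adding up the lengths

foreach ($lengths as $i => $length) {
    if ($i === 0) {
        $resetStarts[] = 0;
        $runningStarts[] = 0;
        continue;
    }
    // Rule from the question: fall back to 0 when the field does not fit.
    $x = $resetStarts[$i - 1] + $lengths[$i - 1];
    $resetStarts[] = (79 - $x > $length) ? $x : 0;

    // Cumulative rule: every field starts where the previous one ended.
    $runningStarts[] = $runningStarts[$i - 1] + $lengths[$i - 1];
}

echo end($resetStarts), "\n";   // 0
echo end($runningStarts), "\n"; // 78
```

The first eight fields add up to 78 characters, which is exactly the data portion of the first physical line. DONSET therefore belongs at offset 78 of the joined record, but the reset rule sends it back to 0, which matches the "AFGAFG/06/" value in the dump.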

If instead you just keep on adding the lengths...

for ($i = 0; $i < $numberOfVariables; $i++) {
    $result = fgets($file);
    $variable_name = strtolower(trim(substr($result, 1, 11)));
    $current_item_length = intval(trim(substr($result, 36, 4)));
    $total_length_for_all_varibles += $current_item_length;
    $last_end = $current_item_length + intval($last_end);

    if ($current_item_length > 0) {
        if ($i === 0) {
            //first loop
            $heads[$i]['start'] = 0;//variable starts at position 0 of the row
        } else {
            $prev_start = $heads[$i - 1]['start'];
            $prev_item_length = $heads[$i - 1]['field_length'];

            // Just store position here
            $heads[$i]['start'] = $prev_start + $prev_item_length;
        }

        $heads[$i]['field_length'] = $current_item_length;
        $heads[$i]['variable_name'] = strtolower(trim($variable_name));

    }//end if length > 0
}//end for loop for getting the variables
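The same idea can be written with a single running offset, which also avoids reading `$heads[$i - 1]` when an earlier field happened to have length 0 and was skipped. This is only a sketch: `buildLayout` is a hypothetical helper name, and the column positions (name at offset 1, length at offset 36) are the ones used in the question.

```php
<?php
/**
 * Build a field layout from .rec header lines (hypothetical helper).
 * Assumes the column positions from the question:
 * variable name at offset 1 (11 chars), field length at offset 36 (4 chars).
 */
function buildLayout(array $headerLines): array
{
    $layout = [];
    $offset = 0; // running position inside the joined record

    foreach ($headerLines as $line) {
        $name = strtolower(trim(substr($line, 1, 11)));
        $length = (int) trim(substr($line, 36, 4));

        if ($length > 0) {
            $layout[] = ['name' => $name, 'start' => $offset, 'length' => $length];
            $offset += $length; // the next field starts right after this one
        }
    }

    return $layout;
}

// First three header lines from the question (labels shortened).
$layout = buildLayout([
    '_COUNTRY       1   3  30  57   3   3   3 112 COUNTRY         Country',
    '_IDCODE        1   4  30  57   4   3  20 112 IDCODE          EPID No',
    '#HOTCASE       1   5  30  57   5   0   1 112 HOTCASE         Hot case',
]);

print_r(array_column($layout, 'start', 'name'));
```

For these three lines the starts come out as 0, 3 and 23, the same values the loop above produces.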

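I've also modified the string concatenation to remove the line ending and the trailing ! from each line. Only the ! itself is removed, because the padding spaces in front of it are part of the data and the offsets depend on them...

for ($i = 1; $i <= $number_of_data_rows; $i++) {
    // Strip the line ending first, then the single ! that terminates the line
    $data .= preg_replace('/!$/', '', rtrim(fgets($file), "\r\n"));
}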
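Since the real files contain thousands of rows, the record-reading part can be wrapped in a loop. The sketch below is one possible shape, not a definitive implementation: `readRecords` is a hypothetical name, the layout format (`name`, `start`, `length`) is an assumption, and the demo uses a made-up three-field layout wrapped at 8 data characters instead of the 78 used by the real file. It removes only the one ! that ends each line, so padding spaces before it are preserved.

```php
<?php
/**
 * Yield one associative array per logical record (hypothetical helper).
 * $layout is a non-empty list of ['name' => ..., 'start' => ..., 'length' => ...].
 */
function readRecords($handle, array $layout): Generator
{
    $last = end($layout);
    $recordLength = $last['start'] + $last['length'];
    $buffer = '';

    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n");
        if (substr($line, -1) === '!') {
            $line = substr($line, 0, -1); // drop only the line terminator
        }
        $buffer .= $line;

        // A record is complete once enough characters have been collected.
        if (strlen($buffer) >= $recordLength) {
            $record = [];
            foreach ($layout as $field) {
                $record[$field['name']] = trim(substr($buffer, $field['start'], $field['length']));
            }
            yield $record;
            $buffer = '';
        }
    }
}

// Toy layout (not from the question): 13 characters per record.
$layout = [
    ['name' => 'id',   'start' => 0,  'length' => 4],
    ['name' => 'name', 'start' => 4,  'length' => 6],
    ['name' => 'code', 'start' => 10, 'length' => 3],
];

// Two records, each wrapped over two physical lines ending in "!".
$handle = fopen('php://memory', 'r+');
fwrite($handle, "A001KABU!\nL 042!\nC003AB  !\n  777!\n");
rewind($handle);

$records = iterator_to_array(readRecords($handle, $layout), false);
fclose($handle);

print_r($records);
```

In the second toy record the first line ends with two padding spaces before the !. Stripping those spaces as well would shorten the joined string and shift the `code` field, which is why the terminator is removed on its own. Each yielded array can then be passed to whatever insert routine is used for MySQL.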