We have a .dat file that is completely unreadable in a text editor. This .dat file holds information about materials and is used on a machine that can display the data.

Now we are going to use the files for other purposes and need to convert them to a readable format. When I open the .dat file in Notepad++, this is what I see:

2ñg½x¹¾T?B½ÛÁ@½^fÓ¼":°êȽ¸ô»YY‚½g *½$S)¼¤“è¼F„J¼c1¼$ ¼*‡Ç»½Ú7¼F]S¼Ê(Ï<(‚¤½Y´½å½@ø;N‡;o¸¼¨*S:ΣC¼ÎÀR½žO<š_å¼T÷½p4>½8«»«=ýÆZ<¿[=²”<æt¼pc»q³<×R<ï4¼}Ž‚8pýw<~ï»z†:Qš¼^Kp;XI=<Ѷ ¼ j½…é-=*Ý=;-X7½ßÓ:<ÐZ<Ás!=²LÀ;æã=võu<„4½§V9„燺ý$D<"Š|»å€,<E{=+»¥;2wN¼¸rF=h®<ç[=²=é\=Îý<…À¦¼Î,è¼u…<#_.¼¾Ã¨9æ3½Å°“<ª×½°ÇD¼JÝþ»ph{=Ÿv8;Ne¼’Q; ´{»(ì¿<6Þï»éõ¼*p½©m¼ÝM–<ròä¼½™™¼Õö=j|½±‰Í;2¥C¼¯ 輓?½>¼:„3» ­ù¼¦k ¼wÞ¹¼Öm‚»=T¼êy¦¼k[…»ÎÉO¼Žc¼$ï½ÖN;H¼4Ø:8¸ž¼dLý¼ø9ø»cI(;4뼈Q¼ž7½,h?¼À ɼy½Å’œ¼¶Åº¼å"±¼bžu¼ Z;½¨½øáY¼ZÖ»2 ½ð^š<Þ„§<»ƒ<@±c<f<ŸPÝ;‹œlºÐöï»ö²ñ;ÜŠb=¦';f´<ò=¬3B<\mÛ¼¹©»åB<»Xô;€ºp»¸ ±¼‰Øâ¼7Ug¼€÷ø¼lËû»j}»²‘ô;wu½®ö²¼Ÿ„¼ŠÉ¼ÖV8 м‹÷¯¼ål¼é°ª¼‹o4½ðî$<4Q:.A< <ެë<^·G<n œ<¶l<: è;’MÜ9êÁa<’¢T;~&¼gY®»"P¼¤µº;$H=½…o<6ëæ»ûÒ¼Ê,<‚p½¯À¼@êw»Ír¥¼¸wغA:«<TDI»Nºµ<€ŠMºwnܸ·6:CÕj<àÆ:Dr<7ëo9STÏ<G¼R?M<:)N;.3 <†L<ºZ=I,Y<ñF;iÙ.» pºå0<;:=Tʪ;—ÄË;?'й0Ž:J’J<jR¯»´/½Ô”ؼ•¥˜¼hμd™<9¼iˆ‘<(Šd<ɇÖ#·³È‚»@O><Úo<Ó¸ <ëî;ÒQ<õöî<#Nm¼öw4¼’O¼v <:3<

We know the data in the .dat file has the following format:

MaterialBase  ThicknessBase  ThicknessIterated  Pixel  Value
Plastic        0              0                   1     -5.662651e-02
Plastic        0              0                   2     -1.501216e-01   
Plastic        0              0                   3     -4.742368e-02

After searching through a lot of code, including here of course, I came up with the following code:

import time
import binascii
import csv
import serial
import numpy as np

with open('validationData.dat.201805271617', 'rb') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('|').split()
        newLines.append(newLine)

with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)

The error I get now is:

File "c:\Users\joost.bazelmans\Documents\python\dat2csv.py", line 15, in <module>
newLine = line.strip('|').split()
TypeError: a bytes-like object is required, not 'str'

It looks like the script is reading the file, but then it cannot split it by the | character. I am lost now. Any ideas on how to continue?

Edit 2018-07-23 13:00: Based on guidot's reply, we tried working with struct. We are now able to get a list of floating-point values out of the .dat file. This is also what the R script does, but the R script then goes on to translate the floats into readable data.

import struct
datavector = []

with open('validationData.dat.201805271617', "rb") as f:
    n = 1000000
    count = 0
    byte = f.read(4) # same as size = 4 in R
    while count < n and byte != b"":
        datavector.append(struct.unpack('f',byte))
        count += 1
        byte = f.read(4)

print(datavector)

the result we get is something like this: [(-0.05662650614976883,), (-0.1501215696334839,), (-0.047423675656318665,), (-0.04705987498164177,), (-0.025805648416280746,), (0.0006194132147356868,), (-0.09810388088226318,), (-0.007468236610293388,), (-0.06364697962999344,), (-0.04153480753302574,), (-0.010334763675928116,), (-0.028390713036060333,), (-0.01236063800752163,), (-0.010809036903083324,), (-0.0195484422147274,), (-0.006089110858738422,), (-0.011221584863960743,), (-0.012900656089186668,), (0.02528800442814827,), (-0.0803263783454895,), (-0.03630480542778969,), (-0.03244496509432793,), (0.007571130990982056,), (0.004120028577744961,), (-0.022513896226882935,), (0.0008055367507040501,), (-0.011940909549593925,), (-0.05145340412855148,), (0.008258728310465813,), (-0.02799968793988228,), (-0.035880401730537415,), (-0.04643672704696655,), (-0.005221989005804062,), (0.03542486950755119,), (0.013353106565773487,), (0.035976167768239975,), (0.008336232975125313,), (-0.01492307148873806,), (-0.003470425494015217,), (0.02190450392663479,), (0.012822589837014675,), (-0.008801682852208614,), (6.225423567229882e-05,), (0.015136107802391052,), (-0.007297097705304623,), (0.0010259768459945917,), (-0.018891485407948494,), (0.0036666016094386578,), (0.01155313104391098,), (-0.009809211827814579,), (-0.03696637228131294,), (0.04245902970433235,), (0.002897093538194895,), (-0.04476182535290718,), (0.011403053067624569,), (0.01330728828907013,), (0.03941703215241432,), (0.005868517793715,), (0.031955622136592865,), (0.015012135729193687,), (-0.0439620167016983,), (0.00014146660396363586,), (-0.0010368679650127888,), (0.011971709318459034,), (-0.003853448200970888,), (0.010528777725994587,), (0.06129004433751106,), (0.00505771255120635,), (-0.012601660564541817,), (0.01481446623802185,), (0.019019771367311478,), (0.004633020609617233,), (-0.021741455420851707,), (-0.033449672162532806,), (-0.021316081285476685,), (0.00593474181368947,), (0.0030296281911432743,), 
(0.023055575788021088,), (0.0256675872951746,), (0.03663543984293938,), (0.044298700988292694,), (0.01264342200011015,), (0.032493121922016144,), (-0.06546197831630707,), (0.031123168766498566,), (0.005013703368604183,), (-0.006611336953938007,), (-0.041526272892951965,), (0.0007577596697956324,), (0.030475322157144547,), (0.034476157277822495,), (-0.015037396922707558,), (0.07587681710720062,)]
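
The same read can be sketched more compactly with struct.iter_unpack, which yields one tuple per 4-byte record; taking the single element of each tuple gives a flat list of floats instead of a list of 1-tuples. This is only a minimal sketch, assuming the file contains nothing but consecutive little-endian 32-bit floats (the '<f' format). The helper name read_float32 is hypothetical, and trailing bytes that do not fill a whole float are dropped.

```python
import struct

def read_float32(raw):
    """Decode consecutive little-endian 32-bit floats from a bytes object."""
    usable = len(raw) - (len(raw) % 4)  # ignore an incomplete trailing record
    return [value for (value,) in struct.iter_unpack('<f', raw[:usable])]

# Demonstration on bytes built in memory rather than the real file.
# The extra b'\x00' simulates a stray trailing byte.
sample = struct.pack('<3f', 1.5, -0.25, 2.0) + b'\x00'
print(read_float32(sample))
```

With the real file, the bytes would come from open('validationData.dat.201805271617', 'rb').read() instead of the in-memory sample.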

Now the question is how to convert these floating-point numbers to readable content.
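
One observation may help here: the first three decoded floats (-0.0566265, -0.1501216, -0.0474237) match the Value column of the known format for pixels 1 to 3. So the file appears to store only the values, and the other columns would have to be rebuilt from the order of the records. The following is only a sketch under that assumption. The names values_to_rows, blocks and pixels_per_block are hypothetical, and the real block size and the order of materials and thicknesses would have to come from the machine's documentation or the R script.

```python
import csv
import io

def values_to_rows(values, blocks, pixels_per_block):
    """Attach material/thickness/pixel labels to a flat list of values.

    Assumes the values are stored block after block, each block holding
    pixels_per_block consecutive pixel values. This layout is a guess.
    """
    rows = []
    for index, value in enumerate(values):
        block_index, pixel_offset = divmod(index, pixels_per_block)
        if block_index >= len(blocks):
            break  # more values than labelled blocks; stop rather than guess
        material, thickness_base, thickness_iterated = blocks[block_index]
        rows.append([material, thickness_base, thickness_iterated,
                     pixel_offset + 1, '%.6e' % value])
    return rows

# First three floats from the decoded output; a block size of 3 is illustrative only.
values = [-0.05662650614976883, -0.1501215696334839, -0.047423675656318665]
rows = values_to_rows(values, blocks=[('Plastic', 0, 0)], pixels_per_block=3)

out = io.StringIO()  # stands in for open('file.csv', 'w', newline='')
writer = csv.writer(out)
writer.writerow(['MaterialBase', 'ThicknessBase', 'ThicknessIterated', 'Pixel', 'Value'])
writer.writerows(rows)
print(out.getvalue())
```

Formatting each value with '%.6e' reproduces the notation of the known format, for example -5.662651e-02 for the first value.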

  • Why do you open the file in binary mode? It seems to be a simple text file containing CSV data. Commented Jul 20, 2018 at 11:17
  • The example of the data I gave is how the data is displayed on the machine. The data in the file itself is actually unreadable. Commented Jul 20, 2018 at 11:22
  • If the text is "unreadable" (how?) in a text editor, how does reading it as if it's plain text with readlines and then treating it as if it's plain text with strip help? Commented Jul 20, 2018 at 11:23
  • For a binary file I would expect intensive usage of the struct module instead of split() and strip() calls. Commented Jul 20, 2018 at 11:25
  • Please provide the file and the corresponding few lines of what you know it contains. Commented Jul 20, 2018 at 11:45

1 Answer

Since you opened your file in binary mode (mode='rb'), each line is a bytes object, so I think you probably just need to pass a bytes-like argument to strip:

newLine = line.strip(b'|').split()
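
To illustrate the difference, here is a small sketch on an in-memory bytes value rather than the real file (the sample content is made up). Note that this only removes the TypeError; it does not decode binary float data.

```python
line = b'|Plastic 0 0 1|\n'  # made-up sample, not taken from the real file

# Passing a str to a bytes method raises the TypeError from the question.
try:
    line.strip('|')
except TypeError as exc:
    error_message = str(exc)

# Passing bytes works; the pieces are still bytes and need decoding for text output.
fields = [part.decode('ascii') for part in line.strip().strip(b'|').split()]
print(fields)
```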