Convert a .dat file with Python

Question

We have a .dat file that is completely unreadable in a text editor. This .dat file holds information about materials and is used on a machine that can display the data.

We now want to use the files for other purposes and need to convert them to a readable format. When I open the .dat file in Notepad++, this is what I see:

2ñg½x¹¾T?B½ÛÁ@½^fÓ¼":°êÈ½¸ô»YY‚½g *½$S)¼¤“è¼F„J¼c1¼$ ¼*‡Ç»½Ú7¼F]S¼Ê(Ï<(‚¤½Y´½å½@ø;N‡;o¸¼¨*S:Î£C¼ÎÀR½žO<š_å¼T÷½p4>½8«»«=ýÆZ<¿[=²”<æt¼pc»q³<×R<ï4¼}Ž‚8pýw<~ï»z†:QÂš¼^Kp;XI=<Ñ¶ ¼ j½…é-=*Ý=;-X7½ßÓ:<ÐZ<Ás!=²LÀ;æã=võu<„4½§V9„ç‡ºý$D<"Š|»å€,<E{=+»¥;2wN¼¸rF=h®<ç[=²=é\=Îý<…À¦¼Î,è¼u…<#_.¼¾Ã¨9æ3½Å°“<ª×½°ÇD¼JÝþ»ph{=Ÿv8;Ne¼’Q; ´{»(ì¿<6Þï»éõ¼*p½©m¼ÝM–<ròä¼½™™¼Õö=j|½±‰Í;2¥C¼¯ è¼“?½>¼:„3» ù¼¦k ¼wÞ¹¼Öm‚»=T¼êy¦¼k[…»ÎÉO¼Žc¼$ï½ÖN;H¼4Ø:8¸ž¼dLý¼ø9ø»cI(;4ë¼ˆQ¼ž7½,h?¼À É¼y½Å’œ¼¶Åº¼å"±¼bžu¼ Z;½¨½øáY¼ZÖ»2 ½ð^š<Þ„§<»ƒ<@±c<f<ŸPÝ;‹œlºÐöï»ö²ñ;ÜŠb=¦';f´<ò=¬3B<\mÛ¼¹©»åB<»Xô;€ºp»¸ ±¼‰Øâ¼7Ug¼€÷ø¼lËû»j}»²‘ô;wu½®ö²¼Ÿ„¼ŠÉÂ¼ÖV8 Š¼‹÷¯¼ål¼é°ª¼‹o4½ðî$<4Q:.A< <Ž¬ë<^·G<n œ<¶l<: è;’MÜ9êÁa<’¢T;~&¼gY®»"P¼¤µº;$H=½…o<6ëæ»ûÒ¼Ê,<‚p½¯À¼@êw»Ír¥¼¸wØºA:«<TDI»Nºµ<€ŠMºwnÜ¸·6:CÕj<àÆ:Dr<7ëo9STÏ<G¼R?M<:)N;.3 <†L<ºZ=I,Y<ñF;iÙ.» pºå0<;:=TÊª;—ÄË;?'Ð¹0Ž:J’J<jR¯»´/½Ô”Ø¼•¥˜¼hÎ¼d™<9¼iˆ‘<(Šd<É‡Ö#·³È‚»@O><Úo<Ó¸ <ëî;ÒQ<õöî<#Nm¼öw4¼’O¼v <:3<

We know the data in the .dat file has the following format:

MaterialBase  ThicknessBase  ThicknessIterated  Pixel  Value
Plastic        0              0                   1     -5.662651e-02
Plastic        0              0                   2     -1.501216e-01   
Plastic        0              0                   3     -4.742368e-02

After searching through a lot of code online, including here of course, I came up with the following code:

import time
import binascii
import csv
import serial
import numpy as np

with open('validationData.dat.201805271617', 'rb') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('|').split()
        newLines.append(newLine)

with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)

The error I get now is:

File "c:\Users\joost.bazelmans\Documents\python\dat2csv.py", line 15, in <module>
    newLine = line.strip('|').split()
TypeError: a bytes-like object is required, not 'str'

It looks like the script is reading the file, but then it cannot split it by the | character. I am lost now. Any ideas on how to continue?

Edit 2018-07-23 13:00: Based on guidot's comment, we tried working with struct. We are now able to get a list of floating-point values out of the .dat file. This is also what the R script does, but after that the R script translates the floats into readable data.

import struct

datavector = []

with open('validationData.dat.201805271617', "rb") as f:
    n = 1000000
    count = 0
    byte = f.read(4) # same as size = 4 in R
    while count < n and byte != b"":
        datavector.append(struct.unpack('f',byte))
        count += 1
        byte = f.read(4)

print(datavector)

the result we get is something like this: [(-0.05662650614976883,), (-0.1501215696334839,), (-0.047423675656318665,), (-0.04705987498164177,), (-0.025805648416280746,), (0.0006194132147356868,), (-0.09810388088226318,), (-0.007468236610293388,), (-0.06364697962999344,), (-0.04153480753302574,), (-0.010334763675928116,), (-0.028390713036060333,), (-0.01236063800752163,), (-0.010809036903083324,), (-0.0195484422147274,), (-0.006089110858738422,), (-0.011221584863960743,), (-0.012900656089186668,), (0.02528800442814827,), (-0.0803263783454895,), (-0.03630480542778969,), (-0.03244496509432793,), (0.007571130990982056,), (0.004120028577744961,), (-0.022513896226882935,), (0.0008055367507040501,), (-0.011940909549593925,), (-0.05145340412855148,), (0.008258728310465813,), (-0.02799968793988228,), (-0.035880401730537415,), (-0.04643672704696655,), (-0.005221989005804062,), (0.03542486950755119,), (0.013353106565773487,), (0.035976167768239975,), (0.008336232975125313,), (-0.01492307148873806,), (-0.003470425494015217,), (0.02190450392663479,), (0.012822589837014675,), (-0.008801682852208614,), (6.225423567229882e-05,), (0.015136107802391052,), (-0.007297097705304623,), (0.0010259768459945917,), (-0.018891485407948494,), (0.0036666016094386578,), (0.01155313104391098,), (-0.009809211827814579,), (-0.03696637228131294,), (0.04245902970433235,), (0.002897093538194895,), (-0.04476182535290718,), (0.011403053067624569,), (0.01330728828907013,), (0.03941703215241432,), (0.005868517793715,), (0.031955622136592865,), (0.015012135729193687,), (-0.0439620167016983,), (0.00014146660396363586,), (-0.0010368679650127888,), (0.011971709318459034,), (-0.003853448200970888,), (0.010528777725994587,), (0.06129004433751106,), (0.00505771255120635,), (-0.012601660564541817,), (0.01481446623802185,), (0.019019771367311478,), (0.004633020609617233,), (-0.021741455420851707,), (-0.033449672162532806,), (-0.021316081285476685,), (0.00593474181368947,), (0.0030296281911432743,), 
(0.023055575788021088,), (0.0256675872951746,), (0.03663543984293938,), (0.044298700988292694,), (0.01264342200011015,), (0.032493121922016144,), (-0.06546197831630707,), (0.031123168766498566,), (0.005013703368604183,), (-0.006611336953938007,), (-0.041526272892951965,), (0.0007577596697956324,), (0.030475322157144547,), (0.034476157277822495,), (-0.015037396922707558,), (0.07587681710720062,)]
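The loop above yields one-element tuples because struct.unpack always returns a tuple. A minimal sketch of an alternative that produces a flat list of floats is shown below. It assumes the file is nothing but consecutive little-endian 32-bit floats with no header; the function name read_float32 is made up for illustration, and the sample bytes are built in memory from the first three values printed above rather than read from the real file.

```python
import struct


def read_float32(raw):
    """Decode raw bytes as consecutive little-endian 32-bit floats.

    Assumption: no header and no padding. A trailing partial value
    (fewer than 4 bytes) is ignored instead of raising struct.error.
    """
    usable = len(raw) - len(raw) % 4
    # iter_unpack yields 1-tuples; unpack each into a plain float.
    return [value for (value,) in struct.iter_unpack("<f", raw[:usable])]


# Demo on bytes built in memory (stand-in for open(path, "rb").read()).
sample = struct.pack(
    "<3f", -0.05662650614976883, -0.1501215696334839, -0.047423675656318665
)
values = read_float32(sample + b"\x00")  # one stray trailing byte is ignored
print(values)
```

For the real file, the same function could be applied to the result of reading the whole file in 'rb' mode, provided the assumptions about layout hold.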

Now the question is how to convert these floating-point numbers to readable content.
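The first three decoded floats match the Value column of the known format (-5.662651e-02, -1.501216e-01, -4.742368e-02), which suggests the file stores only the values and that Pixel is simply the running index. Under that assumption, a rough sketch for writing the readable table as CSV could look like this. The helper names and the default material and thickness values are placeholders: nothing in the decoded floats says which material or thickness a value belongs to, so that information would have to come from elsewhere (for example the R script or the machine's documentation).

```python
import csv
import io
import struct

HEADER = ["MaterialBase", "ThicknessBase", "ThicknessIterated", "Pixel", "Value"]


def floats_to_rows(values, material="Plastic", thickness_base=0, thickness_iterated=0):
    """Yield one table row per float; Pixel is assumed to be a 1-based index.

    material and the thickness arguments are placeholders, not data
    recovered from the file.
    """
    for pixel, value in enumerate(values, start=1):
        yield [material, thickness_base, thickness_iterated, pixel, f"{value:.6e}"]


def write_table(values, out):
    """Write the header and all rows to an open text file-like object."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    writer.writerows(floats_to_rows(values))


# Demo with the first three floats from the output above, kept in memory.
raw = struct.pack(
    "<3f", -0.05662650614976883, -0.1501215696334839, -0.047423675656318665
)
values = [v for (v,) in struct.iter_unpack("<f", raw)]
buffer = io.StringIO()
write_table(values, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, open the output with open('file.csv', 'w', newline='') and pass that handle to write_table.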

Why do you open the file in binary mode? It seems to be a simple text file containing CSV data.
– user9455968, Commented Jul 20, 2018 at 11:17
The example of the data I gave is how the machine displays it. The data in the file itself is unreadable.
– jbazelmans, Commented Jul 20, 2018 at 11:22
If the text is "unreadable" (how?) in a text editor, how does reading it as if it's plain text with readlines and then treating it as if it's plain text with strip help?
– Jongware, Commented Jul 20, 2018 at 11:23
For a binary file I would expect intensive usage of the struct module instead of split() and strip() calls.
– guidot, Commented Jul 20, 2018 at 11:25
Please provide the file and the corresponding few lines of what you know it contains.
– Mark Setchell, Commented Jul 20, 2018 at 11:45

xnx · Accepted Answer · 2018-07-20 11:13:41Z

1

Since you opened your file in binary mode (mode='rb'), each line is a bytes object, so I think you probably just need to pass a bytes argument to strip:

newLine = line.strip(b'|').split()
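As a small illustration of why the TypeError occurs, the sketch below uses a made-up pipe-delimited line (not taken from the real file) and shows that bytes.strip rejects a str argument but accepts a bytes one. Note that this only removes the error: if the file really contains packed binary floats, as the edit to the question suggests, splitting on | or whitespace will not produce readable fields.

```python
# Hypothetical line, as it would come back from a file opened with 'rb'.
line = b"|Plastic 0 0 1|\n"

# Passing a str to a bytes method reproduces the error from the question.
try:
    line.strip("|")
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

# Passing bytes works: drop the newline, then the pipes, then split.
fields = line.strip().strip(b"|").split()
print(fields)  # [b'Plastic', b'0', b'0', b'1']

# Decode to str if text fields are wanted (ASCII assumed for this sample).
text_fields = [field.decode("ascii") for field in fields]
print(text_fields)  # ['Plastic', '0', '0', '1']
```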

