The mbox format is rather simple and the standard is rather flexible. As far as I know, most email apps don't care much about the integrity of the file itself beyond the presence of the `From ...` line (one that follows the mbox format conventions) before each message and the blank line after it. They care more about the integrity of the messages contained within it, and they get the real info from the message itself. Please see my other answer for an in-depth explanation.
Therefore, running this `awk` script on your current mbox file, as long as it meets the bare minimum (a line that starts with `From ` before each message, not the `From:` message header, which is followed by a colon `:`), should generate a new mbox file that email clients can parse:
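
    awk '
    # Remember the most recent line that is not a separator line.
    ! /^From [^:].*/ {
        p = $0
    }
    # Replace every separator line with a well-formed one.
    /^From [^:].*/ {
        if (p !~ "^$") {
            # Previous line is not blank: add the missing blank line first.
            $0 = "\nFrom - Tue Nov 21 17:30:08 2023"
        } else {
            $0 = "From - Tue Nov 21 17:30:08 2023"
        }
    }
    # Print every line, modified or not.
    1 {
        print
    }
    ' original_file > fixed_file
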
... where `original_file` is your current mbox file and `fixed_file` is the new file that will be generated, with the fixed mbox format and the messages from `original_file`.
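
To see what the script does before running it on a real mailbox, here is a minimal sketch that applies the same `awk` program to a tiny made-up sample. The file names `sample_original.mbox` and `sample_fixed.mbox`, the addresses and the message contents are all assumptions for illustration only; the sample has two malformed `From ...` lines and no blank line between the messages.

```shell
# Hypothetical sample: two messages, non-standard separator lines
# (double space, commas), and no blank line between the messages.
printf '%s\n' \
  'From  someone@example.com, Tue, 21 Nov 2023' \
  'Subject: first' \
  '' \
  'Body of the first message' \
  'From someone@example.com,Tue 21 Nov 2023' \
  'Subject: second' \
  '' \
  'Body of the second message' > sample_original.mbox

# Same awk program as in the answer, applied to the sample file.
awk '
! /^From [^:].*/ {
    p = $0
}
/^From [^:].*/ {
    if (p !~ "^$") {
        $0 = "\nFrom - Tue Nov 21 17:30:08 2023"
    } else {
        $0 = "From - Tue Nov 21 17:30:08 2023"
    }
}
1 {
    print
}
' sample_original.mbox > sample_fixed.mbox

# Count the rewritten separator lines: one per message is expected.
grep -c '^From - ' sample_fixed.mbox
```

For this sample the last command prints `2`, and `sample_fixed.mbox` gains a blank line before the second `From - ...` line. Note that every separator gets the same placeholder date, which is consistent with clients reading the real date from the message headers.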
Note that the above script fixes common errors in existing `From ...` lines before messages, such as inconsistent spacing or non-standard characters like commas, by replacing each of those lines with the same well-formed placeholder line. It also fixes a missing blank line between messages. Errors of that type usually result from an mbox export tool that doesn't conform to the mbox format conventions. They confuse some email clients, which then often fail to parse the mbox file at all, or parse the whole file, with all the individual messages in it, as one big continuous message. If, however, your mbox file doesn't satisfy the bare minimum for this script, i.e. a `From ...` line between individual messages, then you'll probably need to write your own script to parse the message headers themselves, or add that line manually first.
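
A quick way to tell which case you are in is to count the lines the script would treat as separators. This is only a rough sketch: the helper name `count_separators` and the sample file `sample_no_separator.mbox` are hypothetical, and a body line that happens to start with `From ` would be counted too.

```shell
# Hypothetical helper: count the lines that start with "From " (From plus
# a space), which is what the awk script treats as message separators.
count_separators() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c '^From ' "$1" || true
}

# Hypothetical sample file: it has a "From:" header but no separator line.
printf '%s\n' \
  'From: someone@example.com' \
  'Subject: hello' \
  '' \
  'Body' > sample_no_separator.mbox

n=$(count_separators sample_no_separator.mbox)
if [ "$n" -eq 0 ]; then
    echo "no separator lines found: add them before running the awk script"
else
    echo "$n separator line(s) found"
fi
```

A count of zero means the file doesn't meet the bare minimum, so the separator lines have to be added some other way first.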
`From` line before each message and the blank line after it … They get the real info from the message itself AFAIK. – Raffa Nov 28 '23 at 05:50