
Similar questions were asked five or more years ago, and their answers are now obsolete. Also, I do not wish to use an online service.

I wish to convert thousands of eml files into one mbox file.

Would anybody here know if this can be done, and how?

Thank you.

muru
rob grune
  • Which answers were obsolete? Neither formats have changed all that much in recent times. – muru Nov 21 '23 at 14:00
  • IIRC the format for that conversion is not that complicated and probably a simple loop like this for e in *.eml; do date +"From - %a %b %d %H:%M:%S %Y" >> file.mbox; cat "$e" >> file.mbox; echo >> file.mbox; done might do the job. – Raffa Nov 21 '23 at 14:34
  • See at: https://gist.github.com/kadin2048/c332a572a388acc22d56 – kyodake Nov 21 '23 at 16:19

1 Answer


TLDR

The format for that conversion is not that complicated, and an MBOX file can be generated from any number of EML files with a shell loop like this (make sure no file named file.mbox already exists in the same directory before running it, because the loop appends to that file):

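for e in *.eml
do
    # Separator line that starts each message
    date +"From - %a %b %d %H:%M:%S %Y" >> file.mbox
    # The message itself (headers and body)
    cat "$e" >> file.mbox
    # Blank line that ends the message
    echo >> file.mbox
done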

... which, when run from within the directory containing the *.eml files, should create a file named file.mbox in that same directory.
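If some message bodies contain lines that begin with "From ", or if the EML files use Windows-style line endings, a slightly more defensive variant of the loop above may help. The following is only a sketch under assumptions that are mine, not part of the original loop: the function name eml_to_mbox is hypothetical, body lines that look like a separator are quoted with a leading ">", and every carriage return is stripped.

```shell
#!/bin/sh
# Sketch only: eml_to_mbox is a hypothetical helper name, not a standard tool.
# Usage: eml_to_mbox OUTPUT.mbox FILE.eml...
eml_to_mbox() {
    out=$1
    shift
    for e in "$@"; do
        # Skip anything that is not a regular file (e.g. an unmatched glob).
        [ -f "$e" ] || continue
        # Separator line; LC_ALL=C keeps day and month names in English.
        LC_ALL=C date +"From - %a %b %d %H:%M:%S %Y"
        # Assumption: strip carriage returns, then prefix ">" to any line
        # that starts with "From " (or ">From ", ">>From ", ...), so it
        # cannot be mistaken for a separator.
        tr -d '\r' < "$e" | sed 's/^\(>*From \)/>\1/'
        # Blank line that ends the message.
        echo
    done >> "$out"
}
```

It could then be called as eml_to_mbox file.mbox *.eml from within the directory that holds the messages.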

TLR :)

All the important information, i.e. the message headers and the message body, should be contained in each EML file (.eml). These files follow an Internet message format that has been standardized since 1982 (RFC 822) and on which most email clients base their parsing and processing of message files. In most cases this keeps both the headers (which include, among other things, the sender's address and timestamps) and the message body intact.

The MBOX file format, on the other hand, was not standardized until more than two decades later ... In essence, it is a single container for multiple email messages (each of which should follow the standardized EML format).

However, the current MBOX file format "standard" (as of 2005) is simply as follows:

First comes From (that is F, r, o and m) followed by a single space, then an email address of some kind, then another single space, then a timestamp and an end-of-line indicator. The message starts immediately on the next line (no blank line is expected) and ends with a blank line (no spaces or tabs, just an empty line) ... That pattern is then repeated throughout the MBOX file, and email clients should parse the file until there is no more data left or an end-of-file is reached.
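The layout described above also gives a simple way to sanity-check a generated file: count the lines that start with "From " and sit either at the very top of the file or directly after a blank line, then compare that number with the number of EML files. This is a rough sketch; mbox_count is a hypothetical name, and the "preceded by a blank line" rule is my reading of the description above rather than something every mail client enforces.

```shell
#!/bin/sh
# Sketch only: mbox_count is a hypothetical helper name.
# Usage: mbox_count FILE.mbox   (prints the number of messages found)
mbox_count() {
    awk '
        # Count a "From " line only at the start of the file
        # or directly after a blank line.
        /^From / && (NR == 1 || prev == "") { n++ }
        { prev = $0 }
        END { print n + 0 }
    ' "$1"
}
```

If the count is higher than the number of EML files, some message body probably contains an unquoted line starting with "From " right after a blank line.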

I remember having to deal with such an issue quite a long while ago, and if I recall correctly the From line I used in the generated MBOX file looked like this:

From - Tue Nov 21 17:30:08 2023

... which AFAIK is mostly overlooked by email apps and is added here only as a record separator, so any timestamp should work for this purpose (your mail client might offer to repair those lines if it needs to).

Raffa
  • @ Raffa. wow, Many thanks for the tutorial and script!!! It worked perfectly. – rob grune Nov 21 '23 at 23:51
  • @robgrune You're welcome and I'm happy I could help. – Raffa Nov 22 '23 at 11:18
  • intriguingly, I tried exporting with Thunderbird and it put commas in the dates, which seemed to cause problems for Alpine e-mail. Anyone familiar with something like that coming up? – Nicholas Saunders Nov 28 '23 at 07:04
  • @NicholasSaunders can you please show an example of the From - ... line with commas in it from the exported file? – Raffa Nov 28 '23 at 07:15
  • https://gist.github.com/NICKSAUNDERS/a6adff8fe1952c6c6f76b76c4271fb66 however, it has been a long day, so let me go through all that again to confirm that the Thunderbird plugin is doing that. Probably I have a wrong setting. Alpine email just sees one big text file and mailutils won't even open it. @Raffa – Nicholas Saunders Nov 28 '23 at 07:59
  • @NicholasSaunders If that is the culprit, it can be fixed with e.g. awk '/^From .*[0-9]{2}:[0-9]{2}:[0-9]{2}$/{$0="From - Tue Nov 21 17:30:08 2023"}1' original_file > fixed_file I think. – Raffa Nov 28 '23 at 08:31
  • What I've done for right now is, after fighting 2 Factor Authentication, got Alpine to connect with gmail. I can just select a bunch of messages and export to a file. Seems to work fine. Have no idea why Thunderbird can't do this sort of thing easily. Very frustrating. Maybe PEBKAC error. Work around accomplished for the time being @Raffa but thank you. (I still don't know why Pine choked on the Thunderbird created mbox file, makes no sense.) – Nicholas Saunders Nov 28 '23 at 08:50