Often in NLP projects, the data points contain both text and float embeddings, which makes them tricky to store and work with. CSVs take up a lot of memory and are slow to load, but most other data formats seem to be meant for either pure text or pure numerical data.
Some formats can handle both data types, but they are generally not flexible for wrangling. For example, with pickle you have to load the entire file into memory if you want to wrangle anything. You can't just append directly to disk like you can with HDF5, which is very helpful for huge datasets that cannot all be loaded into memory.
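To make the HDF5 append-to-disk pattern concrete, here is a minimal sketch, assuming `h5py` and NumPy are installed. The dataset names (`text`, `embedding`), the `EMBED_DIM` constant, and the `append_batch` helper are all made up for illustration; this is one possible layout, not a recommended schema.

```python
import os
import tempfile

import h5py
import numpy as np

EMBED_DIM = 4  # hypothetical embedding size, for illustration only


def append_batch(path, texts, embeddings):
    """Append one batch of (text, embedding) rows to an HDF5 file on disk."""
    embeddings = np.asarray(embeddings, dtype="float32")
    with h5py.File(path, "a") as f:
        if "text" not in f:
            # maxshape=None on axis 0 makes the datasets resizable later.
            f.create_dataset("text", shape=(0,), maxshape=(None,),
                             dtype=h5py.string_dtype())
            f.create_dataset("embedding", shape=(0, EMBED_DIM),
                             maxshape=(None, EMBED_DIM), dtype="float32")
        n_old = f["text"].shape[0]
        n_new = n_old + len(texts)
        # Grow both datasets, then write only the new rows.
        f["text"].resize((n_new,))
        f["embedding"].resize((n_new, EMBED_DIM))
        f["text"][n_old:n_new] = np.array(texts, dtype=object)
        f["embedding"][n_old:n_new] = embeddings


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    append_batch(path, ["first doc", "second doc"], np.zeros((2, EMBED_DIM)))
    append_batch(path, ["third doc"], np.ones((1, EMBED_DIM)))
    with h5py.File(path, "r") as f:
        # Slicing reads only the requested rows from disk.
        print(f["embedding"][1:3])
```

Each call touches only the rows being appended, and slicing on read pulls just the requested rows, so the full dataset never has to sit in memory at once.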
Also, are there any alternatives to Pandas for wrangling huge datasets? Sometimes you can't load all the data into Pandas without running out of memory and crashing.