UPDATE (2021-11-13 22:12 GMT+8): regarding the Snap packages, @karel suggested that this question is a duplicate of "Why don't the Ubuntu repositories have the latest versions of software?". I disagree, because (1) Snaps, being self-contained and bundled with all their dependencies, are different from deb packages, so I would expect them to follow upstream more closely, and (2) even if they do not, I would expect them to have reached the stable channel by now.
I see this has already been asked in "Hadoop & Spark - why no Ubuntu packages?", but (1) that was back in 2015 and the computing landscape has changed a lot since then, and (2) the only response to that question does not really answer it, so I thought it would be appropriate to ask again.
Now, in 2021, cloud computing and big data have only become more ubiquitous than they were in 2015. Considering that one of the major use cases of Linux is cloud computing / big data, why is the de facto way of setting up Hadoop and Spark (key frameworks for big data processing) still to download and unpack archives from upstream, instead of simply fetching the appropriate binary packages from the official Ubuntu repositories with an appropriate apt install command?
Unless I'm missing something, I imagine that having such commonly used frameworks prepackaged for Ubuntu would bring a number of tangible benefits to a vast user base, such as (but not limited to):
- Improved integration with the host system
- Less manual setup and configuration required
P.S. I've also checked the Snap Store, given Canonical's push towards snaps in recent years. While both appear to be packaged (Hadoop, Spark), the last packaging efforts date back to 2017, and the snaps are only available in the unstable beta / edge channels.
apt-cache search
) – Donald Sebastian Leung Nov 13 '21 at 14:07