Why are Hadoop and Spark not in the official Ubuntu repositories?

Question

UPDATE (2021-11-13 22:12 GMT+8): regarding the Snap packages, @karel suggested that this question is a duplicate of Why don't the Ubuntu repositories have the latest versions of software? I disagree, because (1) Snaps, being confined and bundled with all their dependencies, are different from deb packages, so I would expect them to follow upstream more closely, and (2) even if not, I would expect them to be in the stable channel by now.

I see this has already been asked in Hadoop & Spark - why no Ubuntu packages?, but (1) that was back in 2015 and the computing landscape has changed a lot since then, and (2) the only response to that question does not really answer it, so I thought it would be appropriate to ask again.

So now, in 2021, cloud computing and big data have only become more ubiquitous than they were in 2015. Considering that one of the major use cases of Linux is cloud computing / big data, why is the de facto way of setting up Hadoop and Spark (key frameworks for big data processing) still downloading and unpacking archives from upstream, rather than simply fetching the appropriate binary packages from the official Ubuntu repositories with an apt install command? Unless I'm missing something, I imagine that having such commonly used frameworks prepackaged for Ubuntu would bring a number of tangible benefits to a vast user base, such as (but not limited to):

Improved integration with the host system
Less manual setup and configuration required

P.S. I've also checked the Snap Store, given Canonical's push towards snaps in recent years, and while they appear to be packaged (Hadoop, Spark), the last updates were back in 2017 and they are only available in the unstable beta / edge channels.

Does this answer your question? Why don't the Ubuntu repositories have the latest versions of software? — karel, Nov 13 '21 at 14:02
No, because Hadoop and Spark do not seem to be in the official Ubuntu repositories at all (I could not find anything relevant with apt-cache search) — Donald Sebastian Leung, Nov 13 '21 at 14:07
The hadoop and spark snap packages haven't been updated since 2017 either. That's what makes this question either a duplicate question or opinion-based. — karel, Nov 13 '21 at 14:09
But then (1) I'd expect Snap packages to follow upstream more closely, and (2) even if not, they should already be in stable by now — Donald Sebastian Leung, Nov 13 '21 at 14:10
I would expect the same thing too as both snap packages are maintained by the same person, but it didn't happen. — karel, Nov 13 '21 at 14:11

user535733 · Accepted Answer · 2021-11-13T15:57:16.523

Both Hadoop and Spark were dropped from Debian years ago, mostly due to a lack of volunteer interest in maintaining those packages. Ubuntu gets most of its deb packages from Debian, so they were dropped from Ubuntu, too.

Hadoop: Debian tracker page - Debian Bug #630820
Spark: Debian tracker page - Debian Bug #946336

Any community volunteer willing to learn the process and contribute the effort can re-introduce the packages to Debian, and they will subsequently flow into future releases of Ubuntu. More volunteers = More, better, and up-to-date software.

Also, according to https://wiki.debian.org/Hadoop, the Hadoop developers didn't make packaging and maintaining debs easy for the Debian volunteers:

There are a number of reasons for this; in particular the Hadoop build process will load various dependencies via Maven instead of using distribution-supplied packages. Java projects like this are unfortunately not easy to package because of interdependencies; and unfortunately the Hadoop stack is full of odd dependencies

If this information is stale or incorrect, once again it's up to community volunteers to step up, make corrections, and implement changes. Debian and Ubuntu are driven by volunteers. More volunteers = Better documentation.

Thank you, this was the detailed explanation I was looking for. It's a shame that the Hadoop developers did not make it easy to package for distributions such as Debian (and Ubuntu). Maybe I should consider contributing sometime :-) — Donald Sebastian Leung, Nov 14 '21 at 03:02
