
I have the following problem: I wrote a script that extracts information about the installed packages on my system (Ubuntu 16.04 LTS). I am particularly interested in the source of each package, so the APT-Sources field in the output of apt show <packagename> is crucial for me.

As of now, my script has to call apt show for every single installed package, which creates an almost unacceptable workload for such a small task (the CPU load reaches almost 100%).

I was hoping that there is some file on the system that stores all the information output by apt show. Reading and parsing that file should be faster than calling apt show thousands of times. Is there such a file?


Please note that I already tried dpkg and apt-cache, but neither provides the APT-Sources information.


Edit: Some elaboration might be useful. My Python script calls apt list --installed to get the list of installed packages and parses this output into a list containing only the package names as strings.

Then it calls apt show for every element in this list.

I would have liked to have a single file, read once, that contains the information about the installed packages. My script would then parse this file, add the information to each list element, and be done in one iteration. My hope was that reading and parsing one large file once is faster than calling a CLI command many hundreds of times.

As such, I assume that grepping over multiple files multiple times would not really decrease the workload.

muru
KyuMirthu
  • Uh, I don't see anything special in the output of apt show git | grep Sources that's not in the output of apt-cache policy git. – muru Jul 20 '17 at 07:16
  • @muru This looks more like it. But looking into /var/cache/apt/ I can't find anything I could directly parse in my script. For example, running grep -r "Sources" * inside /var/cache/apt/ gives me nothing. Is there a non-binary file that would provide the info? – KyuMirthu Jul 20 '17 at 07:45
  • The output in APT-Sources (and in apt-cache policy) is essentially the name of a file in /var/lib/apt/lists. Compare the output of grep 'Package: git' /var/lib/apt/lists/*_Packages -l | awk -F'[/_]' '{printf "http://%s/%s %s/%s %s %s\n", $6, $7, $9, $10, $11, $12}' – muru Jul 20 '17 at 07:54
  • If you're using Python and you run apt list, you're already doing it wrong. APT has a Python API: https://apt.alioth.debian.org/python-apt-doc/ What do you need it for, anyway? – muru Jul 20 '17 at 07:59
  • I basically implemented a system to watch over the installed packages in our landscape, collecting information about them and checking whether some of them are installed with a wrong version. Since some of these packages come from our own internal repositories, it is necessary to know where each package came from. Originally the landscape ran a SLES distro, but we were going to switch to Ubuntu in the near future (with SLES, rpm and zypper were enough to get all the information we could wish for). – KyuMirthu Jul 20 '17 at 08:02
  • But with the information that APT-Sources is basically the name of these files, you have given me the answer I was looking for. :) Sorry for being so slow to understand how this works. Now it's clear. Also, thanks for the Python API link. :) – KyuMirthu Jul 20 '17 at 08:05

2 Answers


The package lists that apt downloads are stored here:

ls /var/lib/apt/lists

The file names depend on the repository and section. For example, to gather data about wget in the main section for the amd64 architecture, you can use:

grep -A20 "Package: wget" /var/lib/apt/lists/*_ubuntu_dists_xenial-updates_main_binary-amd64_Packages

Or, as muru suggested, use awk for a more flexible result:

awk -v RS='\n\n' -v pkg=wget '$2 == pkg' /var/lib/apt/lists/*_ubuntu_dists_xenial_main_binary-amd64*
  • Setting RS (the record separator) to a blank line makes awk treat each package stanza as one record, so we can easily get all the data related to our package.
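
Since the question asks for a single pass rather than one lookup per package, the same idea can be sketched in Python. This is only a rough illustration under a few assumptions: the index files are uncompressed and named `<site>_<path>_dists_<suite>_<component>_binary-<arch>_Packages`, as they are on a stock 16.04 system, and the helper names `source_from_filename`, `iter_stanzas`, and `build_index` are hypothetical, not part of any APT tool. The URI scheme is not stored in the file name, so it is left out.

```python
import glob
import os
from collections import defaultdict

LISTS_DIR = "/var/lib/apt/lists"


def source_from_filename(path):
    """Turn a *_Packages file name into an APT-Sources-like string.

    Assumes the usual layout <site>_<path>_dists_<suite>_<component>_binary-<arch>_Packages.
    Returns None for anything that does not look like that.
    """
    name = os.path.basename(path)
    if "_dists_" not in name:
        return None
    repo, rest = name.split("_dists_", 1)
    parts = rest.split("_")
    if len(parts) < 4 or parts[-1] != "Packages" or not parts[-2].startswith("binary-"):
        return None
    suite = "/".join(parts[:-3])
    component = parts[-3]
    arch = parts[-2][len("binary-"):]
    return "{} {}/{} {} Packages".format(repo.replace("_", "/"), suite, component, arch)


def iter_stanzas(text):
    """Yield one dict of top-level fields per blank-line-separated stanza."""
    for block in text.split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Skip continuation lines of multi-line fields such as Description.
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        if "Package" in fields:
            yield fields


def build_index(lists_dir=LISTS_DIR):
    """Map (package, version) to the list of sources that offer it."""
    index = defaultdict(list)
    for path in sorted(glob.glob(os.path.join(lists_dir, "*_Packages"))):
        source = source_from_filename(path)
        if source is None:
            continue
        with open(path, encoding="utf-8", errors="replace") as fh:
            for stanza in iter_stanzas(fh.read()):
                index[(stanza["Package"], stanza.get("Version"))].append(source)
    return index
```

Each index file is read once, and looking up `build_index()[(name, installed_version)]` afterwards is a plain dictionary access. The installed version itself still has to come from elsewhere, and a version offered by several repositories will list all of them.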

Note that apt also builds binary caches from the files above to speed up its queries. These caches are located here:

ls /var/cache/apt/
Ravexina
  • I'd suggest something like awk -v RS='\n\n' -v pkg=wget '$2 == pkg' /var/lib/apt/lists/*_ubuntu_dists_xenial-updates_main_binary-amd64_Packages - easier to customize the package name if you replace wget with a variable. – muru Jul 20 '17 at 07:21
  • Thank you for your response. Maybe you can clarify the following: Wouldn't I need some prior knowledge about where the package came from to use this grep command? Also, the output matches that of apt-cache show, which does not suffice for me. I was hoping for a single file that I could read in and parse. The grep command with the glob looks like a potentially unnecessary amount of file reading to me, if I could just read one single file once instead. – KyuMirthu Jul 20 '17 at 07:26
  • @AlexPoth grep everything in /var/lib/apt/lists – muru Jul 20 '17 at 07:26
  • @muru Grepping everything in /var/lib/apt/lists/ yields a long output. Parsing that would be a lot of work as well, and the output is not restricted to the source the package was actually installed from, but covers every source that offers this package. Also, the overhead of parsing everything for every package persists. – KyuMirthu Jul 20 '17 at 07:48
  • @AlexPoth Look into something like this: *_ubuntu_dists_xenial-updates_*_binary-amd64_Packages. It will narrow your search to all sections of dist-updates for amd64. Otherwise, read a little bit about repositories in Ubuntu. – Ravexina Jul 20 '17 at 07:52
  • @AlexPoth Also, if you're going to use grep, the -h option might be helpful to you (it searches without including the file names in the output). – Ravexina Jul 20 '17 at 07:56
  • Thank you for your help @Ravexina I think I found my answer, combined with the comments on my question. :) – KyuMirthu Jul 20 '17 at 08:08

I basically implemented a system to watch over the installed packages in our landscape, collecting information about them and checking whether some of them are installed with a wrong version. Since some of these packages come from our own internal repositories, it is necessary to know where each package came from.

Using the Python APT API:

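#!/usr/bin/python3
import apt

# Open the APT cache once; every lookup below is served from it.
cache = apt.cache.Cache()
for pkg in cache:
    if pkg.is_installed:
        name = pkg.name
        version = pkg.installed.version
        # Sites of the indexes that list the installed version;
        # entries without a site are skipped.
        origins = [o.site for o in pkg.installed.origins if o.site]
        print(name, version, origins)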

Example output:

$ ./foo.py | head
a11y-profile-manager-indicator 0.1.10-0ubuntu3 ['jp.archive.ubuntu.com']
account-plugin-facebook 0.12+16.04.20160126-0ubuntu1 ['jp.archive.ubuntu.com', 'jp.archive.ubuntu.com']
account-plugin-flickr 0.12+16.04.20160126-0ubuntu1 ['jp.archive.ubuntu.com', 'jp.archive.ubuntu.com']
account-plugin-google 0.12+16.04.20160126-0ubuntu1 ['jp.archive.ubuntu.com', 'jp.archive.ubuntu.com']
accountsservice 0.6.40-2ubuntu11.3 ['jp.archive.ubuntu.com']
acl 2.2.52-3 ['jp.archive.ubuntu.com']
acpi-support 0.142 ['jp.archive.ubuntu.com']
acpid 1:2.0.26-1ubuntu2 ['jp.archive.ubuntu.com']
activity-log-manager 0.9.7-0ubuntu23.16.04.1 ['jp.archive.ubuntu.com']
adduser 3.113+nmu3ubuntu4 ['jp.archive.ubuntu.com', 'jp.archive.ubuntu.com']
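
The example output lists the same site twice for some packages, presumably because the installed version appears in more than one index on that mirror. If only the distinct sites matter, a small variation could drop the repeats while keeping their order. This is a sketch only: `unique_sites` and `installed_packages` are hypothetical helper names, and the only python-apt calls used are the ones already shown in the script above.

```python
def unique_sites(origins):
    """Return the non-empty .site values of `origins`, de-duplicated in order."""
    return list(dict.fromkeys(o.site for o in origins if o.site))


def installed_packages():
    """Yield (name, version, sites) for every installed package."""
    import apt  # python-apt; imported here so unique_sites() works without it

    cache = apt.cache.Cache()
    for pkg in cache:
        if pkg.is_installed:
            yield pkg.name, pkg.installed.version, unique_sites(pkg.installed.origins)
```

Looping over `installed_packages()` and printing each tuple gives the same three columns as before, with each site listed once per package.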
muru