
I have an API request that returns its output as JSON (form? layout? body? how do you say that?). See here:

    {
        "title": "Another Life (2019)",
        "alternateTitles": [
            {
                "title": "Another Life",
                "seasonNumber": -1
            }
        ],
        "sortTitle": "another life 2019",
        "seasonCount": 2,
        "totalEpisodeCount": 20,
        "episodeCount": 10,
        "episodeFileCount": 10,
        "sizeOnDisk": 2979171318,
        "status": "continuing",
        "overview": "Astronaut Niko Breckenridge and her young crew face unimaginable danger as they go on a high-risk mission to explore the genesis of an alien artifact.",
        "previousAiring": "2019-07-25T07:00:00Z",
        "network": "Netflix",
        "airTime": "03:00",
        "seasons": [
            {
                "seasonNumber": 1,
                "monitored": true,
                "statistics": {
                    "previousAiring": "2019-07-25T07:00:00Z",
                    "episodeFileCount": 10,
                    "episodeCount": 10,
                    "totalEpisodeCount": 10,
                    "sizeOnDisk": 2979171318,
                    "percentOfEpisodes": 100.0
                }
            },
            {
                "seasonNumber": 2,
                "monitored": true,
                "statistics": {
                    "episodeFileCount": 0,
                    "episodeCount": 0,
                    "totalEpisodeCount": 10,
                    "sizeOnDisk": 0,
                    "percentOfEpisodes": 0.0
                }
            }
        ],
        "tags": [],
        "added": "2020-12-02T15:01:43.942456Z",
        "ratings": {
            "votes": 26,
            "value": 6.0
        },
        "qualityProfileId": 3,
        "id": 24
    }

I have about 20 of these outputs in a long list. This is one of them.

The problem

In the long list, I'll be grep-ing "\"title\": \"Another Life (2019)\"", where Another Life (2019) can be any of the 20 series. I need to get the id (at the bottom of the output).

But doing grep -Eo "\"id\": [0-9]{1,4}" won't work, as I would get 20 ids as output.

Doing grep -Eo "\"title\": \"Another Life (2019)\".*\"id\": [0-9]{1,4}" also doesn't work.

Doing grep -A 100 "\"title\": \"Another Life (2019)\"" and then grep-ing the id also doesn't work.

I can't seem to get it to work how I want. I'm having problems in general understanding how grabbing strings in a json body works.

If I choose "Devs", I want to get the id of the series Devs. If I choose (be it setting a variable or inserting the name somewhere in the command) "Prison Break", I want to get the id of the series Prison Break.

Thanks!

Cas

1 Answer


Using --perl-regexp (PCRE) works for me:

    grep -P -- '"id": \K[0-9]{1,4}' infile.json

The \K escape makes grep discard everything matched before it, so that part is not included in the reported match (source). Without -o grep still prints the whole matching line; to get only the number, add the option -o:

    grep -oP -- '"id": \K[0-9]{1,4}' infile.json
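To see what \K changes, here is a small sketch that runs the pattern with and without it. The three-line sample file is made up for the illustration, and GNU grep with -P support is assumed.

```shell
# Build a tiny sample file (hypothetical data) to compare the two patterns.
sample=$(mktemp)
printf '%s\n' '"title": "Devs",' '"qualityProfileId": 3,' '"id": 24' > "$sample"

# Without \K, -o prints the whole match, key included.
with_key=$(grep -oP -- '"id": [0-9]{1,4}' "$sample")

# With \K, everything matched before it is dropped from the output.
only_value=$(grep -oP -- '"id": \K[0-9]{1,4}' "$sample")

printf '%s\n' "$with_key" "$only_value"
rm -f "$sample"
```

The first line printed is the key together with its value, the second is only the number. Note that "qualityProfileId" is not picked up, because the pattern requires a quote directly before a lowercase id.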

If you need a multiline search, add the option -z, so that grep reads the whole file as one record instead of line by line:

    grep -zPo -- '(?s)Another Life.*?"id": \K[0-9]{1,4}\n' infile.json

Where (?s) activates PCRE_DOTALL, which means that '.' matches any character, including a newline (source).
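A quick way to convince yourself that -z is the part that matters is to run the same pattern with and without it. This is only a sketch on a made-up four-line file, again assuming GNU grep.

```shell
# Hypothetical sample: title and id sit on different lines.
sample=$(mktemp)
printf '%s\n' '{' '"title": "Devs",' '"id": 7' '}' > "$sample"

# Line by line, no single line holds both the title and the id, so nothing
# matches (|| true only keeps a "no match" exit status from aborting a script).
per_line=$(grep -oP -- '(?s)Devs.*?"id": \K[0-9]+' "$sample" || true)

# With -z the file is one record; tr turns grep's NUL terminator into a newline.
whole_file=$(grep -zoP -- '(?s)Devs.*?"id": \K[0-9]+' "$sample" | tr '\0' '\n')

printf 'per line: [%s]\nwhole file: [%s]\n' "$per_line" "$whole_file"
rm -f "$sample"
```

This is also why a pattern spanning the title and the id finds nothing when grep works line by line.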

The above command outputs the id for every occurrence of Another Life that has an id after it, not only the first one. It seems it is not possible to catch only the first occurrence with grep alone (with -z the whole file is a single record, so -m1 does not stop after the first match), so we need to process the output with another tool, let's say head:

    grep -zPo -m1 -- '(?s)Another Life.*?"id": \K[0-9]{1,4}.' infile.json | head -n 1
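To pick the series by name, the same idea can be wrapped in a small function. This is a sketch under a few assumptions: get_series_id is a hypothetical helper name, the two-series list is made-up data, GNU grep with -P is available, and "id" is the first key called exactly "id" after the title, as in the sample from the question.

```shell
# Hypothetical two-series list standing in for the real API output.
list=$(mktemp)
cat > "$list" <<'EOF'
[
  {
    "title": "Devs",
    "qualityProfileId": 3,
    "id": 7
  },
  {
    "title": "Another Life (2019)",
    "alternateTitles": [
      { "title": "Another Life", "seasonNumber": -1 }
    ],
    "qualityProfileId": 3,
    "id": 24
  }
]
EOF

# get_series_id NAME FILE (hypothetical helper, needs GNU grep with -P).
get_series_id() {
    # \Q...\E makes PCRE take the name literally, so "(2019)" is not a group.
    # .*? stops at the first "id" key after the title; tr turns the NUL
    # terminators that -z emits into newlines so head can pick the first match.
    grep -zoP -- "(?s)\"title\": \"\Q$1\E\".*?\"id\": \K[0-9]+" "$2" |
        tr '\0' '\n' | head -n 1
}

devs_id=$(get_series_id "Devs" "$list")
life_id=$(get_series_id "Another Life (2019)" "$list")
printf '%s\n' "$devs_id" "$life_id"
rm -f "$list"
```

With this sample it prints 7 and then 24. If a nested object ever carried its own "id" key before the series id, this would return the wrong number, which is a good reason to prefer a real JSON parser for anything beyond a quick lookup.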
pa4080
  • The last command almost works. The problem is .*\n.*. This makes it grep the last id in the complete list of series instead of the id corresponding to the requested series. The output in the question is one of 20, aka the complete output has 20 of those. Now it searches for the series name, and then for the last id it could find. Which is from a totally different series. How do you think we could fix this? – Cas Feb 11 '21 at 09:18
  • @Cas you probably need to make the .* match non-greedy i.e. .*? - nevertheless, I would strongly recommend using jq rather than grep for this task – steeldriver Feb 11 '21 at 11:51
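For reference, the jq approach recommended in the comment above could look roughly like this. It is a sketch that assumes jq is installed and that the full API output is a JSON array of series objects; the two-series list and the variable names are made up for the example.

```shell
# Hypothetical two-series list standing in for the real API output.
list=$(mktemp)
cat > "$list" <<'EOF'
[
  {"title": "Devs", "id": 7},
  {"title": "Another Life (2019)", "id": 24}
]
EOF

series="Another Life (2019)"
# --arg passes the shell variable in as $name; select keeps the object whose
# title is exactly that string, and .id prints its id.
series_id=$(jq -r --arg name "$series" '.[] | select(.title == $name) | .id' "$list")
printf '%s\n' "$series_id"
rm -f "$list"
```

Because jq parses the JSON, the id always comes from the same object as the title, regardless of key order or line breaks.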