Trololo

By: KamoKatt

24682   1353   7360733

Uploaded on 12/20/2009

Comments (6):

By anonymous    2017-09-20

Instead of writing a complicated regex that probably work not in all cases, you better use tools to analyze the url, like urllib:

from urllib.parse import urlparse, parse_qs

url = 'http://youtube.com/watch?v=iwGFalTRHDA'

def get_id(url):
    u_pars = urlparse(url)
    quer_v = parse_qs(u_pars.query).get('v')
    if quer_v:
        return quer_v[0]
    pth = u_pars.path.split('/')
    if pth:
        return pth[-1]

This function will return None if both attempts fail.

I tested it with the sample urls:

>>> get_id('http://youtube.com/watch?v=iwGFalTRHDA')
'iwGFalTRHDA'
>>> get_id('http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related')
'iwGFalTRHDA'
>>> get_id('https://youtube.com/iwGFalTRHDA')
'iwGFalTRHDA'
>>> get_id('http://youtu.be/n17B_uFF4cA')
'n17B_uFF4cA'
>>> get_id('youtube.com/iwGFalTRHDA')
'iwGFalTRHDA'
>>> get_id('youtube.com/n17B_uFF4cA')
'n17B_uFF4cA'
>>> get_id('http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4')
'r5nB9u4jjy4'
>>> get_id('http://www.youtube.com/watch?v=t-ZRX8984sc')
't-ZRX8984sc'
>>> get_id('http://youtu.be/t-ZRX8984sc')
't-ZRX8984sc'

Original Thread

By anonymous    2017-09-20

I really advise on @LukasGraf's comment, however if you really must use regex you can check the following:

(?:(?:https?\:\/\/)?(?:www\.)?(?:youtube|youtu)(?:(?:\.com|\.be)\/)(?:embed\/)?(?:watch\?)?(?:feature=player_embedded)?&?(?:v=)?([0-z]{11}|[0-z]{4}(\-|\_)[0-z]{4}|.(\-|\_)[0-z]{9}))

Here is a working example in regex101: https://regex101.com/r/5eRqn2/1

And here the python example:

In [38]: r = re.compile('(?:(?:https?\:\/\/)?(?:www\.)?(?:youtube|youtu)(?:(?:\.com|\.be)\/)(?:embed\/)?(?:watch\?)?(?:feature=player_embedded)?&?(?:v=)?([0-z]{11}|[0-z]{4}(?:\-|\_)[0-z]{4}|.(?:\-|\_)[0-z]{9}))')
In [39]: r.match('http://youtube.com/watch?v=iwGFalTRHDA').groups()
Out[39]: ('iwGFalTRHDA',)
In [40]: r.match('http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related').groups()
Out[40]: ('iwGFalTRHDA',)
In [41]: r.match('https://youtube.com/iwGFalTRHDA').groups()
Out[41]: ('iwGFalTRHDA',)

In order to not catch specific group in regex you should this: (?:...)

Original Thread

By anonymous    2017-09-20

Here's the approach I'd use, no regex needed at all.

(This is pretty much equivalent to @Willem Van Onsem's solution, plus an easy to run / update unit test).

from urlparse import parse_qs
from urlparse import urlparse
import re
import unittest


TEST_URLS = [
    ('iwGFalTRHDA', 'http://youtube.com/watch?v=iwGFalTRHDA'),
    ('iwGFalTRHDA', 'http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related'),
    ('iwGFalTRHDA', 'https://youtube.com/iwGFalTRHDA'),
    ('n17B_uFF4cA', 'http://youtu.be/n17B_uFF4cA'),
    ('iwGFalTRHDA', 'youtube.com/iwGFalTRHDA'),
    ('n17B_uFF4cA', 'youtube.com/n17B_uFF4cA'),
    ('r5nB9u4jjy4', 'http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4'),
    ('t-ZRX8984sc', 'http://www.youtube.com/watch?v=t-ZRX8984sc'),
    ('t-ZRX8984sc', 'http://youtu.be/t-ZRX8984sc'),
    (None, 'http://www.stackoverflow.com')
]

YOUTUBE_DOMAINS = [
    'youtu.be',
    'youtube.com',
]


def extract_id(url_string):
    # Make sure all URLs start with a valid scheme
    if not url_string.lower().startswith('http'):
        url_string = 'http://%s' % url_string

    url = urlparse(url_string)

    # Check host against whitelist of domains
    if url.hostname.replace('www.', '') not in YOUTUBE_DOMAINS:
        return None

    # Video ID is usually to be found in 'v' query string
    qs = parse_qs(url.query)
    if 'v' in qs:
        return qs['v'][0]

    # Otherwise fall back to path component
    return url.path.lstrip('/')


class TestExtractID(unittest.TestCase):

    def test_extract_id(self):
        for expected_id, url in TEST_URLS:
            result = extract_id(url)
            self.assertEqual(
                expected_id, result, 'Failed to extract ID from '
                'URL %r (got %r, expected %r)' % (url, result, expected_id))


if __name__ == '__main__':
    unittest.main()

Original Thread

Recommended Books

    Submit Your Video

    If you have some great dev videos to share, please fill out this form.