Trololo

By: KamoKatt

24682   1353   7360733

Uploaded on 12/20/2009

Comments (7):

By anonymous    2017-09-20

Instead of writing a complicated regex that probably won't work in all cases, you'd be better off using a tool that parses the URL, such as urllib:

from urllib.parse import urlparse, parse_qs

url = 'http://youtube.com/watch?v=iwGFalTRHDA'

def get_id(url):
    u_pars = urlparse(url)
    quer_v = parse_qs(u_pars.query).get('v')
    if quer_v:
        return quer_v[0]
    pth = u_pars.path.split('/')
    # split() always returns at least one element, so test the last segment itself
    if pth[-1]:
        return pth[-1]

This function returns None if both attempts fail.

I tested it with the sample URLs:

>>> get_id('http://youtube.com/watch?v=iwGFalTRHDA')
'iwGFalTRHDA'
>>> get_id('http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related')
'iwGFalTRHDA'
>>> get_id('https://youtube.com/iwGFalTRHDA')
'iwGFalTRHDA'
>>> get_id('http://youtu.be/n17B_uFF4cA')
'n17B_uFF4cA'
>>> get_id('youtube.com/iwGFalTRHDA')
'iwGFalTRHDA'
>>> get_id('youtube.com/n17B_uFF4cA')
'n17B_uFF4cA'
>>> get_id('http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4')
'r5nB9u4jjy4'
>>> get_id('http://www.youtube.com/watch?v=t-ZRX8984sc')
't-ZRX8984sc'
>>> get_id('http://youtu.be/t-ZRX8984sc')
't-ZRX8984sc'
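
To see why the path fallback also works for URLs without a scheme, it can help to look at what urlparse and parse_qs return. This is a small sketch using only the standard library; the variable names are illustrative and not part of the answer above.

```python
from urllib.parse import urlparse, parse_qs

# With a scheme, host and path are split apart and the query is kept separately.
full = urlparse('http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related')
query_values = parse_qs(full.query)  # maps each key to a list of values

# Without a scheme, urlparse leaves netloc empty and puts everything in path.
bare = urlparse('youtube.com/iwGFalTRHDA')
last_segment = bare.path.split('/')[-1]

print(full.netloc, full.path, query_values)
print(repr(bare.netloc), bare.path, last_segment)
```

Because parse_qs maps every key to a list, get_id takes element [0] of the 'v' entry, and because the host ends up inside the path for scheme-less input, taking the last path segment still yields the ID.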


By anonymous    2017-09-20

I really recommend following @LukasGraf's comment; however, if you really must use a regex, you can try the following:

(?:(?:https?\:\/\/)?(?:www\.)?(?:youtube|youtu)(?:(?:\.com|\.be)\/)(?:embed\/)?(?:watch\?)?(?:feature=player_embedded)?&?(?:v=)?([0-z]{11}|[0-z]{4}(\-|\_)[0-z]{4}|.(\-|\_)[0-z]{9}))

Here is a working example in regex101: https://regex101.com/r/5eRqn2/1

And here the python example:

In [37]: import re
In [38]: r = re.compile(r'(?:(?:https?\:\/\/)?(?:www\.)?(?:youtube|youtu)(?:(?:\.com|\.be)\/)(?:embed\/)?(?:watch\?)?(?:feature=player_embedded)?&?(?:v=)?([0-z]{11}|[0-z]{4}(?:\-|\_)[0-z]{4}|.(?:\-|\_)[0-z]{9}))')
In [39]: r.match('http://youtube.com/watch?v=iwGFalTRHDA').groups()
Out[39]: ('iwGFalTRHDA',)
In [40]: r.match('http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related').groups()
Out[40]: ('iwGFalTRHDA',)
In [41]: r.match('https://youtube.com/iwGFalTRHDA').groups()
Out[41]: ('iwGFalTRHDA',)

To keep a group from being captured in the result, make it a non-capturing group: (?:...)
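
The difference between capturing and non-capturing groups can be sketched with a much simpler pattern. The pattern below is a hypothetical, cut-down one that only handles youtu.be links; it is for illustration and is not the regex from this answer.

```python
import re

# Hypothetical simplified patterns, for illustration only.
# Every plain (...) group shows up in .groups(), even optional ones.
capturing = re.compile(r'(https?://)?(www\.)?youtu\.be/([\w-]+)')
# (?:...) groups still take part in matching but are left out of .groups().
non_capturing = re.compile(r'(?:https?://)?(?:www\.)?youtu\.be/([\w-]+)')

url = 'http://youtu.be/n17B_uFF4cA'
captured = capturing.match(url).groups()
only_id = non_capturing.match(url).groups()

print(captured)
print(only_id)
```

With the capturing version the ID is the third element of the tuple, and the optional "www." group that did not take part in the match is reported as None. With the non-capturing version the tuple holds just the ID.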


By anonymous    2017-09-20

Here's the approach I'd use, no regex needed at all.

(This is pretty much equivalent to @Willem Van Onsem's solution, plus an easy to run / update unit test).

from urllib.parse import parse_qs
from urllib.parse import urlparse
import unittest


TEST_URLS = [
    ('iwGFalTRHDA', 'http://youtube.com/watch?v=iwGFalTRHDA'),
    ('iwGFalTRHDA', 'http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related'),
    ('iwGFalTRHDA', 'https://youtube.com/iwGFalTRHDA'),
    ('n17B_uFF4cA', 'http://youtu.be/n17B_uFF4cA'),
    ('iwGFalTRHDA', 'youtube.com/iwGFalTRHDA'),
    ('n17B_uFF4cA', 'youtube.com/n17B_uFF4cA'),
    ('r5nB9u4jjy4', 'http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4'),
    ('t-ZRX8984sc', 'http://www.youtube.com/watch?v=t-ZRX8984sc'),
    ('t-ZRX8984sc', 'http://youtu.be/t-ZRX8984sc'),
    (None, 'http://www.stackoverflow.com')
]

YOUTUBE_DOMAINS = [
    'youtu.be',
    'youtube.com',
]


def extract_id(url_string):
    # Make sure all URLs start with a valid scheme
    if not url_string.lower().startswith('http'):
        url_string = 'http://%s' % url_string

    url = urlparse(url_string)

    # Check host against whitelist of domains
    if url.hostname.replace('www.', '') not in YOUTUBE_DOMAINS:
        return None

    # Video ID is usually to be found in 'v' query string
    qs = parse_qs(url.query)
    if 'v' in qs:
        return qs['v'][0]

    # Otherwise fall back to path component
    return url.path.lstrip('/')


class TestExtractID(unittest.TestCase):

    def test_extract_id(self):
        for expected_id, url in TEST_URLS:
            result = extract_id(url)
            self.assertEqual(
                expected_id, result, 'Failed to extract ID from '
                'URL %r (got %r, expected %r)' % (url, result, expected_id))


if __name__ == '__main__':
    unittest.main()


By anonymous    2018-01-22

Here is an idea using a VBScript with a regex to extract the "Videocode":

Data = "https://www.youtube.com/watch?v=Videocode" & vbCrlf &_
"http://www.youtube.com/watch?v=iwGFalTRHDA" & vbCrlf &_
"http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related" & vbCrlf &_
"http://youtu.be/iwGFalTRHDA" & vbCrlf &_
"http://youtu.be/n17B_uFF4cA" & vbCrlf &_
"http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4" & vbCrlf &_
"http://www.youtube.com/watch?v=t-ZRX8984sc" & vbCrlf &_
"http://youtu.be/t-ZRX8984sc"
Data_Extracted = Extract(Data,"http(?:s?):\/\/(?:www\.)?youtu(?:be\.com\/watch\?v=|\.be\/)([\w\-\_]*)(&(amp;)?[\w\?=]*)?")
WScript.echo Data_Extracted
'************************************************
Function Extract(Data,Pattern)
   Dim oRE,oMatches,Match,Line
   set oRE = New RegExp
   oRE.IgnoreCase = True
   oRE.Global = True
   oRE.Pattern = Pattern
   set oMatches = oRE.Execute(Data)
   If not isEmpty(oMatches) then
       For Each Match in oMatches  
           Line = Line & Match.SubMatches(0) & vbcrlf
       Next
       Extract = Line
   End if
End Function
'************************************************

EDIT: Using a hybrid batch file that generates and runs the VBScript

@echo off
Title Extract Videocode from Youtube links
Set "Tmpvbs=%temp%\Tmpvbs.vbs"
Set "InputFile=URL.txt"
Set "OutPutFile=OutPutCode.txt"
Call :Extract "%InputFile%" "%OutPutFile%"
Start "" "%OutPutFile%" & exit
::****************************************************
:Extract <InputData> <OutPutData>
(
echo Data = WScript.StdIn.ReadAll
echo Data = Extract(Data,"http(?:s?):\/\/(?:www\.)?youtu(?:be\.com\/watch\?v=|\.be\/)([\w\-\_]*)(&(amp;)?[\w\?=]*)?"^)
echo WScript.StdOut.WriteLine Data
echo '************************************************
echo Function Extract(Data,Pattern^)
echo    Dim oRE,oMatches,Match,Line
echo    set oRE = New RegExp
echo    oRE.IgnoreCase = True
echo    oRE.Global = True
echo    oRE.Pattern = Pattern
echo    set oMatches = oRE.Execute(Data^)
echo    If not isEmpty(oMatches^) then
echo        For Each Match in oMatches  
echo            Line = Line ^& Match.SubMatches(0^) ^& vbcrlf
echo        Next
echo        Extract = Line
echo    End if
echo End Function
echo '************************************************
)>"%Tmpvbs%"
cscript /nologo "%Tmpvbs%" < "%~1" > "%~2"
If Exist "%Tmpvbs%" Del "%Tmpvbs%"
exit /b
::**********************************************************************************
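
For comparison with the Python answers above, the same idea can be sketched with the re module. This is a rough port under stated assumptions, not the author's code: the function name extract_ids is made up, and the pattern is the VBScript one rewritten in Python syntax, with the trailing parameter group made non-capturing.

```python
import re

# Port of the VBScript pattern; IGNORECASE mirrors oRE.IgnoreCase = True.
PATTERN = re.compile(
    r'https?://(?:www\.)?youtu(?:be\.com/watch\?v=|\.be/)([\w-]*)'
    r'(?:&(?:amp;)?[\w?=]*)?',
    re.IGNORECASE,
)

def extract_ids(data):
    """Return every video code found in a block of text (hypothetical helper)."""
    # finditer scans the whole text, like oRE.Global = True with Execute().
    return [m.group(1) for m in PATTERN.finditer(data)]

data = '\n'.join([
    'http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related',
    'http://youtu.be/n17B_uFF4cA',
    'http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4',
    'http://www.youtube.com/watch?v=t-ZRX8984sc',
])
ids = extract_ids(data)
print(ids)
```

Note that this pattern expects "watch?v=" directly after the host, so the embed URL in the sample data is skipped rather than matched.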
