Example - BBC News live headlines

The script retrieves the headlines from http://news.bbc.co.uk/ and writes them to an HTML file. The output is refreshed every 15 minutes by cron.

 

Output:

<p><h3><a href="http://www.bbc.com">IS teenager to lose UK citizenship</a></h3></p>
<p>Shamima Begum, who joined the Islamic State group in Syria in 2015, had said she wanted to return home.</p>
<p>Time: 2019-02-19T22:36:46.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">IS teenager to lose UK citizenship</a></h3></p>
<p>Shamima Begum, who joined the Islamic State group in Syria in 2015, had said she wanted to return home.</p>
<p>Time: 2019-02-19T22:36:46.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">Karl Lagerfeld: Five things you should know</a></h3></p>
<p>Five things you might not know about the iconic fashion designer, who has died after a long career.</p>
<p>Time: 2019-02-19T20:15:07.000Z</p>
<p>Category: Entertainment &amp; Arts</p>
<p><h3><a href="http://www.bbc.com">Avalanche buries skiers in Switzerland</a></h3></p>
<p>At least four people have been hurt and more may still be missing in Crans-Montana, police say.</p>
<p>Time: 2019-02-19T21:25:01.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">New probe of White House Saudi nuclear plan</a></h3></p>
<p>A government report says the US is rushing to transfer sensitive nuclear technology to Saudi Arabia.</p>
<p>Time: 2019-02-19T22:18:00.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">WW2 kiss statue vandalised with &#x27;#MeToo&#x27;</a></h3></p>
<p>A statue depicting the famous kiss between a US sailor and a nurse at the end of World War Two has been defaced with &#x27;#MeToo&#x27;.</p>
<p>Time: 2019-02-19T21:28:54.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">France rallies after anti-Semitic attacks</a></h3></p>
<p>Around 70 demonstrations took place across France against a spate of anti-Semitic attacks.</p>
<p>Time: 2019-02-19T22:09:32.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Brazil joins Venezuela aid campaign</a></h3></p>
<p>The growing foreign aid operation organised by Venezuela&#x27;s opposition is in defiance of President Maduro.</p>
<p>Time: 2019-02-20T00:02:42.000Z</p>
<p>Category: Latin America &amp; Caribbean</p>
<p><h3><a href="http://www.bbc.com">Bernie Sanders runs for president again</a></h3></p>
<p>The 77-year-old senator and self-declared socialist announces a second bid for the White House.</p>
<p>Time: 2019-02-19T20:36:58.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Seven children die in Canada house fire</a></h3></p>
<p>The seven children and their parents had arrived in Canada in 2017 as Syrian refugees.</p>
<p>Time: 2019-02-19T22:28:51.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">US editor calls for KKK to lynch Democrats</a></h3></p>
<p>The editor said the Ku Klux Klan should lynch &quot;socialist-communist&quot; Democrats and raid Washington DC.</p>
<p>Time: 2019-02-19T19:39:19.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Seven children die in Canada house fire</a></h3></p>
<p>The seven children and their parents had arrived in Canada in 2017 as Syrian refugees.</p>
<p>Time: 2019-02-19T22:28:51.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">US editor calls for KKK to lynch Democrats</a></h3></p>
<p>The editor said the Ku Klux Klan should lynch &quot;socialist-communist&quot; Democrats and raid Washington DC.</p>
<p>Time: 2019-02-19T19:39:19.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">EU blasts Hungary &#x27;fake news&#x27; on migrants</a></h3></p>
<p>The EU Commission condemns Hungary&#x27;s right-wing government over a new poster campaign.</p>
<p>Time: 2019-02-19T17:35:14.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Eighth UK MP quits Labour</a></h3></p>
<p>Joan Ryan says she cannot remain in a party which tolerates a &quot;culture of anti-Jewish racism&quot;.</p>
<p>Time: 2019-02-19T23:33:01.000Z</p>
<p>Category: UK Politics</p>
<p><h3><a href="http://www.bbc.com">New York City bans hair discrimination</a></h3></p>
<p>The guidance gives black people the right to wear hairstyles previously deemed &quot;unprofessional&quot;.</p>
<p>Time: 2019-02-19T18:55:22.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">&#x27;Bees follow me and I don&#x27;t know why&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World News TV</a></h3></p>
<p>The latest global news, sport, weather and documentaries</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World Service Radio</a></h3></p>
<p>Stories from around the world</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;It was like standing in concrete&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Karl Lagerfeld: The life of a design icon in pictures</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The blind climbers of Kilimanjaro</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Will we worship artificial intelligence in the future?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How US opioid crisis is biting Australia</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How many IS foreign fighters are left?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">17 quirky facts about this year&#x27;s Oscars</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Karl Lagerfeld the emperor of fashion</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Could the dream of stress-free travel be coming true?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">After Kashmir attack, what are India&#x27;s options?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How &#x27;cheating husbands&#x27; are linked to Sudan protests</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">When class sizes fall so does teachers&#x27; pay</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">An extremist in the family</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Chernobyl: The end of a three-decade experiment</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The soldier with a secret talent - ballet</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The downfall of one of the world&#x27;s most notorious criminals</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Three things that could stop Elizabeth Warren</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Why do humans drink animal milk?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">The easy way to learn a new language</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">Singapore’s (unlikely) secret weapon</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Travel</p>
<p><h3><a href="http://www.bbc.com">A subversive message hidden by Da Vinci</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Culture</p>
<p><h3><a href="http://www.bbc.com">The puzzle of ancient brain surgery</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Earth</p>
<p><h3><a href="http://www.bbc.com">Is civilisation about to collapse?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">The trendy perks employees don’t want</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">Bayern Munich frustrate Liverpool in first leg</a></h3></p>
<p></p>
<p>Time: 2019-02-19T22:56:11.000Z</p>
<p>Category: European Football</p>
<p><h3><a href="http://www.bbc.com">&#x27;You are hated there&#x27; - what&#x27;s it like when England play Wales in Cardiff?</a></h3></p>
<p></p>
<p>Time: 2019-02-19T20:13:00.000Z</p>
<p>Category: Rugby Union</p>
<p><h3><a href="http://www.bbc.com">Wilder or Fury fight can happen - Joshua</a></h3></p>
<p></p>
<p>Time: 2019-02-19T22:20:23.000Z</p>
<p>Category: Boxing</p>
<p><h3><a href="http://www.bbc.com">Barcelona have 25 shots but goalless in draw with Lyon</a></h3></p>
<p></p>
<p>Time: 2019-02-19T22:33:34.000Z</p>
<p>Category: Football</p>
<p><h3><a href="http://www.bbc.com">Life at the limits of exhaustion - why cyclists suffer</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;We&#x27;d like to make history by winning in India&#x27;</a></h3></p>
<p></p>
<p>Time: 2019-02-20T00:04:20.000Z</p>
<p>Category: Women&#x27;s Cricket</p>
<p><h3><a href="http://www.bbc.com">Leclerc fastest for Ferrari in F1 test</a></h3></p>
<p></p>
<p>Time: 2019-02-19T17:11:19.000Z</p>
<p>Category: Formula 1</p>

Source code of script:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Retrieves the headlines from http://www.bbc.com/news and writes them to an HTML file
# Input: URL [http://www.bbc.com/news]
# Output format: HTML file
# Output fields: Link, Title, Summary, Time, Category

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    Define $output_file bbc_output.html

	# define variable $url and assign it value
    Define $url http://www.bbc.com/news
    
	
	
    # clean output file
    <Action Print>
        FileName {$output_file}
		FileMode Write  
    </Action>
    	
	
    
    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

	# the script will iterate through all headlines
	<Section While>
		# search for headlines only in the top part of the website
		EndAt <div class="container">
		
		# match the beginning of headline
		<Pattern>
			RegExp <div class="gs-c-promo-body
		</Pattern>
	
		<Section>
			# stop searching for this headline's fields at the beginning of the next layout item
			EndAt <div class="gel-layout__item
	
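			# match link; capture it into $link, the variable the Print action uses
			# (capturing into $url would overwrite the page URL and leave {$link} empty)
			<Pattern>
				RegExp <a class="gs-c-promo-heading{:re([^"]*)}" href="{$link:re([^"]*)}">
				Trim
				Compact
			</Pattern>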
	
			# match title
			<Pattern>
				RegExp <h3 class="gs-c-promo-heading__title{:re([^"]*)}">{$title}</h3></a>
				Trim
				Compact
			</Pattern>
	
			# match summary
			<Pattern>
				Optional
				RegExp <p class="gs-c-promo-summary{:re([^"]*)}">{$summary}</p>
				Trim
				Compact
			</Pattern>
	
			# match time
			<Pattern>
				Optional
				RegExp <time class="gs-o-bullet__text date qa-status-date" datetime="{$time:re([^"]*)}"
				Trim
				Compact
			</Pattern>
	
			# match category
			<Pattern>
				Optional
				RegExp <span aria-hidden="true">{$category}</span>
				Trim
				Compact
			</Pattern>
	
			# and print parsed data
			<Action Print>
				FileName {$output_file}
				Text <p><h3><a href="http://www.bbc.com{$link}">{$title}</a></h3></p>\n<p>{$summary}</p>\n<p>Time: {$time}</p>\n<p>Category: {$category}</p>\n
			</Action>
        </Section>
    </Section>
</Section>

Main bbc_main
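
For readers unfamiliar with the script syntax, the extraction logic above can be sketched in Python. This is only an illustrative approximation, not the tool's implementation. The SAMPLE fragment, the function names extract_headlines and render, and the "/news/example-…" paths are hypothetical. As a simplification, each headline block is cut at the next start marker rather than at the script's inner EndAt marker, and each pattern is searched independently within the block. Link and title are required; summary, time and category are optional and default to an empty string, which is why some entries in the output above have empty Time and Category lines.

```python
import re

# Hypothetical, simplified fragment in the shape the script's patterns expect.
SAMPLE = (
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading gs-o-faux-block-link" href="/news/example-1">'
    '<h3 class="gs-c-promo-heading__title gel-pica-bold">First headline</h3></a>'
    '<p class="gs-c-promo-summary gel-long-primer">A short summary.</p>'
    '<time class="gs-o-bullet__text date qa-status-date" '
    'datetime="2019-02-19T22:36:46.000Z">1h</time>'
    '<span aria-hidden="true">UK</span>'
    '</div>'
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading" href="/news/example-2">'
    '<h3 class="gs-c-promo-heading__title">Second headline</h3></a>'
    '</div>'
    '<div class="container">footer</div>'
)

START = '<div class="gs-c-promo-body'   # beginning of a headline block
END = '<div class="container">'         # headlines are only searched before this

# Required fields: a block without them is skipped.
REQUIRED = {
    "link": re.compile(r'<a class="gs-c-promo-heading[^"]*" href="([^"]*)">'),
    "title": re.compile(r'<h3 class="gs-c-promo-heading__title[^"]*">(.*?)</h3></a>'),
}
# Optional fields: a missing match yields an empty string.
OPTIONAL = {
    "summary": re.compile(r'<p class="gs-c-promo-summary[^"]*">(.*?)</p>'),
    "time": re.compile(
        r'<time class="gs-o-bullet__text date qa-status-date" datetime="([^"]*)"'
    ),
    "category": re.compile(r'<span aria-hidden="true">(.*?)</span>'),
}


def extract_headlines(page):
    """Return one dict per headline block found before the END marker."""
    top = page.split(END, 1)[0]
    records = []
    # Everything between two START markers is treated as one headline block.
    for chunk in top.split(START)[1:]:
        record = {}
        for name, pattern in REQUIRED.items():
            match = pattern.search(chunk)
            if match is None:
                break  # required field missing: skip this block
            record[name] = match.group(1).strip()
        else:
            for name, pattern in OPTIONAL.items():
                match = pattern.search(chunk)
                record[name] = match.group(1).strip() if match else ""
            records.append(record)
    return records


def render(record):
    """Format a record the same way as the script's Print action."""
    return (
        '<p><h3><a href="http://www.bbc.com{link}">{title}</a></h3></p>\n'
        '<p>{summary}</p>\n<p>Time: {time}</p>\n<p>Category: {category}</p>\n'
    ).format(**record)


if __name__ == "__main__":
    for rec in extract_headlines(SAMPLE):
        print(render(rec), end="")
```

The sketch works on an in-memory string. Fetching the live page and scheduling the run are left out, and the class names in the patterns are only as current as the script above.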