Example - BBC News live headlines

The script retrieves the front-page headlines from http://www.bbc.com/news and writes them to an HTML file. The output is refreshed every 15 minutes by cron.

 

Output:

<p><h3><a href="http://www.bbc.com">Trump hits out at ex-lawyer over tape</a></h3></p>
<p>It comes amid reports that he was secretly recorded discussing payments to an ex-Playboy model.</p>
<p>Time: 2018-07-21T13:20:35.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Japan swelters in deadly heatwave</a></h3></p>
<p>Scorching temperatures have killed 30 people in two weeks, prompting officials to issue warnings.</p>
<p>Time: 2018-07-21T09:18:24.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">French police quizzed about assault video</a></h3></p>
<p>Officers are suspected of leaking footage in a bid to help a presidential aide accused of assault.</p>
<p>Time: 2018-07-21T11:50:05.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Nine US tour boat victims from same family</a></h3></p>
<p>Seventeen people were killed when an amphibious &quot;duck boat&quot; carrying tourists sank in stormy weather.</p>
<p>Time: 2018-07-21T06:54:46.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Elon Musk&#x27;s farting unicorn fight settled</a></h3></p>
<p>The billionaire chief executive of Tesla got in a row with a Colorado potter over use of the image.</p>
<p>Time: 2018-07-21T13:59:27.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Facebook investigates another data firm</a></h3></p>
<p>Crimson Hexagon is alleged to be giving data to government bodies, possibly breaching Facebook rules.</p>
<p>Time: 2018-07-21T00:30:00.000Z</p>
<p>Category: Technology</p>
<p><h3><a href="http://www.bbc.com">Ohio governor spares death row inmate</a></h3></p>
<p>New evidence detailed how Raymond Tibbetts had been burned, thrown down stairs and beaten as a child.</p>
<p>Time: 2018-07-21T06:33:56.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Galle cricket stadium may be demolished</a></h3></p>
<p>The famously picturesque ground, in southern Sri Lanka, is next to a World Heritage-listed fort.</p>
<p>Time: 2018-07-21T12:07:27.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Spain&#x27;s opposition party elects new leader</a></h3></p>
<p>Pablo Casado replaces ex-PM Mariano Rajoy who was ousted in a no-confidence vote last month.</p>
<p>Time: 2018-07-21T13:28:13.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">The celebrities &#x27;banned&#x27; from Thatcher’s party</a></h3></p>
<p>What an annotated guest list for a party at Number 10 reveals about Denis Thatcher.</p>
<p>Time: 2018-07-20T23:36:37.000Z</p>
<p>Category: UK Politics</p>
<p><h3><a href="http://www.bbc.com">Buffy the Vampire Slayer to get TV reboot</a></h3></p>
<p>A black actress is reportedly due to take on the lead role in a remake of the 90s cult show.</p>
<p>Time: 2018-07-21T12:17:32.000Z</p>
<p>Category: Newsbeat</p>
<p><h3><a href="http://www.bbc.com">The celebrities &#x27;banned&#x27; from Thatcher’s party</a></h3></p>
<p>What an annotated guest list for a party at Number 10 reveals about Denis Thatcher.</p>
<p>Time: 2018-07-20T23:36:37.000Z</p>
<p>Category: UK Politics</p>
<p><h3><a href="http://www.bbc.com">Buffy the Vampire Slayer to get TV reboot</a></h3></p>
<p>A black actress is reportedly due to take on the lead role in a remake of the 90s cult show.</p>
<p>Time: 2018-07-21T12:17:32.000Z</p>
<p>Category: Newsbeat</p>
<p><h3><a href="http://www.bbc.com">Toblerone to revert to original shape</a></h3></p>
<p>Makers scrap the 150g bar, likened to a bike rack, in favour of a heavier one in its traditional shape.</p>
<p>Time: 2018-07-21T10:39:27.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">Tom Jones vows to get back on stage</a></h3></p>
<p>The Welsh singer called off four shows this week as he is in hospital with a &quot;bacterial infection&quot;.</p>
<p>Time: 2018-07-21T08:58:53.000Z</p>
<p>Category: Wales</p>
<p><h3><a href="http://www.bbc.com">Mass gathering of golden retrievers</a></h3></p>
<p>The dogs and their owners gathered at the Highlands estate where the breed was first founded 150 years ago.</p>
<p>Time: 2018-07-20T09:27:31.000Z</p>
<p>Category: Highlands &amp; Islands</p>
<p><h3><a href="http://www.bbc.com">Alexa, are you friends with our kids?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The boy who wrote his own obituary</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">What was life like before Google?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Mumbai slum gets colourful makeover</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Rice paddy cartoons celebrate Japanese artist</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The 600 dogs left behind in Macau</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World News TV</a></h3></p>
<p>The latest global news, sport, weather and documentaries</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World Service Radio</a></h3></p>
<p>Stories from around the world</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Giant moon artwork goes missing in post</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">What it&#x27;s like to be a wing-walker</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;I face mockery, rejection and harassment&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Sex &#x27;superbug&#x27; MGen: A 90-second guide</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Thai artists&#x27; huge mural of cave rescue heroes</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;The screeching drill, the burnt flesh - my dental nightmare&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Helsinki aftershocks jolt US security elite</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Who&#x27;s had a week to forget, and a week to remember?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How to cycle like a Tour de France star</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How Pussy Riot burst into World Cup final</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Reality Check: What would &#x27;no deal&#x27; look like?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Last survivor: The story of the &#x27;world&#x27;s loneliest man&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Why you really are ‘as old as you feel’</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">Inside a Chinese sex doll factory</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">A trip most people wouldn’t dare to do</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Travel</p>
<p><h3><a href="http://www.bbc.com">The 10 smartest beach reads of 2018</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Culture</p>
<p><h3><a href="http://www.bbc.com">Why we cannot go faster than light</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Earth</p>
<p><h3><a href="http://www.bbc.com">One of the world&#x27;s largest vehicles</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">How open offices kill teamwork</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">The Open: Woods birdie blitz; Spieth eagle - clips, radio &amp; text</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: Golf</p>
<p><h3><a href="http://www.bbc.com">Watch: London Anniversary Games - follow live text &amp; analysis from London Stadium</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: Athletics</p>
<p><h3><a href="http://www.bbc.com">Listen: German GP - Vettel on pole after Hamilton breaks down</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: Formula 1</p>
<p><h3><a href="http://www.bbc.com">Tour de France - will &#x27;punchy&#x27; Thomas stay in yellow?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: Cycling</p>
<p><h3><a href="http://www.bbc.com">Rose shoots career-best 64 at The Open</a></h3></p>
<p></p>
<p>Time: 2018-07-21T14:14:48.000Z</p>
<p>Category: Golf</p>
<p><h3><a href="http://www.bbc.com">&#x27;Why on earth has she stopped?&#x27; Race leader &#x27;finishes&#x27; half a lap early</a></h3></p>
<p></p>
<p>Time: 2018-07-21T13:27:49.000Z</p>
<p>Category: Athletics</p>
<p><h3><a href="http://www.bbc.com">Vettel on pole after Hamilton breaks down</a></h3></p>
<p></p>
<p>Time: 2018-07-21T13:37:25.000Z</p>
<p>Category: Formula 1</p>

Source code of script:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Retrieves the headlines from www.bbc.com/news and writes them as HTML
# Input: URL [http://www.bbc.com/news]
# Output format: HTML file
# Output fields: Link, Title, Summary, Time, Category

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    Define $output_file bbc_output.html

    # define the variable $url and assign it a value
    Define $url http://www.bbc.com/news
    
	
	
    # clean output file
    <Action Print>
        FileName {$output_file}
		FileMode Write  
    </Action>
    	
	
    
    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

	# the script will iterate through all headlines
	<Section While>
		# search for headlines only in the top part of the website
		EndAt <div class="container">
		
		# match the beginning of headline
		<Pattern>
			RegExp <div class="gs-c-promo-body
		</Pattern>
	
		<Section>
			# stop searching for this headline's fields before the beginning of the next headline
			EndAt <div class="gel-layout__item
	
			# match link (captured into $link, which the Print action below uses)
			<Pattern>
				RegExp <a class="gs-c-promo-heading{:re([^"]*)}" href="{$link:re([^"]*)}">
				Trim
				Compact
			</Pattern>
	
			# match title
			<Pattern>
				RegExp <h3 class="gs-c-promo-heading__title{:re([^"]*)}">{$title}</h3></a>
				Trim
				Compact
			</Pattern>
	
			# match summary
			<Pattern>
				Optional
				RegExp <p class="gs-c-promo-summary{:re([^"]*)}">{$summary}</p>
				Trim
				Compact
			</Pattern>
	
			# match time
			<Pattern>
				Optional
				RegExp <time class="gs-o-bullet__text date qa-status-date" datetime="{$time:re([^"]*)}"
				Trim
				Compact
			</Pattern>
	
			# match category
			<Pattern>
				Optional
				RegExp <span aria-hidden="true">{$category}</span>
				Trim
				Compact
			</Pattern>
	
			# and print parsed data
			<Action Print>
				FileName {$output_file}
				Text <p><h3><a href="http://www.bbc.com{$link}">{$title}</a></h3></p>\n<p>{$summary}</p>\n<p>Time: {$time}</p>\n<p>Category: {$category}</p>\n
			</Action>
        </Section>
    </Section>
</Section>

Main bbc_main
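
For readers unfamiliar with the script syntax, the extraction logic above can be approximated in Python. This is only a rough sketch under stated assumptions, not how the scraping tool itself works: the names parse_headlines, render, SAMPLE, BLOCK_START and FIELDS are hypothetical, the sample markup is invented for illustration, and each headline block is bounded by the start of the next block rather than by the EndAt marker used in the script. The class names and the output template are copied from the script.

```python
import re

# Invented sample markup; only the class names come from the script's patterns.
SAMPLE = (
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading gs-o-faux-block-link" href="/news/world-1">'
    '<h3 class="gs-c-promo-heading__title gel-pica-bold">First headline</h3></a>'
    '<p class="gs-c-promo-summary gel-long-primer">First summary.</p>'
    '<time class="gs-o-bullet__text date qa-status-date" '
    'datetime="2018-07-21T13:20:35.000Z"></time>'
    '<span aria-hidden="true">Europe</span>'
    '</div>'
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading" href="/news/world-2">'
    '<h3 class="gs-c-promo-heading__title">Second headline</h3></a>'
    '</div>'
)

# Marker that opens each headline block (the outer <Pattern> in the script).
BLOCK_START = '<div class="gs-c-promo-body'

# One regular expression per output field (the inner <Pattern> entries).
FIELDS = {
    "link": re.compile(r'<a class="gs-c-promo-heading[^"]*" href="([^"]*)">'),
    "title": re.compile(r'<h3 class="gs-c-promo-heading__title[^"]*">(.*?)</h3></a>'),
    "summary": re.compile(r'<p class="gs-c-promo-summary[^"]*">(.*?)</p>'),
    "time": re.compile(
        r'<time class="gs-o-bullet__text date qa-status-date" datetime="([^"]*)"'
    ),
    "category": re.compile(r'<span aria-hidden="true">(.*?)</span>'),
}


def parse_headlines(page):
    """Return one dict per headline block; fields that do not match stay empty."""
    headlines = []
    # Everything before the first marker is skipped; each remaining chunk
    # runs from one block start to the next.
    for chunk in page.split(BLOCK_START)[1:]:
        item = {}
        for name, pattern in FIELDS.items():
            match = pattern.search(chunk)
            item[name] = match.group(1).strip() if match else ""
        headlines.append(item)
    return headlines


def render(item):
    """Format one headline using the same template as the Print action."""
    return (
        '<p><h3><a href="http://www.bbc.com{link}">{title}</a></h3></p>\n'
        '<p>{summary}</p>\n'
        '<p>Time: {time}</p>\n'
        '<p>Category: {category}</p>\n'
    ).format(**item)


if __name__ == "__main__":
    for headline in parse_headlines(SAMPLE):
        print(render(headline), end="")
```

In this sketch a field whose pattern does not match is left as an empty string. That matches the sample output above, where some entries show an empty summary, time, or category.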