Example - BBC News live headlines

The script retrieves the headlines from http://news.bbc.co.uk/ and writes them as HTML. Cron reruns it every 15 minutes to refresh the output.

Output:

<p><h3><a href="http://www.bbc.com">Tusk: May&#x27;s Brexit plan won&#x27;t work</a></h3></p>
<p>The EU Council president dismisses the Chequers plan - the PM says there is &quot;hard work&quot; to be done.</p>
<p>Time: 2018-09-20T14:49:35.000Z</p>
<p>Category: UK Politics</p>
<p><h3><a href="http://www.bbc.com">Passengers hurt after India pilot &#x27;blunder&#x27;</a></h3></p>
<p>Dozens on a Jet Airways flight are injured as pilots &quot;forget&quot; a switch to maintain cabin pressure.</p>
<p>Time: 2018-09-20T12:24:24.000Z</p>
<p>Category: India</p>
<p><h3><a href="http://www.bbc.com">Children in cart killed in Dutch rail crash</a></h3></p>
<p>Four children from a day-care centre have died in a collision involving a train and an electric cart.</p>
<p>Time: 2018-09-20T15:41:27.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Le Pen ordered to take psychiatric tests</a></h3></p>
<p>The court order is part of a probe into the far-right leader sharing images of IS atrocities.</p>
<p>Time: 2018-09-20T13:24:10.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">&#x27;We haven&#x27;t had sex in three years&#x27;</a></h3></p>
<p>A survey of users of two UK parenting websites finds 29% have had sex fewer than 11 times in the past year.</p>
<p>Time: 2018-09-20T00:44:53.000Z</p>
<p>Category: Health</p>
<p><h3><a href="http://www.bbc.com">Girl, 9, shot dead as troops clear traffic</a></h3></p>
<p>The nine-year-old Somali girl died after soldiers fired shots to clear a traffic jam in Mogadishu.</p>
<p>Time: 2018-09-20T13:04:40.000Z</p>
<p>Category: Africa</p>
<p><h3><a href="http://www.bbc.com">Russia reinstated after doping ban</a></h3></p>
<p>The suspension of Russia&#x27;s anti-doping agency is lifted despite widespread opposition.</p>
<p>Time: 2018-09-20T14:43:57.000Z</p>
<p>Category: BBC Sport</p>
<p><h3><a href="http://www.bbc.com">Bobi Wine back in Uganda &#x27;to fight&#x27;</a></h3></p>
<p>The Ugandan pop star-turned-MP is back at his home after a medical trip to the US.</p>
<p>Time: 2018-09-20T15:23:34.000Z</p>
<p>Category: Africa</p>
<p><h3><a href="http://www.bbc.com">The extraordinary story of how I found my parents</a></h3></p>
<p>As a young man Tuy left America and headed back to Vietnam in search of his mother and his identity</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Kim &#x27;wants fast denuclearisation&#x27;</a></h3></p>
<p>North Korea&#x27;s leader also wants a second summit with Donald Trump soon, the South&#x27;s president says.</p>
<p>Time: 2018-09-20T12:56:27.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Trump urged Spain to &#x27;build Sahara wall&#x27;</a></h3></p>
<p>The Spanish foreign minister says he disagreed with the idea, put to him during a visit to the US.</p>
<p>Time: 2018-09-20T12:53:33.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Kim &#x27;wants fast denuclearisation&#x27;</a></h3></p>
<p>North Korea&#x27;s leader also wants a second summit with Donald Trump soon, the South&#x27;s president says.</p>
<p>Time: 2018-09-20T12:56:27.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Trump urged Spain to &#x27;build Sahara wall&#x27;</a></h3></p>
<p>The Spanish foreign minister says he disagreed with the idea, put to him during a visit to the US.</p>
<p>Time: 2018-09-20T12:53:33.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Sky bid tussle to be settled by auction</a></h3></p>
<p>The broadcaster has been subject to rival bids from Rupert Murdoch&#x27;s Fox and US giant Comcast.</p>
<p>Time: 2018-09-20T13:19:59.000Z</p>
<p>Category: Business</p>
<p><h3><a href="http://www.bbc.com">Two shark attacks at Australia tourist spot</a></h3></p>
<p>A girl and a woman have been seriously injured in separate incidents within 24 hours.</p>
<p>Time: 2018-09-20T07:38:38.000Z</p>
<p>Category: Australia</p>
<p><h3><a href="http://www.bbc.com">Video of driver slapping boy divides France</a></h3></p>
<p>Thousands back a Paris bus driver after he was caught on camera slapping a boy, 12, for &quot;disrespect&quot;.</p>
<p>Time: 2018-09-20T11:47:43.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">In China, a ‘Great Wall’ no-one knows</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World News TV</a></h3></p>
<p>The latest global news, sport, weather and documentaries</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World Service Radio</a></h3></p>
<p>Stories from around the world</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Building a city under the sea</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Cary Fukunaga to direct next James Bond</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">My holiday with the Afghan mujahideen</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Rugby stars told to hide tattoos in Japan</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">N Koreans cheer S Korean president at games</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The surprising ways prisoners have won their freedom</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The new face of the far right in Europe</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Trump&#x27;s tariffs putting the squeeze on the Midwest</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC Future: Is sugar really bad for you?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The link between earth tremors, God and Nigeria&#x27;s elections</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Hologram phone calls - sci-fi or serious possibility?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The man behind Italy&#x27;s migrant drop</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Surviving the evil at Lagarie</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The children with huge burdens</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Last call for Nevada’s brothels?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The lost decade</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The Battle of Britain&#x27;s enigmatic Czech hero</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Why you don’t really have a ‘type’</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">Decoding Japan for the masses</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">The other Great Wall no-one knows</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Travel</p>
<p><h3><a href="http://www.bbc.com">The forgotten women of the Bauhaus</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Culture</p>
<p><h3><a href="http://www.bbc.com">The bird that came back from the dead</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Earth</p>
<p><h3><a href="http://www.bbc.com">Is sugar really bad for you?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">The art of worthless money</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">County Championship - Warwickshire closing on promotion</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: Cricket</p>
<p><h3><a href="http://www.bbc.com">From &#x27;new Rooney&#x27; to cocaine &amp; a suicide attempt</a></h3></p>
<p></p>
<p>Time: 2018-09-20T10:19:43.000Z</p>
<p>Category: Football</p>
<p><h3><a href="http://www.bbc.com">11 operations &amp; 668 days out – Villarreal&#x27;s Cazorla on Arsenal, gangrene &amp; Gerrard</a></h3></p>
<p></p>
<p>Time: 2018-09-19T16:48:48.000Z</p>
<p>Category: European Football</p>
<p><h3><a href="http://www.bbc.com">Watch: World Equestrian Games - Show Jumping</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: Equestrian</p>
<p><h3><a href="http://www.bbc.com">What now for sport after Russia ban lifted?</a></h3></p>
<p></p>
<p>Time: 2018-09-20T15:13:56.000Z</p>
<p>Category: Sport</p>
<p><h3><a href="http://www.bbc.com">Konta beaten in straight sets by Vekic</a></h3></p>
<p></p>
<p>Time: 2018-09-20T09:59:35.000Z</p>
<p>Category: Tennis</p>
<p><h3><a href="http://www.bbc.com">Cipriani out of England training camp</a></h3></p>
<p></p>
<p>Time: 2018-09-20T09:21:19.000Z</p>
<p>Category: Rugby Union</p>

Source code of script:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Writes an HTML list of the headlines found on http://www.bbc.com/news
# Input: URL [http://news.bbc.co.uk]
# Output format: HTML file
# Output fields: Link, Title, Summary, Time, Category

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    Define $output_file bbc_output.html

    # define the variable $url and assign it a value
    Define $url http://www.bbc.com/news
    
	
	
    # clean output file
    <Action Print>
        FileName {$output_file}
		FileMode Write  
    </Action>
    	
	
    
    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

	# the script will iterate through all headlines
	<Section While>
		# search for headlines only in the top part of the website
		EndAt <div class="container">
		
		# match the beginning of headline
		<Pattern>
			RegExp <div class="gs-c-promo-body
		</Pattern>
	
		<Section>
			# stop searching for this headline's fields before the next headline begins
			EndAt <div class="gel-layout__item
	
			# match link (stored in $link, which the Print action below uses)
			<Pattern>
				RegExp <a class="gs-c-promo-heading{:re([^"]*)}" href="{$link:re([^"]*)}">
				Trim
				Compact
			</Pattern>
	
			# match title
			<Pattern>
				RegExp <h3 class="gs-c-promo-heading__title{:re([^"]*)}">{$title}</h3></a>
				Trim
				Compact
			</Pattern>
	
			# match summary
			<Pattern>
				Optional
				RegExp <p class="gs-c-promo-summary{:re([^"]*)}">{$summary}</p>
				Trim
				Compact
			</Pattern>
	
			# match time
			<Pattern>
				Optional
				RegExp <time class="gs-o-bullet__text date qa-status-date" datetime="{$time:re([^"]*)}"
				Trim
				Compact
			</Pattern>
	
			# match category
			<Pattern>
				Optional
				RegExp <span aria-hidden="true">{$category}</span>
				Trim
				Compact
			</Pattern>
	
			# and print parsed data
			<Action Print>
				FileName {$output_file}
				Text <p><h3><a href="http://www.bbc.com{$link}">{$title}</a></h3></p>\n<p>{$summary}</p>\n<p>Time: {$time}</p>\n<p>Category: {$category}</p>\n
			</Action>
        </Section>
    </Section>
</Section>

Main bbc_main
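
For readers who do not know the script language, the same extraction can be sketched in Python. This is only an illustration under assumptions, not the tool's implementation: the function names, the SAMPLE markup and the /news/example-... paths are made up, and each headline block is simply bounded by the next block marker rather than by the EndAt rules of the script. Optional fields come back as empty strings, which is why some entries in the output show an empty Time or Category.

```python
import re

BASE = "http://www.bbc.com"

# Literal that opens each headline block (the same text the script matches).
BLOCK_START = '<div class="gs-c-promo-body'

# Regular expressions mirroring the <Pattern> entries of the script.
LINK_RE = re.compile(r'<a class="gs-c-promo-heading[^"]*" href="([^"]*)">')
TITLE_RE = re.compile(r'<h3 class="gs-c-promo-heading__title[^"]*">(.*?)</h3>')
SUMMARY_RE = re.compile(r'<p class="gs-c-promo-summary[^"]*">(.*?)</p>')
TIME_RE = re.compile(
    r'<time class="gs-o-bullet__text date qa-status-date" datetime="([^"]*)"'
)
CATEGORY_RE = re.compile(r'<span aria-hidden="true">(.*?)</span>')


def first_group(pattern, text):
    """Return the first capture group, or '' when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1).strip() if match else ""


def extract_headlines(html):
    """Yield one dict per headline block found in html."""
    # Text before the first marker is not a headline, so skip it.
    for block in html.split(BLOCK_START)[1:]:
        link = first_group(LINK_RE, block)
        title = first_group(TITLE_RE, block)
        if not link or not title:
            continue  # link and title are required, the rest is optional
        yield {
            "link": link,
            "title": title,
            "summary": first_group(SUMMARY_RE, block),
            "time": first_group(TIME_RE, block),
            "category": first_group(CATEGORY_RE, block),
        }


def render(item):
    """Format one headline the same way as the Print action of the script."""
    return (
        f'<p><h3><a href="{BASE}{item["link"]}">{item["title"]}</a></h3></p>\n'
        f'<p>{item["summary"]}</p>\n'
        f'<p>Time: {item["time"]}</p>\n'
        f'<p>Category: {item["category"]}</p>\n'
    )


# Hypothetical markup, shaped like the patterns above; not copied from the BBC.
SAMPLE = (
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading nw-o-link" href="/news/example-1">'
    '<h3 class="gs-c-promo-heading__title gel-pica-bold">First headline</h3></a>'
    '<p class="gs-c-promo-summary gel-long-primer">A short summary.</p>'
    '<time class="gs-o-bullet__text date qa-status-date"'
    ' datetime="2018-09-20T14:49:35.000Z"></time>'
    '<span aria-hidden="true">UK Politics</span>'
    '</div>'
    '<div class="gs-c-promo-body">'
    '<a class="gs-c-promo-heading" href="/news/example-2">'
    '<h3 class="gs-c-promo-heading__title">Second headline</h3></a>'
    '</div>'
)

if __name__ == "__main__":
    for headline in extract_headlines(SAMPLE):
        print(render(headline), end="")
```

The second sample block has no summary, time or category, so it is rendered with empty paragraphs, just like the feature items near the end of the output.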