Example - BBC News live headlines

The script retrieves the current headlines from http://www.bbc.com/news and writes them to an HTML file. The output is refreshed every 15 minutes by a cron job.

 

Output:

<p><h3><a href="http://www.bbc.com">Chemical experts finally see Syria site</a></h3></p>
<p>International experts make a much-delayed examination of the site of a suspected attack in Douma.</p>
<p>Time: 2018-04-21T16:54:10.000Z</p>
<p>Category: Middle East</p>
<p><h3><a href="http://www.bbc.com">Death penalty for India child rapists</a></h3></p>
<p>The change to the penal code comes amid nationwide outrage over high-profile cases of child rape.</p>
<p>Time: 2018-04-21T14:06:47.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Smallville star accused of sex trafficking</a></h3></p>
<p>She pleaded not guilty to recruiting women for sexual exploitation in a purported self-help group.</p>
<p>Time: 2018-04-21T06:35:57.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Hamas man shot dead in Malaysia</a></h3></p>
<p>Malaysian officials say the suspects are believed to have links to a foreign intelligence service.</p>
<p>Time: 2018-04-21T16:11:49.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Tributes paid at Barbara Bush&#x27;s funeral</a></h3></p>
<p>President Trump did not join four ex-presidents at the former first lady&#x27;s service in Texas.</p>
<p>Time: 2018-04-21T17:18:54.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Former porn star: &#x27;I got death threats from IS&#x27;</a></h3></p>
<p>Former porn star Mia Khalifa says she received abuse after she filmed a porn scene wearing a hijab.</p>
<p>Time: 2018-04-21T11:23:37.000Z</p>
<p>Category: Norfolk</p>
<p><h3><a href="http://www.bbc.com">World welcomes N Korea nuclear test halt</a></h3></p>
<p>North Korean leader Kim Jong-un&#x27;s announcement is welcomed by world powers, ahead of key summits.</p>
<p>Time: 2018-04-21T11:46:32.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Avicii&#x27;s music &#x27;will live forever&#x27;</a></h3></p>
<p>It&#x27;s not yet known how the Swedish DJ died but he had been suffering severe pancreatitis.</p>
<p>Time: 2018-04-21T09:48:41.000Z</p>
<p>Category: Newsbeat</p>
<p><h3><a href="http://www.bbc.com">Rare brown bear dies in capture operation</a></h3></p>
<p>The endangered animal died in Italy during an attempt to fit him with a collar to track movements.</p>
<p>Time: 2018-04-21T08:13:05.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Ten killed as Nicaragua crisis deepens</a></h3></p>
<p>Protesters, branded as &#x27;vampires&#x27; by the government, are opposed to changes in the pension system.</p>
<p>Time: 2018-04-21T08:06:05.000Z</p>
<p>Category: Latin America &amp; Caribbean</p>
<p><h3><a href="http://www.bbc.com">Remote farmhouse meeting for UN talks</a></h3></p>
<p>Members of the Security Council seek to heal divisions over Syria during a meeting in Sweden.</p>
<p>Time: 2018-04-21T10:47:21.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Ten killed as Nicaragua crisis deepens</a></h3></p>
<p>Protesters, branded as &#x27;vampires&#x27; by the government, are opposed to changes in the pension system.</p>
<p>Time: 2018-04-21T08:06:05.000Z</p>
<p>Category: Latin America &amp; Caribbean</p>
<p><h3><a href="http://www.bbc.com">Remote farmhouse meeting for UN talks</a></h3></p>
<p>Members of the Security Council seek to heal divisions over Syria during a meeting in Sweden.</p>
<p>Time: 2018-04-21T10:47:21.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Engine inspections ordered after accident</a></h3></p>
<p>Hundreds of jet engines will be checked worldwide after a deadly Southwest Airlines accident.</p>
<p>Time: 2018-04-21T01:13:08.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">FA Cup semi-final: Man Utd v Spurs</a></h3></p>
<p>Manchester United take on Tottenham in the FA Cup semi-final at Wembley - watch BBC One, listen to Radio 5 live and local radio plus follow text commentary.</p>
<p>Time: </p>
<p>Category: BBC Sport</p>
<p><h3><a href="http://www.bbc.com">&#x27;Accidental tourist&#x27; visits mistaken island</a></h3></p>
<p>American Joe Hill thought a Facebook group for Jersey, the Channel Island, was for New Jersey.</p>
<p>Time: 2018-04-20T23:05:43.000Z</p>
<p>Category: Jersey</p>
<p><h3><a href="http://www.bbc.com">The space movies Nasa loves... and hates</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World News TV</a></h3></p>
<p>The latest global news, sport, weather and documentaries</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World Service Radio</a></h3></p>
<p>Stories from around the world</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Syria&#x27;s warren of war tunnels revealed</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Castro&#x27;s people - a photographic road trip through Cuba</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;Remember my life, not the money I made&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Is a new hate speech law killing German comedy?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The make-up artist helping scarred women</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Why has Kim Jong-un halted North Korean tests now?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The superstar DJ whose death has shocked dance music</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Is marathon running bad for you?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">What another whirlwind week means for Trump</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The free app giving away thousands of pounds</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Why singer Lucy Dacus makes her fans cry</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">What do Indian parents tell their children about rape?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The holiday village run by spies</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Cuba after the Castros</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Obituary: Barbara Bush - former US First Lady and literacy campaigner</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">‘I was a teacher for 17 years - but I couldn’t read’</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The man who thinks Europe is being invaded</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Five myths about first aid</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">How Sweden changed overnight</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">The letter Anne Frank wrote to the US</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Travel</p>
<p><h3><a href="http://www.bbc.com">Saddam’s ‘Disney for a despot’</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Culture</p>
<p><h3><a href="http://www.bbc.com">The worst storm in British history?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Earth</p>
<p><h3><a href="http://www.bbc.com">A British rocket base returns to life</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">The death of the conference call?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">Sanchez levels for Man Utd</a></h3></p>
<p></p>
<p>Time: 2018-04-21T17:04:17.000Z</p>
<p>Category: Football</p>
<p><h3><a href="http://www.bbc.com">Sunderland 1-2 Burton Albion</a></h3></p>
<p></p>
<p>Time: 2018-04-21T16:54:09.000Z</p>
<p>Category: Football</p>
<p><h3><a href="http://www.bbc.com">GB&#x27;s Konta levels Fed Cup tie in Japan</a></h3></p>
<p></p>
<p>Time: 2018-04-21T08:46:15.000Z</p>
<p>Category: Tennis</p>
<p><h3><a href="http://www.bbc.com">Rate the players: Man Utd v Spurs</a></h3></p>
<p></p>
<p>Time: 2018-04-21T14:41:13.000Z</p>
<p>Category: Football</p>
<p><h3><a href="http://www.bbc.com">County Championship - Essex set Lancashire 320</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: Cricket</p>
<p><h3><a href="http://www.bbc.com">West Bromwich Albion 2-2 Liverpool</a></h3></p>
<p></p>
<p>Time: 2018-04-21T17:08:38.000Z</p>
<p>Category: Football</p>
<p><h3><a href="http://www.bbc.com">Leinster 38-16 Scarlets</a></h3></p>
<p></p>
<p>Time: 2018-04-21T16:49:58.000Z</p>
<p>Category: Rugby Union</p>

Source code of the script:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Retrieves the headlines from http://www.bbc.com/news and writes them as HTML
# Input: URL [http://www.bbc.com/news]
# Output format: HTML file
# Output fields: Link, Title, Summary, Time, Category

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    # output file for the generated HTML
    Define $output_file bbc_output.html

    # URL of the page to scrape
    Define $url http://www.bbc.com/news

    # clean output file
    <Action Print>
        FileName {$output_file}
        FileMode Write
    </Action>
    
    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

	# the script will iterate through all headlines
	<Section While>
		# search for headlines only in the top part of the website
		EndAt <div class="container">
		
		# match the beginning of headline
		<Pattern>
			RegExp <div class="gs-c-promo-body
		</Pattern>
	
		<Section>
			# stop searching for this headline's fields at the start of the next layout item
			EndAt <div class="gel-layout__item
	
			# match link (captured into $link, which the Print action uses)
			<Pattern>
				RegExp <a class="gs-c-promo-heading{:re([^"]*)}" href="{$link:re([^"]*)}">
				Trim
				Compact
			</Pattern>
	
			# match title
			<Pattern>
				RegExp <h3 class="gs-c-promo-heading__title{:re([^"]*)}">{$title}</h3></a>
				Trim
				Compact
			</Pattern>
	
			# match summary
			<Pattern>
				Optional
				RegExp <p class="gs-c-promo-summary{:re([^"]*)}">{$summary}</p>
				Trim
				Compact
			</Pattern>
	
			# match time
			<Pattern>
				Optional
				RegExp <time class="gs-o-bullet__text date qa-status-date" datetime="{$time:re([^"]*)}"
				Trim
				Compact
			</Pattern>
	
			# match category
			<Pattern>
				Optional
				RegExp <span aria-hidden="true">{$category}</span>
				Trim
				Compact
			</Pattern>
	
			# and print parsed data
			<Action Print>
				FileName {$output_file}
				Text <p><h3><a href="http://www.bbc.com{$link}">{$title}</a></h3></p>\n<p>{$summary}</p>\n<p>Time: {$time}</p>\n<p>Category: {$category}</p>\n
			</Action>
		</Section>
	</Section>
</Section>

Main bbc_main
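
For readers who do not know the script language, the same extraction logic can be sketched in Python. This is only an illustration under stated assumptions, not the tool's implementation: the sample markup is hypothetical (it imitates the class names used in the patterns above, not the real BBC page), the helper names `compact`, `first_group`, `parse_headlines` and `render` are made up for this example, and skipping a headline whose link or title is missing is an assumption about how a failed required pattern should be handled.

```python
import re

# Hypothetical markup that imitates the class names the patterns look for.
# It is NOT a copy of the real BBC page.
SAMPLE_PAGE = (
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading nw-o-link" href="/news/world-1">'
    '<h3 class="gs-c-promo-heading__title gel-pica-bold">First  headline</h3></a>'
    '<p class="gs-c-promo-summary gel-long-primer">Short summary.</p>'
    '<time class="gs-o-bullet__text date qa-status-date" '
    'datetime="2018-04-21T16:54:10.000Z"></time>'
    '<span aria-hidden="true">Europe</span>'
    '</div>'
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading nw-o-link" href="/news/world-2">'
    '<h3 class="gs-c-promo-heading__title gel-pica-bold">Second headline</h3></a>'
    '</div>'
    '<div class="container">footer</div>'
)

# Python equivalents of the RegExp lines in the script.
LINK_RE = re.compile(r'<a class="gs-c-promo-heading[^"]*" href="([^"]*)">')
TITLE_RE = re.compile(r'<h3 class="gs-c-promo-heading__title[^"]*">(.*?)</h3></a>')
SUMMARY_RE = re.compile(r'<p class="gs-c-promo-summary[^"]*">(.*?)</p>')
TIME_RE = re.compile(
    r'<time class="gs-o-bullet__text date qa-status-date" datetime="([^"]*)"'
)
CATEGORY_RE = re.compile(r'<span aria-hidden="true">(.*?)</span>')


def compact(text):
    """Trim and collapse whitespace runs (stand-in for Trim + Compact)."""
    return " ".join(text.split())


def first_group(pattern, chunk):
    """Return the first captured group, or '' when the pattern does not match."""
    match = pattern.search(chunk)
    return compact(match.group(1)) if match else ""


def parse_headlines(page):
    # Only look at the part of the page before the container div (outer EndAt).
    top = page.split('<div class="container">', 1)[0]
    items = []
    # Every headline starts at a promo-body div; text before the first is skipped.
    for chunk in top.split('<div class="gs-c-promo-body')[1:]:
        # Stop at the next layout item (inner EndAt).
        chunk = chunk.split('<div class="gel-layout__item', 1)[0]
        # Assumption: link and title are required, the other fields are optional.
        if not LINK_RE.search(chunk) or not TITLE_RE.search(chunk):
            continue
        items.append({
            "link": first_group(LINK_RE, chunk),
            "title": first_group(TITLE_RE, chunk),
            "summary": first_group(SUMMARY_RE, chunk),
            "time": first_group(TIME_RE, chunk),
            "category": first_group(CATEGORY_RE, chunk),
        })
    return items


def render(item):
    """Format one headline the same way as the Text line of the Print action."""
    return (
        '<p><h3><a href="http://www.bbc.com{link}">{title}</a></h3></p>\n'
        '<p>{summary}</p>\n<p>Time: {time}</p>\n<p>Category: {category}</p>\n'
    ).format(**item)


if __name__ == "__main__":
    for headline in parse_headlines(SAMPLE_PAGE):
        print(render(headline), end="")
```

Optional fields that are not found stay empty, which is why entries such as `<p>Time: </p>` and `<p>Category: </p>` appear in the output listing.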