Example - BBC News live headlines

The script retrieves the headlines from http://www.bbc.com/news and writes them to an HTML file. The output is refreshed every 15 minutes by cron.

 

Output:

<p><h3><a href="http://www.bbc.com">Gun advocates &#x27;exploiting&#x27; Florida tragedy</a></h3></p>
<p>The head of America&#x27;s powerful gun lobby says politicians are exploiting the Florida school shooting.</p>
<p>Time: 2018-02-22T17:51:52.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Unicef deputy resigns after allegations</a></h3></p>
<p>Ex-Save the Children head Justin Forsyth quits Unicef after allegations of inappropriate behaviour.</p>
<p>Time: 2018-02-22T17:57:35.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">New deadly raids rock Syria rebel enclave</a></h3></p>
<p>Warplanes continue to pound the Eastern Ghouta as the UN Security Council meets.</p>
<p>Time: 2018-02-22T17:23:40.000Z</p>
<p>Category: Middle East</p>
<p><h3><a href="http://www.bbc.com">Wobbling over the world&#x27;s longest glass bridge</a></h3></p>
<p>Chinese tourists celebrate the new lunar year by walking across the world&#x27;s longest glass bridge.</p>
<p>Time: 2018-02-22T13:08:04.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Seychelles starts &#x27;Britain-sized&#x27; reserve</a></h3></p>
<p>A novel deal with donors including Leonardo DiCaprio turned public debt into conservation funding.</p>
<p>Time: 2018-02-22T14:49:57.000Z</p>
<p>Category: Africa</p>
<p><h3><a href="http://www.bbc.com">Anger over missing Nigerian schoolgirls</a></h3></p>
<p>Parents are still waiting for news of what happened to their daughters after a Boko Haram attack.</p>
<p>Time: 2018-02-22T17:29:22.000Z</p>
<p>Category: Africa</p>
<p><h3><a href="http://www.bbc.com">N Korea to send general to Olympics</a></h3></p>
<p>The high-ranking official is thought to have masterminded several attacks on South Korea.</p>
<p>Time: 2018-02-22T10:46:17.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Rising number of EU nationals leaving UK</a></h3></p>
<p>About 130,000 EU citizens emigrated from the UK in the year to September, but a higher number arrived.</p>
<p>Time: 2018-02-22T17:30:09.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">Top Greek politicians to be investigated</a></h3></p>
<p>Ten top Greek politicians will be investigated over alleged bribes by medicines giant Novartis.</p>
<p>Time: 2018-02-22T14:15:11.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Kim murder suspects &#x27;paid for pranks&#x27;</a></h3></p>
<p>A court hears two women charged with killing Kim Jong-nam thought they were taking part in a TV prank.</p>
<p>Time: 2018-02-22T17:38:33.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Worker loses porn-on-computer privacy case</a></h3></p>
<p>The French rail employee had claimed his right to privacy was breached by his employer.</p>
<p>Time: 2018-02-22T16:14:51.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Kim murder suspects &#x27;paid for pranks&#x27;</a></h3></p>
<p>A court hears two women charged with killing Kim Jong-nam thought they were taking part in a TV prank.</p>
<p>Time: 2018-02-22T17:38:33.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Worker loses porn-on-computer privacy case</a></h3></p>
<p>The French rail employee had claimed his right to privacy was breached by his employer.</p>
<p>Time: 2018-02-22T16:14:51.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">&#x27;I was lucky to come home from school&#x27;</a></h3></p>
<p>President Trump met pupils and parents affected by school shootings in Florida and across the US.</p>
<p>Time: 2018-02-22T11:40:22.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Russian arrested over Euro 2016 attack</a></h3></p>
<p>He is wanted for a serious attack on an England fan during Euro 2016 and may face 15 years in jail.</p>
<p>Time: 2018-02-22T15:09:52.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">US couple accused of locking up children</a></h3></p>
<p>Police were alerted after one of the adopted children left the house to make a phone call.</p>
<p>Time: 2018-02-22T14:49:13.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Olympic failures don&#x27;t define me - Christie</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">USA win ice hockey gold on penalties</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Gisin wins gold as Vonn fails to finish</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Russian curler loses medal over doping</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">South Korea&#x27;s accidental curling superstars</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;Arm me with anything but a gun&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World News TV</a></h3></p>
<p>The latest global news, sport, weather and documentaries</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World Service Radio</a></h3></p>
<p>Stories from around the world</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Trudeau&#x27;s &#x27;Bollywood&#x27; wardrobe amuses India</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Farming the old-fashioned way</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Who owns &#x27;lucky money&#x27; in red envelopes?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Moment policeman catches falling child</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Calais: The camp that never really closes</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;Weird world I was warned never to tell about&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Syria conflict: Will powers end up in direct war?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">EU shadow-boxing towards Brexit</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;Counting every school shooting so it never seems normal&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Falling hair and eating inedible plants to survive</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Did the Empire resist women’s suffrage in India?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">What will Brexit mean for Britain&#x27;s overseas territories?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The CIA secret on the ocean floor</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Shannon Matthews: The unravelling of the truth</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">I was abused by Barry Bennell</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The playboy who got away with $242m</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">A school shooting comes to &#x27;paradise&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How personality gets under your skin</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">The science ace with 16 million fans</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">Resplendent relics of a turbulent past</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Travel</p>
<p><h3><a href="http://www.bbc.com">Real-life tragedies that inspire films</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Culture</p>
<p><h3><a href="http://www.bbc.com">The chameleon the size of an ant</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Earth</p>
<p><h3><a href="http://www.bbc.com">The downsides of perfectionism</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">A peek behind plane doors</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">Europa League - Zenit v Celtic</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: Football</p>
<p><h3><a href="http://www.bbc.com">USA end 20-year wait, stunning stone &amp; halfpipe heaven</a></h3></p>
<p></p>
<p>Time: 2018-02-22T16:10:44.000Z</p>
<p>Category: Winter Olympics</p>
<p><h3><a href="http://www.bbc.com">Watch Winter Olympics replays: Women&#x27;s big air final and Nordic combined</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: Winter Olympics</p>
<p><h3><a href="http://www.bbc.com">Short ready to sell Sunderland for free</a></h3></p>
<p></p>
<p>Time: 2018-02-22T17:51:34.000Z</p>
<p>Category: Football</p>
<p><h3><a href="http://www.bbc.com">GB&#x27;s Ryding inspired by Mhyrer gold at 35</a></h3></p>
<p></p>
<p>Time: 2018-02-22T13:48:10.000Z</p>
<p>Category: Winter Olympics</p>
<p><h3><a href="http://www.bbc.com">Keeper crisis after goalie hurt by cow</a></h3></p>
<p></p>
<p>Time: 2018-02-22T10:13:00.000Z</p>
<p>Category: South Scotland</p>
<p><h3><a href="http://www.bbc.com">Hughes returns for Calcutta Cup clash</a></h3></p>
<p></p>
<p>Time: 2018-02-22T10:21:38.000Z</p>
<p>Category: Rugby Union</p>
<p><h3><a href="http://www.bbc.com">Stormzy takes swipe at the PM over Grenfell</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Este Haim calls Cheryl over videobombing</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Why I support my striking uni lecturers</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Jack Whitehall&#x27;s best one-liners at the Brits</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">CS:GO Major to be held in London</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>

Source code of script:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Retrieves the headlines from www.bbc.com/news and writes them as HTML
# Input: URL [http://www.bbc.com/news]
# Output format: HTML file
# Output fields: Link, Title, Summary, Time, Category

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    Define $output_file bbc_output.html

    # URL of the page to load
    Define $url http://www.bbc.com/news
    
	
	
    # clean output file
    <Action Print>
        FileName {$output_file}
		FileMode Write  
    </Action>
    	
	
    
    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

	# the script will iterate through all headlines
	<Section While>
		# search for headlines only in the top part of the website
		EndAt <div class="container">
		
		# match the beginning of headline
		<Pattern>
			RegExp <div class="gs-c-promo-body
		</Pattern>
	
		<Section>
			# stop matching the fields of this headline at the beginning of the next headline
			EndAt <div class="gel-layout__item
	
			# match url
			<Pattern>
				RegExp <a class="gs-c-promo-heading{:re([^"]*)}" href="{$link:re([^"]*)}">
				Trim
				Compact
			</Pattern>
	
			# match title
			<Pattern>
				RegExp <h3 class="gs-c-promo-heading__title{:re([^"]*)}">{$title}</h3></a>
				Trim
				Compact
			</Pattern>
	
			# match summary
			<Pattern>
				Optional
				RegExp <p class="gs-c-promo-summary{:re([^"]*)}">{$summary}</p>
				Trim
				Compact
			</Pattern>
	
			# match time
			<Pattern>
				Optional
				RegExp <time class="gs-o-bullet__text date qa-status-date" datetime="{$time:re([^"]*)}"
				Trim
				Compact
			</Pattern>
	
			# match category
			<Pattern>
				Optional
				RegExp <span aria-hidden="true">{$category}</span>
				Trim
				Compact
			</Pattern>
	
			# and print parsed data
			<Action Print>
				FileName {$output_file}
				Text <p><h3><a href="http://www.bbc.com{$link}">{$title}</a></h3></p>\n<p>{$summary}</p>\n<p>Time: {$time}</p>\n<p>Category: {$category}</p>\n
			</Action>
        </Section>
    </Section>
</Section>

Main bbc_main
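
For comparison, the matching steps of the script can be sketched in Python. This is an illustration only, not part of the script: the sample markup, the function names extract_headlines and render, and the use of plain regular expressions are assumptions; only the class names and the output template come from the script above. Link and title are required, while summary, time and category are optional and fall back to an empty string.

```python
import re

# Hypothetical sample markup, modeled on the class names the script matches.
SAMPLE = (
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading gs-o-faux-block-link" href="/news/world-1">'
    '<h3 class="gs-c-promo-heading__title gel-pica-bold">First &amp; foremost</h3></a>'
    '<p class="gs-c-promo-summary gel-long-primer">A short summary.</p>'
    '<time class="gs-o-bullet__text date qa-status-date" '
    'datetime="2018-02-22T17:51:52.000Z"></time>'
    '<span aria-hidden="true">UK</span>'
    '</div>'
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading" href="/sport/2">'
    '<h3 class="gs-c-promo-heading__title">Second headline</h3></a>'
    '</div>'
    '<div class="container">footer</div>'
)

# Required fields.
LINK = re.compile(r'<a class="gs-c-promo-heading[^"]*" href="([^"]*)">')
TITLE = re.compile(r'<h3 class="gs-c-promo-heading__title[^"]*">(.*?)</h3></a>')

# Optional fields: a missing match yields an empty string.
OPTIONAL = {
    "summary": re.compile(r'<p class="gs-c-promo-summary[^"]*">(.*?)</p>'),
    "time": re.compile(
        r'<time class="gs-o-bullet__text date qa-status-date" datetime="([^"]*)"'
    ),
    "category": re.compile(r'<span aria-hidden="true">(.*?)</span>'),
}


def extract_headlines(page):
    """Return one dict per headline found in the top part of the page."""
    # Only look at the part of the page before the container div.
    top = page.split('<div class="container">', 1)[0]
    # Each headline starts at a promo body; drop the text before the first one.
    blocks = top.split('<div class="gs-c-promo-body')[1:]
    items = []
    for block in blocks:
        # Stop at the next layout item, if there is one inside this block.
        block = block.split('<div class="gel-layout__item', 1)[0]
        link = LINK.search(block)
        title = TITLE.search(block)
        if not link or not title:
            continue
        item = {"link": link.group(1), "title": title.group(1).strip()}
        for name, pattern in OPTIONAL.items():
            match = pattern.search(block)
            item[name] = match.group(1).strip() if match else ""
        items.append(item)
    return items


def render(item):
    """Format one headline with the same template as the Print action."""
    return (
        '<p><h3><a href="http://www.bbc.com{link}">{title}</a></h3></p>\n'
        '<p>{summary}</p>\n'
        '<p>Time: {time}</p>\n'
        '<p>Category: {category}</p>\n'
    ).format(**item)


if __name__ == "__main__":
    for headline in extract_headlines(SAMPLE):
        print(render(headline), end="")
```

The sketch splits the page at each headline marker instead of advancing a cursor through it, which is simpler to write but may behave differently from the script on markup where the markers are nested.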