Example - BBC News live headlines

The script retrieves the headlines from the BBC News front page (http://news.bbc.co.uk/) and writes them to an HTML file. The output is refreshed every 15 minutes by cron.

 

Output:

<p><h3><a href="http://www.bbc.com">Wrong film named best picture at Oscars</a></h3></p>
<p>Moonlight wins best picture at the Oscars - after Faye Dunaway initially says La La Land won.</p>
<p>Time: 2017-02-27T05:50:17.000Z</p>
<p>Category: Entertainment &amp; Arts</p>
<p><h3><a href="http://www.bbc.com">Oscar confusion as wrong winner announced</a></h3></p>
<p>The moment when La La Land producer realised Moonlight had won the Oscar for best picture</p>
<p>Time: 2017-02-27T06:35:20.000Z</p>
<p>Category: Entertainment &amp; Arts</p>
<p><h3><a href="http://www.bbc.com">US commando&#x27;s father refused to meet Trump</a></h3></p>
<p>The US must investigate the &quot;stupid mission&quot; in Yemen that saw his son killed, Bill Owens says.</p>
<p>Time: 2017-02-27T03:51:34.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">UK deports woman despite 27-year marriage</a></h3></p>
<p>Irene Clennel has been sent back to Singapore, leaving behind her husband and sons in Britain.</p>
<p>Time: 2017-02-27T01:19:05.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">Millions without water in Chile</a></h3></p>
<p>Rain and floods contaminate a major river and leave four million without water in Santiago.</p>
<p>Time: 2017-02-26T23:54:41.000Z</p>
<p>Category: Latin America &amp; Caribbean</p>
<p><h3><a href="http://www.bbc.com">Australian taxi protest stops traffic</a></h3></p>
<p>Angry taxi drivers bring parts of Melbourne to a halt in a protest over planned industry reform.</p>
<p>Time: 2017-02-27T03:33:10.000Z</p>
<p>Category: Australia</p>
<p><h3><a href="http://www.bbc.com">LSE/Deutsche Boerse merger under threat</a></h3></p>
<p>London Stock Exchange warns it cannot obtain European Commission approval for the 29bn euro deal.</p>
<p>Time: 2017-02-27T00:36:57.000Z</p>
<p>Category: Business</p>
<p><h3><a href="http://www.bbc.com">Aliens actor Bill Paxton dies aged 61</a></h3></p>
<p>Actor Bill Paxton, known for roles in Aliens and Titanic, has died aged 61, his family tell US media</p>
<p>Time: 2017-02-26T20:39:28.000Z</p>
<p>Category: Entertainment &amp; Arts</p>
<p><h3><a href="http://www.bbc.com">Veteran UK Labour MP Gerald Kaufman dies</a></h3></p>
<p>Sir Gerald had been an MP since 1970 and became the oldest serving member of the Commons in 2015.</p>
<p>Time: 2017-02-27T00:48:02.000Z</p>
<p>Category: UK Politics</p>
<p><h3><a href="http://www.bbc.com">Nokia 3310 mobile phone resurrected</a></h3></p>
<p>An iconic Nokia phone is revamped with added battery life and new features alongside several Nokia-branded Android models.</p>
<p>Time: 2017-02-26T16:30:03.000Z</p>
<p>Category: Technology</p>
<p><h3><a href="http://www.bbc.com">US condemns threat to Ukraine monitors</a></h3></p>
<p>Pro-Russian rebels in eastern Ukraine seized a drone at gunpoint from OSCE monitors on Friday.</p>
<p>Time: 2017-02-26T18:23:19.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Nokia 3310 mobile phone resurrected</a></h3></p>
<p>An iconic Nokia phone is revamped with added battery life and new features alongside several Nokia-branded Android models.</p>
<p>Time: 2017-02-26T16:30:03.000Z</p>
<p>Category: Technology</p>
<p><h3><a href="http://www.bbc.com">US condemns threat to Ukraine monitors</a></h3></p>
<p>Pro-Russian rebels in eastern Ukraine seized a drone at gunpoint from OSCE monitors on Friday.</p>
<p>Time: 2017-02-26T18:23:19.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">I believe in a clean sport - Farah coach</a></h3></p>
<p>The American coach of Olympic champion Mo Farah refutes claims he may have broken anti-doping rules.</p>
<p>Time: 2017-02-26T20:46:32.000Z</p>
<p>Category: BBC Sport</p>
<p><h3><a href="http://www.bbc.com">TV court star Judge Joseph Wapner dies</a></h3></p>
<p>He gained an audience of millions as host of The People&#x27;s Court in the 1980s and early 1990s.</p>
<p>Time: 2017-02-26T22:56:48.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Indian army cancels exams after &#x27;leak&#x27;</a></h3></p>
<p>Candidates are questioned and 18 people are arrested in west India over the release of exam papers.</p>
<p>Time: 2017-02-26T15:16:29.000Z</p>
<p>Category: India</p>
<p><h3><a href="http://www.bbc.com">In pictures: Oscars ceremony 2017</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Speech highlights: Viola, Mahershala and more</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Denzel Washington &#x27;marries&#x27; couple</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Stars step out on the Oscars&#x27; red carpet</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">A secret history of Hollywood style</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Competing for love in the deserts of Chad</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World News TV</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World Service Radio</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Who killed Kim Jong-nam?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The food waste fighter</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The outcast wives of India</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;Why I employ autistic people&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Greeks bridle against &#x27;never-ending&#x27; austerity</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The disappearance of Jonathan Spollen</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Holograms, mistrust and &#x27;fake news&#x27; in France&#x27;s election</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Bombs &#x27;fall like rain&#x27; on Mosul front line</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Olathe victim and widow tell their stories</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How fear and suspicion haunted the last days of Max Spiers</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Why do people swear?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Is there a US diplomacy vacuum at the UN in Geneva?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>

Source code of script:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Retrieves the headlines from the BBC News front page and writes them to an HTML file
# Input: URL [http://www.bbc.com/news]
# Output format: HTML file
# Output fields: Link, Title, Summary, Time, Category

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    Define $output_file bbc_output.html

	# define the variable $url and assign it a value
    Define $url http://www.bbc.com/news
    
	
	
    # clear the output file
    <Action Print>
        FileName {$output_file}
		FileMode Write  
    </Action>
    	
	
    
    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

	# the script will iterate through all headlines
	<Section While>
		# search for headlines only in the top part of the website
		EndAt <div class="container">
		
		# match the beginning of headline
		<Pattern>
			RegExp <div class="gs-c-promo-body
		</Pattern>
	
		<Section>
			# stop searching for the fields of this headline at the beginning of the next one
			EndAt <div class="gel-layout__item
	
			# match link (captured into $link, which the Print action below uses)
			<Pattern>
				RegExp <a class="gs-c-promo-heading{:re([^"]*)}" href="{$link:re([^"]*)}">
				Trim
				Compact
			</Pattern>
	
			# match title
			<Pattern>
				RegExp <h3 class="gs-c-promo-heading__title{:re([^"]*)}">{$title}</h3></a>
				Trim
				Compact
			</Pattern>
	
			# match summary
			<Pattern>
				Optional
				RegExp <p class="gs-c-promo-summary{:re([^"]*)}">{$summary}</p>
				Trim
				Compact
			</Pattern>
	
			# match time
			<Pattern>
				Optional
				RegExp <time class="gs-o-bullet__text date qa-status-date" datetime="{$time:re([^"]*)}"
				Trim
				Compact
			</Pattern>
	
			# match category
			<Pattern>
				Optional
				RegExp <span aria-hidden="true">{$category}</span>
				Trim
				Compact
			</Pattern>
	
			# and print parsed data
			<Action Print>
				FileName {$output_file}
				Text <p><h3><a href="http://www.bbc.com{$link}">{$title}</a></h3></p>\n<p>{$summary}</p>\n<p>Time: {$time}</p>\n<p>Category: {$category}</p>\n
			</Action>
        </Section>
    </Section>
</Section>

Main bbc_main
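
For readers unfamiliar with the script syntax, the same extraction logic can be sketched in Python. This is only an illustrative approximation, not part of the original script: the sample markup, the extra class names in it, and the helper names (`first`, `parse_headlines`, `render`) are hypothetical, and the per-headline boundary is simplified to "everything up to the next headline marker" rather than the `EndAt` marker used above. It runs on an inline string, so no network access is needed.

```python
import re

# Hypothetical sample markup, shaped like the class names the script matches.
SAMPLE = (
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading nw-o-link" href="/news/world-1">'
    '<h3 class="gs-c-promo-heading__title gel-pica-bold">First headline</h3></a>'
    '<p class="gs-c-promo-summary gel-long-primer">A short summary.</p>'
    '<time class="gs-o-bullet__text date qa-status-date" '
    'datetime="2017-02-27T05:50:17.000Z"></time>'
    '<span aria-hidden="true">UK</span>'
    '</div>'
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading nw-o-link" href="/news/world-2">'
    '<h3 class="gs-c-promo-heading__title gel-pica-bold">Second headline</h3></a>'
    '</div>'
    '<div class="container">footer</div>'
)

# One regular expression per field, mirroring the <Pattern> blocks.
LINK = re.compile(r'<a class="gs-c-promo-heading[^"]*" href="([^"]*)">')
TITLE = re.compile(r'<h3 class="gs-c-promo-heading__title[^"]*">(.*?)</h3></a>')
SUMMARY = re.compile(r'<p class="gs-c-promo-summary[^"]*">(.*?)</p>')
TIME = re.compile(
    r'<time class="gs-o-bullet__text date qa-status-date" datetime="([^"]*)"'
)
CATEGORY = re.compile(r'<span aria-hidden="true">(.*?)</span>')


def first(pattern, text):
    """Return the trimmed first capture, or "" when the pattern is absent."""
    match = pattern.search(text)
    return match.group(1).strip() if match else ""


def parse_headlines(page):
    """Extract one dict per headline from the top part of the page."""
    # Only the part before the container marker is searched.
    top = page.split('<div class="container">', 1)[0]
    # Each headline starts at a promo-body marker; drop the text before the first.
    blocks = top.split('<div class="gs-c-promo-body')[1:]
    items = []
    for block in blocks:
        link = first(LINK, block)
        title = first(TITLE, block)
        if not link or not title:
            continue  # link and title are required, the other fields optional
        items.append({
            "link": link,
            "title": title,
            "summary": first(SUMMARY, block),
            "time": first(TIME, block),
            "category": first(CATEGORY, block),
        })
    return items


def render(items):
    """Format the items the same way as the Print action."""
    return "".join(
        '<p><h3><a href="http://www.bbc.com{link}">{title}</a></h3></p>\n'
        '<p>{summary}</p>\n<p>Time: {time}</p>\n'
        '<p>Category: {category}</p>\n'.format(**item)
        for item in items
    )


if __name__ == "__main__":
    print(render(parse_headlines(SAMPLE)), end="")
```

Missing optional fields come out as empty strings, which corresponds to the empty `Time:` and `Category:` lines in the output above.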