Example - BBC News live headlines

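The script retrieves the current headlines from http://news.bbc.co.uk/ and writes them to an HTML file. The output is refreshed every 15 minutes by a cron job. Headlines that carry no summary, time, or category on the page are printed with those fields empty.
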
Output:

<p><h3><a href="http://www.bbc.com">UK PM signs letter that will trigger Brexit</a></h3></p>
<p>The letter will be delivered to the EU on Wednesday, marking formal notice of the UK&#x27;s exit.</p>
<p>Time: 2017-03-29T08:52:18.000Z</p>
<p>Category: UK Politics</p>
<p><h3><a href="http://www.bbc.com">Trump scraps Obama climate policies</a></h3></p>
<p>Environmentalists warn Mr Trump&#x27;s order will have serious consequences at home and abroad.</p>
<p>Time: 2017-03-29T00:19:17.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Toshiba nuclear unit in bankruptcy</a></h3></p>
<p>Westinghouse has struggled with losses that have thrown its Japanese parent into a major crisis.</p>
<p>Time: 2017-03-29T08:51:37.000Z</p>
<p>Category: Business</p>
<p><h3><a href="http://www.bbc.com">Indonesian man found dead inside snake</a></h3></p>
<p>The bulging python was cut open after villagers grew suspicious it had eaten the man.</p>
<p>Time: 2017-03-29T07:20:30.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">French ex-PM backs Macron for president</a></h3></p>
<p>Instead of backing his Socialist party&#x27;s candidate, Manuel Valls says he will support a centrist.</p>
<p>Time: 2017-03-29T08:59:12.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">&#x27;EU wants a painful Brexit &#x27; - Marine Le Pen</a></h3></p>
<p>Though the French presidential candidate still thinks Great Britain will negotiate a good deal.</p>
<p>Time: 2017-03-29T00:18:40.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Australia cyclone prompts flood alert</a></h3></p>
<p>A 1,300km (800 miles) area is at dangerous risk of flooding after Cyclone Debbie, authorities warn.</p>
<p>Time: 2017-03-29T02:20:22.000Z</p>
<p>Category: Australia</p>
<p><h3><a href="http://www.bbc.com">Anger as US internet privacy law scrapped</a></h3></p>
<p>The House repeals rules requiring broadband providers to get permission before selling your web history.</p>
<p>Time: 2017-03-29T00:00:01.000Z</p>
<p>Category: Technology</p>
<p><h3><a href="http://www.bbc.com">Galaxy S8: Samsung&#x27;s most important yet?</a></h3></p>
<p>After the Note 7 fiasco, Samsung&#x27;s latest launch could be a make or break moment.</p>
<p>Time: 2017-03-29T05:01:52.000Z</p>
<p>Category: Technology</p>
<p><h3><a href="http://www.bbc.com">Zuma asked to miss Kathrada funeral</a></h3></p>
<p>Family of anti-apartheid activist requests South Africa&#x27;s president not to attend his funeral.</p>
<p>Time: 2017-03-29T08:51:10.000Z</p>
<p>Category: Africa</p>
<p><h3><a href="http://www.bbc.com">Cambodia bans export of human breast milk</a></h3></p>
<p>In a controversial practice, a company was processing and selling the milk in the US.</p>
<p>Time: 2017-03-29T05:59:32.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Zuma asked to miss Kathrada funeral</a></h3></p>
<p>Family of anti-apartheid activist requests South Africa&#x27;s president not to attend his funeral.</p>
<p>Time: 2017-03-29T08:51:10.000Z</p>
<p>Category: Africa</p>
<p><h3><a href="http://www.bbc.com">Cambodia bans export of human breast milk</a></h3></p>
<p>In a controversial practice, a company was processing and selling the milk in the US.</p>
<p>Time: 2017-03-29T05:59:32.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Arrests after UK deportation protest</a></h3></p>
<p>Flights were temporarily halted on Tuesday evening after protesters locked themselves in an aircraft.</p>
<p>Time: 2017-03-29T04:53:36.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">Survivor&#x27;s family &#x27;overwhelmed by love&#x27;</a></h3></p>
<p>Andreea Cristea remains critically ill, as events will mark one week on from the London attacks.</p>
<p>Time: 2017-03-29T08:23:50.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">US hockey women halt boycott over pay</a></h3></p>
<p>The US women&#x27;s ice hockey team settle a long-running equal pay dispute and call off their boycott of the forthcoming US-hosted world championships.</p>
<p>Time: 2017-03-29T07:33:26.000Z</p>
<p>Category: BBC Sport</p>
<p><h3><a href="http://www.bbc.com">Reality Check: Article 50 triggered</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Article 50: What happens now?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">What are Brexit Britain&#x27;s trade options?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Brexit: All you need to know</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Spare a thought for Theresa May...</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;Memories for Mummy&#x27; - Rio Ferdinand helps children grieve</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World News TV</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World Service Radio</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Can Trump save the coal industry?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Do you share a bed with your pet?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Why I&#x27;ve knitted hundreds of woollen breasts</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Small shark rescued from Australian pool</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Was Delhi attack racially motivated?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">British expats fear for their rights post-Brexit</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The Hindu hardliner running India&#x27;s most populous state</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Japan turns to Basil Fawlty for Olympic English</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The ambulance for sex work</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">10 areas that will shape the Brexit talks</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Is Trump actually good for Nato?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Why do Canadians live longer than Americans?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
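
Each headline in the output occupies four lines (title link, summary, time, category), and text is entity-encoded, for example `&#x27;` for an apostrophe. The following is a minimal sketch, written in Python rather than in the scraper's own script language, of how a consumer might read these records back into structured data. The helper name `parse_records` and the two-record sample string are illustrative assumptions and are not part of the original script.

```python
import html
import re

# Hypothetical helper (not part of the scraper): matches one four-line record
# in the format shown in the output above.
RECORD_RE = re.compile(
    r'<p><h3><a href="(?P<link>[^"]*)">(?P<title>.*?)</a></h3></p>\n'
    r'<p>(?P<summary>.*?)</p>\n'
    r'<p>Time: (?P<time>.*?)</p>\n'
    r'<p>Category: (?P<category>.*?)</p>'
)


def parse_records(text):
    """Return one dict per headline, with HTML entities decoded."""
    return [
        {key: html.unescape(value) for key, value in match.groupdict().items()}
        for match in RECORD_RE.finditer(text)
    ]


# Two records copied from the output above: one complete, one with empty fields.
sample = (
    '<p><h3><a href="http://www.bbc.com">Galaxy S8: Samsung&#x27;s most important yet?</a></h3></p>\n'
    '<p>After the Note 7 fiasco, Samsung&#x27;s latest launch could be a make or break moment.</p>\n'
    '<p>Time: 2017-03-29T05:01:52.000Z</p>\n'
    '<p>Category: Technology</p>\n'
    '<p><h3><a href="http://www.bbc.com">Brexit: All you need to know</a></h3></p>\n'
    '<p></p>\n'
    '<p>Time: </p>\n'
    '<p>Category: </p>\n'
)

records = parse_records(sample)
for record in records:
    print(record["title"], "|", record["category"] or "(no category)")
```

Empty fields come back as empty strings, so a consumer can filter out entries without a timestamp if only dated articles are wanted.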

Source code of script:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Retrieves the headlines from www.bbc.com/news and writes them to an HTML file
# Input: URL [http://www.bbc.com/news]
# Output format: HTML file
# Output fields: Link, Title, Summary, Time, Category

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    Define $output_file bbc_output.html

    # define variable $url and assign it a value
    Define $url http://www.bbc.com/news

    # clear the output file
    <Action Print>
        FileName {$output_file}
        FileMode Write
    </Action>

    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

	# the script will iterate through all headlines
	<Section While>
		# search for headlines only in the top part of the website
		EndAt <div class="container">
		
		# match the beginning of headline
		<Pattern>
			RegExp <div class="gs-c-promo-body
		</Pattern>
	
		<Section>
			# stop searching for this headline's fields at the beginning of the next headline
			EndAt <div class="gel-layout__item
	
			# match link (stored in $link, which the Print action below uses)
			<Pattern>
				RegExp <a class="gs-c-promo-heading{:re([^"]*)}" href="{$link:re([^"]*)}">
				Trim
				Compact
			</Pattern>
	
			# match title
			<Pattern>
				RegExp <h3 class="gs-c-promo-heading__title{:re([^"]*)}">{$title}</h3></a>
				Trim
				Compact
			</Pattern>
	
			# match summary
			<Pattern>
				Optional
				RegExp <p class="gs-c-promo-summary{:re([^"]*)}">{$summary}</p>
				Trim
				Compact
			</Pattern>
	
			# match time
			<Pattern>
				Optional
				RegExp <time class="gs-o-bullet__text date qa-status-date" datetime="{$time:re([^"]*)}"
				Trim
				Compact
			</Pattern>
	
			# match category
			<Pattern>
				Optional
				RegExp <span aria-hidden="true">{$category}</span>
				Trim
				Compact
			</Pattern>
	
			# and print parsed data
			<Action Print>
				FileName {$output_file}
				Text <p><h3><a href="http://www.bbc.com{$link}">{$title}</a></h3></p>\n<p>{$summary}</p>\n<p>Time: {$time}</p>\n<p>Category: {$category}</p>\n
			</Action>
        </Section>
    </Section>
</Section>

Main bbc_main
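
For readers unfamiliar with the script language, the extraction logic above can be approximated in Python. This is a rough sketch under stated assumptions, not a description of how the scraper itself is implemented: the function name `extract_headlines` and the sample `page` string are invented for illustration (the markup is modelled on the class names in the patterns above), a headline is assumed to be skipped when a non-optional pattern fails, and the outer `EndAt <div class="container">` boundary is omitted for brevity.

```python
import re

# Markers taken from the script: where a headline block starts and where it ends.
BLOCK_START = '<div class="gs-c-promo-body'
BLOCK_END = '<div class="gel-layout__item'

# Patterns without "Optional" in the script.
REQUIRED = {
    "link": re.compile(r'<a class="gs-c-promo-heading[^"]*" href="([^"]*)">'),
    "title": re.compile(r'<h3 class="gs-c-promo-heading__title[^"]*">(.*?)</h3>'),
}
# Patterns marked "Optional": a miss leaves the field empty.
OPTIONAL = {
    "summary": re.compile(r'<p class="gs-c-promo-summary[^"]*">(.*?)</p>'),
    "time": re.compile(
        r'<time class="gs-o-bullet__text date qa-status-date" datetime="([^"]*)"'
    ),
    "category": re.compile(r'<span aria-hidden="true">(.*?)</span>'),
}


def extract_headlines(page):
    """Walk the page block by block, like the While section in the script."""
    headlines = []
    pos = page.find(BLOCK_START)
    while pos != -1:
        end = page.find(BLOCK_END, pos)
        block = page[pos:end] if end != -1 else page[pos:]
        item = {}
        for name, pattern in REQUIRED.items():
            match = pattern.search(block)
            if match is None:
                item = None  # assumption: skip blocks missing a required field
                break
            item[name] = match.group(1).strip()
        if item is not None:
            for name, pattern in OPTIONAL.items():
                match = pattern.search(block)
                item[name] = match.group(1).strip() if match else ""
            headlines.append(item)
        pos = page.find(BLOCK_START, pos + len(BLOCK_START))
    return headlines


# Invented sample markup: the second headline has no summary, time, or category.
page = (
    '<div class="gel-layout__item a">'
    '<div class="gs-c-promo-body x">'
    '<a class="gs-c-promo-heading y" href="/news/example-1">'
    '<h3 class="gs-c-promo-heading__title z"> First headline </h3></a>'
    '<p class="gs-c-promo-summary s">Short summary.</p>'
    '<time class="gs-o-bullet__text date qa-status-date" '
    'datetime="2017-03-29T08:52:18.000Z"></time>'
    '<span aria-hidden="true">UK Politics</span>'
    '</div></div>'
    '<div class="gel-layout__item b">'
    '<div class="gs-c-promo-body x">'
    '<a class="gs-c-promo-heading y" href="/news/example-2">'
    '<h3 class="gs-c-promo-heading__title z">Second headline</h3></a>'
    '</div></div>'
)

for headline in extract_headlines(page):
    print(
        '<p><h3><a href="http://www.bbc.com{link}">{title}</a></h3></p>\n'
        '<p>{summary}</p>\n<p>Time: {time}</p>\n<p>Category: {category}</p>'.format(**headline)
    )
```

Bounding each search to one block keeps an optional field from being picked up from the following headline, which is the role the inner `EndAt` plays in the script.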