Example - BBC News live headlines

The script retrieves the headlines from http://news.bbc.co.uk/ and writes them to an HTML file. The output is refreshed every 15 minutes by cron.

 

Output:

<p><h3><a href="http://www.bbc.com">Scores killed in Taliban attack on base</a></h3></p>
<p>At least 43 security personnel are now known to have died in Monday&#x27;s devastating attack on an intelligence base.</p>
<p>Time: 2019-01-22T10:11:14.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Scores killed in Taliban attack on base</a></h3></p>
<p>At least 43 security personnel are now known to have died in Monday&#x27;s devastating attack on an intelligence base.</p>
<p>Time: 2019-01-22T10:11:14.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Zimbabwe leader abandons trip amid unrest</a></h3></p>
<p>Emmerson Mnangagwa returns home instead of attending the Davos economic summit in Switzerland.</p>
<p>Time: 2019-01-22T09:54:39.000Z</p>
<p>Category: Africa</p>
<p><h3><a href="http://www.bbc.com">Ronaldo accepts €18.8m deal over tax evasion</a></h3></p>
<p>The football superstar has signed a deal with Spanish prosecutors that will spare him jail time.</p>
<p>Time: 2019-01-22T11:13:39.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Oscar nominations: What to expect</a></h3></p>
<p>Who and what could be the main talking points?</p>
<p>Time: 2019-01-22T10:57:36.000Z</p>
<p>Category: Entertainment &amp; Arts</p>
<p><h3><a href="http://www.bbc.com">&#x27;Excellent alpha lady&#x27; targets US presidency</a></h3></p>
<p>US students describe presidential hopeful Senator Kamala Harris in three words.</p>
<p>Time: 2019-01-22T01:13:40.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Bebe Rexha told size 8 is &#x27;too big&#x27;</a></h3></p>
<p>The pop star claims several designers have refused to dress her for this year&#x27;s Grammy Awards.</p>
<p>Time: 2019-01-22T10:00:37.000Z</p>
<p>Category: Entertainment &amp; Arts</p>
<p><h3><a href="http://www.bbc.com">UK MPs put forward rival Brexit plans</a></h3></p>
<p>PM Theresa May is to meet the cabinet as Labour calls for a vote on options including another referendum.</p>
<p>Time: 2019-01-22T10:57:35.000Z</p>
<p>Category: UK Politics</p>
<p><h3><a href="http://www.bbc.com">US spy suspect to stay in Russian custody</a></h3></p>
<p>Ex-US marine Paul Whelan was arrested last month on suspicion of spying - claims his family denies.</p>
<p>Time: 2019-01-22T10:31:27.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">France angered by Italy&#x27;s Africa remarks</a></h3></p>
<p>The Italian ambassador is summoned after Deputy PM Luigi di Maio accuses France of exploiting Africa.</p>
<p>Time: 2019-01-22T10:12:24.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Sex abuse scandal hits S Korea&#x27;s elite skaters</a></h3></p>
<p>South Korea is a world leader in speed skating and the latest allegations have shocked the public.</p>
<p>Time: 2019-01-22T08:02:51.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Ex-Nissan boss Ghosn&#x27;s bail request denied</a></h3></p>
<p>The decision means Carlos Ghosn could remain in custody in Tokyo until his trial for financial crimes.</p>
<p>Time: 2019-01-22T05:23:22.000Z</p>
<p>Category: Business</p>
<p><h3><a href="http://www.bbc.com">Sex abuse scandal hits S Korea&#x27;s elite skaters</a></h3></p>
<p>South Korea is a world leader in speed skating and the latest allegations have shocked the public.</p>
<p>Time: 2019-01-22T08:02:51.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Ex-Nissan boss Ghosn&#x27;s bail request denied</a></h3></p>
<p>The decision means Carlos Ghosn could remain in custody in Tokyo until his trial for financial crimes.</p>
<p>Time: 2019-01-22T05:23:22.000Z</p>
<p>Category: Business</p>
<p><h3><a href="http://www.bbc.com">Key Antarctic food source moves south</a></h3></p>
<p>A warming climate shifts the distribution of the krill species eaten by whales and other predators.</p>
<p>Time: 2019-01-21T22:58:45.000Z</p>
<p>Category: Science &amp; Environment</p>
<p><h3><a href="http://www.bbc.com">Taiwan &#x27;bikini hiker&#x27; dies on solo climb</a></h3></p>
<p>Condolences pour in for Gigi Wu, whose hiking expertise and candid photos won her thousands of fans.</p>
<p>Time: 2019-01-22T09:48:36.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Star footballer feared on missing plane</a></h3></p>
<p>It is feared Argentine striker Emiliano Sala was on a light aircraft which disappeared near Guernsey.</p>
<p>Time: 2019-01-22T10:59:51.000Z</p>
<p>Category: Guernsey</p>
<p><h3><a href="http://www.bbc.com">Why I played on through brain surgery</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World News TV</a></h3></p>
<p>The latest global news, sport, weather and documentaries</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World Service Radio</a></h3></p>
<p>Stories from around the world</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Trainload of vintage tanks wows Russians</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;Australia&#x27;s culinary queen&#x27; whips up her next venture</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">In pictures: &#x27;Super blood wolf moon&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Toddler imitates dad&#x27;s arrest in viral video</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Depp and Heard up for worst acting awards</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Migrant caravan: &#x27;I left without telling my mum&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Year of the Pig? Peppa takes China by storm</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The battle on the frontline of climate change</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Turtle meat - the ultimate survival diet?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Why your new heart could be made in space one day</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Davos: &#x27;I’m the boss, he’s the spouse&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;The bed that saved me from the Taliban&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;The bed that saved me from the Taliban&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">&#x27;Why I wanted a tattoo on my mastectomy scar&#x27;</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How will history judge President Trump?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Daring escape from sexcam captivity</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The appalling cost of domestic abuse</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">What we know about gut health</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">A solution for millennial burnout?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">A rare treat to mark the New Year</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Travel</p>
<p><h3><a href="http://www.bbc.com">Startling tales of a medieval superhero</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Culture</p>
<p><h3><a href="http://www.bbc.com">The puzzle of ancient brain surgery</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Earth</p>
<p><h3><a href="http://www.bbc.com">Can wildlife return to a city?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">Should you ‘ghost’ your employer?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">&#x27;Genuine concern&#x27; Cardiff signing Sala was on missing flight</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Nadal wins first set against Tiafoe - text &amp; radio commentary</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: Tennis</p>
<p><h3><a href="http://www.bbc.com">Kvitova books Australian Open semi-final against Collins</a></h3></p>
<p></p>
<p>Time: 2019-01-22T09:53:28.000Z</p>
<p>Category: Tennis</p>
<p><h3><a href="http://www.bbc.com">India&#x27;s Kohli wins three top ICC awards</a></h3></p>
<p></p>
<p>Time: 2019-01-22T07:27:21.000Z</p>
<p>Category: Cricket</p>
<p><h3><a href="http://www.bbc.com">Tsitsipas becomes youngest male Grand Slam semi-finalist since 2007</a></h3></p>
<p></p>
<p>Time: 2019-01-22T06:28:46.000Z</p>
<p>Category: Tennis</p>
<p><h3><a href="http://www.bbc.com">Ex-Spurs &amp; Portsmouth forward Boateng makes surprise loan move to Barcelona</a></h3></p>
<p></p>
<p>Time: 2019-01-22T09:14:29.000Z</p>
<p>Category: European Football</p>
<p><h3><a href="http://www.bbc.com">Not yet Serena! Williams retreats after walk-out mix-up</a></h3></p>
<p></p>
<p>Time: 2019-01-21T17:29:26.000Z</p>
<p>Category: Tennis</p>

Source code of script:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Retrieves the headlines from the BBC News front page and writes them as HTML
# Input: URL [http://www.bbc.com/news]
# Output format: HTML file
# Output fields: Link, Title, Summary, Time, Category

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    Define $output_file bbc_output.html

    # define the variable $url and assign it a value
    Define $url http://www.bbc.com/news
    
	
	
    # clean output file
    <Action Print>
        FileName {$output_file}
		FileMode Write  
    </Action>
    	
	
    
    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

	# the script will iterate through all headlines
	<Section While>
		# search for headlines only in the top part of the website
		EndAt <div class="container">
		
		# match the beginning of headline
		<Pattern>
			RegExp <div class="gs-c-promo-body
		</Pattern>
	
		<Section>
			# stop searching for data at the beginning of the next headline
			EndAt <div class="gel-layout__item
	
			# match link (stored in $link, which the Print action below uses)
			<Pattern>
				RegExp <a class="gs-c-promo-heading{:re([^"]*)}" href="{$link:re([^"]*)}">
				Trim
				Compact
			</Pattern>
	
			# match title
			<Pattern>
				RegExp <h3 class="gs-c-promo-heading__title{:re([^"]*)}">{$title}</h3></a>
				Trim
				Compact
			</Pattern>
	
			# match summary
			<Pattern>
				Optional
				RegExp <p class="gs-c-promo-summary{:re([^"]*)}">{$summary}</p>
				Trim
				Compact
			</Pattern>
	
			# match time
			<Pattern>
				Optional
				RegExp <time class="gs-o-bullet__text date qa-status-date" datetime="{$time:re([^"]*)}"
				Trim
				Compact
			</Pattern>
	
			# match category
			<Pattern>
				Optional
				RegExp <span aria-hidden="true">{$category}</span>
				Trim
				Compact
			</Pattern>
	
			# and print parsed data
			<Action Print>
				FileName {$output_file}
				Text <p><h3><a href="http://www.bbc.com{$link}">{$title}</a></h3></p>\n<p>{$summary}</p>\n<p>Time: {$time}</p>\n<p>Category: {$category}</p>\n
			</Action>
        </Section>
    </Section>
</Section>

Main bbc_main
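
For readers unfamiliar with the script syntax, the extraction logic above can be approximated in Python. This is only a rough sketch under stated assumptions, not a port of the script: the function names (`first_group`, `parse_headlines`, `render`) are hypothetical, `Trim` and `Compact` are assumed to mean trimming and collapsing whitespace, the `EndAt` boundaries are replaced by a simple split on the headline marker, and the page content is passed in as a string rather than downloaded.

```python
import re

# Regular expressions mirroring the <Pattern> blocks of the script.
LINK_RE = re.compile(r'<a class="gs-c-promo-heading[^"]*" href="([^"]*)">')
TITLE_RE = re.compile(r'<h3 class="gs-c-promo-heading__title[^"]*">(.*?)</h3>')
SUMMARY_RE = re.compile(r'<p class="gs-c-promo-summary[^"]*">(.*?)</p>')
TIME_RE = re.compile(
    r'<time class="gs-o-bullet__text date qa-status-date" datetime="([^"]*)"'
)
CATEGORY_RE = re.compile(r'<span aria-hidden="true">(.*?)</span>')


def first_group(pattern, text):
    """Return the first capture with whitespace trimmed and collapsed, or ''."""
    match = pattern.search(text)
    # Assumption: Trim/Compact in the script behave like this normalisation.
    return " ".join(match.group(1).split()) if match else ""


def parse_headlines(html):
    """Split the page into headline blocks and extract the fields of each."""
    html = html.replace("\n", " ")  # rough equivalent of RemoveNewLine
    blocks = html.split('<div class="gs-c-promo-body')[1:]
    items = []
    for block in blocks:
        link = first_group(LINK_RE, block)
        title = first_group(TITLE_RE, block)
        if not link or not title:
            continue  # link and title are required; the other fields are optional
        items.append({
            "link": link,
            "title": title,
            "summary": first_group(SUMMARY_RE, block),
            "time": first_group(TIME_RE, block),
            "category": first_group(CATEGORY_RE, block),
        })
    return items


def render(items, base="http://www.bbc.com"):
    """Format the parsed items in the same layout as the Print action."""
    parts = []
    for item in items:
        parts.append(
            '<p><h3><a href="{base}{link}">{title}</a></h3></p>\n'
            "<p>{summary}</p>\n"
            "<p>Time: {time}</p>\n"
            "<p>Category: {category}</p>\n".format(base=base, **item)
        )
    return "".join(parts)
```

Calling `render(parse_headlines(page_html))` on the downloaded page content would produce text in the same four-line layout as the output shown above. Missing optional fields come out as empty strings, which matches the empty `Time:` and `Category:` lines in the sample output.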