Example - BBC News live headlines

The script extracts the headlines from http://www.bbc.com/news and writes them to an HTML file. The output is refreshed every 15 minutes by cron.

 

Output:

<p><h3><a href="http://www.bbc.com">Merkel re-elected amid nationalist rise</a></h3></p>
<p>The chancellor is re-elected but nationalists make a historic breakthrough, sparking protests.</p>
<p>Time: 2017-09-25T02:35:59.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">How Germany reacted to result</a></h3></p>
<p>Chancellor Angela Merkel is re-elected and will be joined in parliament by the nationalist AfD.</p>
<p>Time: 2017-09-25T02:01:24.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">US expands travel ban to include N Korea</a></h3></p>
<p>People from Venezuela and Chad will also now face restrictions on travel to the United States.</p>
<p>Time: 2017-09-25T02:20:26.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Iraqi Kurdistan holds independence vote</a></h3></p>
<p>Despite objections from Iraq and other nations, Kurds are pressing ahead with their key referendum.</p>
<p>Time: 2017-09-25T04:45:57.000Z</p>
<p>Category: Middle East</p>
<p><h3><a href="http://www.bbc.com">Tennessee gunman stopped by church usher</a></h3></p>
<p>Police praised the &quot;extraordinarily brave&quot; actions of the usher at the church near Nashville.</p>
<p>Time: 2017-09-24T22:44:50.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Kushner used private email account</a></h3></p>
<p>Trump&#x27;s adviser and son-in-law used a private account, a practice Hillary Clinton was attacked over.</p>
<p>Time: 2017-09-25T00:04:11.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Tourist stops bus at cliff edge</a></h3></p>
<p>The passenger reached the brake moments before the vehicle left the road in the Alps, police say.</p>
<p>Time: 2017-09-24T19:53:43.000Z</p>
<p>Category: Europe</p>
<p><h3><a href="http://www.bbc.com">Thousands evacuated near Bali volcano</a></h3></p>
<p>Indonesian authorities say the mountain&#x27;s seismic energy is increasing and has the potential to erupt.</p>
<p>Time: 2017-09-25T04:56:16.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Fresh Brexit talks to take place</a></h3></p>
<p>Brexit Secretary David Davis will lead UK negotiators into their fourth round of talks.</p>
<p>Time: 2017-09-25T02:26:44.000Z</p>
<p>Category: UK</p>
<p><h3><a href="http://www.bbc.com">Australia to create national space agency</a></h3></p>
<p>The country is one of the world&#x27;s only developed nations not to have a dedicated agency.</p>
<p>Time: 2017-09-25T05:21:59.000Z</p>
<p>Category: Australia</p>
<p><h3><a href="http://www.bbc.com">Defiance after Trump urges NFL boycott</a></h3></p>
<p>Players kneel in protest during the US anthem as the president&#x27;s remarks are strongly condemned.</p>
<p>Time: 2017-09-24T21:27:18.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Australia to create national space agency</a></h3></p>
<p>The country is one of the world&#x27;s only developed nations not to have a dedicated agency.</p>
<p>Time: 2017-09-25T05:21:59.000Z</p>
<p>Category: Australia</p>
<p><h3><a href="http://www.bbc.com">Defiance after Trump urges NFL boycott</a></h3></p>
<p>Players kneel in protest during the US anthem as the president&#x27;s remarks are strongly condemned.</p>
<p>Time: 2017-09-24T21:27:18.000Z</p>
<p>Category: US &amp; Canada</p>
<p><h3><a href="http://www.bbc.com">Uber seeks talks to renew London licence</a></h3></p>
<p>Taxi app firm says it is willing to change, as Tories clash with Labour and unions over Uber&#x27;s future.</p>
<p>Time: 2017-09-24T17:25:36.000Z</p>
<p>Category: Business</p>
<p><h3><a href="http://www.bbc.com">Falling jet wing panel hits car in Japan</a></h3></p>
<p>An airline launches an investigation after the panel falls near Kansai International Airport.</p>
<p>Time: 2017-09-24T14:52:17.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">Sri Lankan caught hiding gold in rectum</a></h3></p>
<p>A customs officer told BBC Sinhala they spotted the man because &quot;he was walking suspiciously&quot;.</p>
<p>Time: 2017-09-25T04:17:16.000Z</p>
<p>Category: Asia</p>
<p><h3><a href="http://www.bbc.com">The fishermen saving Pakistan&#x27;s island dogs</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World News TV</a></h3></p>
<p>The latest global news, sport, weather and documentaries</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">BBC World Service Radio</a></h3></p>
<p>Stories from around the world</p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Living with violence in the DR Congo</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Man clings to train&#x27;s windscreen wiper</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The enduring appeal of Audrey Hepburn</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Whales interrupt surfing competition</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">The man risking his life for an eagle</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Angela Merkel&#x27;s hollow victory</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Can dusty Burning Man bikes help hurricane victims?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How plastic became a victim of its own success</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Are the Rohingya India&#x27;s &#x27;favourite whipping boy&#x27;?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Jean-Michel Basquiat: The neglected genius</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">Jeans giant Levi Strauss gets its mojo back</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">North Korea-US tension: Should you worry?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: </p>
<p><h3><a href="http://www.bbc.com">How flying messes with your mind</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">Five ways science can help you focus</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">A paradisiacal island with a dark past</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Travel</p>
<p><h3><a href="http://www.bbc.com">Surprising photos of Muhammad Ali</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Culture</p>
<p><h3><a href="http://www.bbc.com">BBC Earth nominated for Lovie Awards</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Earth</p>
<p><h3><a href="http://www.bbc.com">What if dinosaurs hadn’t died out?</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Future</p>
<p><h3><a href="http://www.bbc.com">Don’t copy Bill Gates to get rich</a></h3></p>
<p></p>
<p>Time: </p>
<p>Category: BBC Capital</p>
<p><h3><a href="http://www.bbc.com">Moeen hits thrilling ton in England win</a></h3></p>
<p></p>
<p>Time: 2017-09-24T17:40:18.000Z</p>
<p>Category: Cricket</p>
<p><h3><a href="http://www.bbc.com">&#x27;I&#x27;d take pay cut to avoid burning out&#x27;</a></h3></p>
<p></p>
<p>Time: 2017-09-25T05:34:51.000Z</p>
<p>Category: Rugby Union</p>
<p><h3><a href="http://www.bbc.com">Brighton &amp; Hove Albion 1-0 Newcastle United</a></h3></p>
<p></p>
<p>Time: 2017-09-24T18:08:45.000Z</p>
<p>Category: Football</p>
<p><h3><a href="http://www.bbc.com">NFL protests &#x27;unlike anything I&#x27;ve seen&#x27;</a></h3></p>
<p></p>
<p>Time: 2017-09-24T23:00:43.000Z</p>
<p>Category: American Football</p>
<p><h3><a href="http://www.bbc.com">Campbell reveals dad died before fight</a></h3></p>
<p></p>
<p>Time: 2017-09-24T18:12:18.000Z</p>
<p>Category: Boxing</p>
<p><h3><a href="http://www.bbc.com">Garth Crooks&#x27; team of the week</a></h3></p>
<p></p>
<p>Time: 2017-09-24T22:11:21.000Z</p>
<p>Category: Football</p>
<p><h3><a href="http://www.bbc.com">Fifa set to lift ban on the poppy</a></h3></p>
<p></p>
<p>Time: 2017-09-24T21:31:01.000Z</p>
<p>Category: Football</p>
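
Each entry in the output above is a fixed group of four lines (title link, summary, time, category), and some entries appear more than once. As a rough sketch only, and not part of the script itself, the following Python reads such output back into records and drops repeats. The names ENTRY, parse_entries and drop_duplicates are hypothetical, and treating title plus time as the identity of an entry is an assumption.

```python
import re

# One entry = four consecutive lines, exactly as the Print action writes them.
ENTRY = re.compile(
    r'<p><h3><a href="(?P<href>[^"]*)">(?P<title>.*?)</a></h3></p>\n'
    r'<p>(?P<summary>.*?)</p>\n'
    r'<p>Time: (?P<time>.*?)</p>\n'
    r'<p>Category: (?P<category>.*?)</p>'
)

# Three entries copied from the output above; the first and third are identical.
SAMPLE_OUTPUT = (
    '<p><h3><a href="http://www.bbc.com">Australia to create national space agency</a></h3></p>\n'
    '<p>The country is one of the world&#x27;s only developed nations not to have a dedicated agency.</p>\n'
    '<p>Time: 2017-09-25T05:21:59.000Z</p>\n'
    '<p>Category: Australia</p>\n'
    '<p><h3><a href="http://www.bbc.com">Whales interrupt surfing competition</a></h3></p>\n'
    '<p></p>\n'
    '<p>Time: </p>\n'
    '<p>Category: </p>\n'
    '<p><h3><a href="http://www.bbc.com">Australia to create national space agency</a></h3></p>\n'
    '<p>The country is one of the world&#x27;s only developed nations not to have a dedicated agency.</p>\n'
    '<p>Time: 2017-09-25T05:21:59.000Z</p>\n'
    '<p>Category: Australia</p>\n'
)


def parse_entries(text):
    """Return one dict per entry; empty fields come back as empty strings."""
    return [match.groupdict() for match in ENTRY.finditer(text)]


def drop_duplicates(entries):
    """Keep the first occurrence of each (title, time) pair, preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        key = (entry["title"], entry["time"])
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


if __name__ == "__main__":
    for entry in drop_duplicates(parse_entries(SAMPLE_OUTPUT)):
        print(entry["title"], "|", entry["category"] or "(no category)")
```

Entries whose summary, time or category is empty are kept; only exact repeats of the same title and time are removed.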

Source code of script:

# File: bbc_main.w
# Name: BBC News live headlines
# Description: Extracts the headlines from http://www.bbc.com/news and writes them as HTML
# Input: URL [http://www.bbc.com/news]
# Output format: HTML file
# Output fields: Link, Title, Summary, Time, Category

#<Logger File>
#	Global
#	FileName bbc_log.log
#	Level debug
#</Logger>

<Section>
    Name bbc_main
	
    Define $output_file bbc_output.html

	# define the variable $url and assign it a value
    Define $url http://www.bbc.com/news
    
	
	
    # clean output file
    <Action Print>
        FileName {$output_file}
		FileMode Write  
    </Action>
    	
	
    
    # load content
    <Action ContentURL>
        URL {$url}
        RemoveNewLine
        TagsToStrip br,nobr,b
    </Action>

	# the script will iterate through all headlines
	<Section While>
		# search for headlines only in the top part of the website
		EndAt <div class="container">
		
		# match the beginning of headline
		<Pattern>
			RegExp <div class="gs-c-promo-body
		</Pattern>
	
		<Section>
			# stop searching for this headline's fields at the beginning of the next headline
			EndAt <div class="gel-layout__item
	
			# match link (captured into $link, which the Print action below uses)
			<Pattern>
				RegExp <a class="gs-c-promo-heading{:re([^"]*)}" href="{$link:re([^"]*)}">
				Trim
				Compact
			</Pattern>
	
			# match title
			<Pattern>
				RegExp <h3 class="gs-c-promo-heading__title{:re([^"]*)}">{$title}</h3></a>
				Trim
				Compact
			</Pattern>
	
			# match summary
			<Pattern>
				Optional
				RegExp <p class="gs-c-promo-summary{:re([^"]*)}">{$summary}</p>
				Trim
				Compact
			</Pattern>
	
			# match time
			<Pattern>
				Optional
				RegExp <time class="gs-o-bullet__text date qa-status-date" datetime="{$time:re([^"]*)}"
				Trim
				Compact
			</Pattern>
	
			# match category
			<Pattern>
				Optional
				RegExp <span aria-hidden="true">{$category}</span>
				Trim
				Compact
			</Pattern>
	
			# and print parsed data
			<Action Print>
				FileName {$output_file}
				Text <p><h3><a href="http://www.bbc.com{$link}">{$title}</a></h3></p>\n<p>{$summary}</p>\n<p>Time: {$time}</p>\n<p>Category: {$category}</p>\n
			</Action>
        </Section>
    </Section>
</Section>

Main bbc_main
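
For readers who do not know the script syntax, the same extraction logic can be approximated in Python. This is a sketch under assumptions, not a translation of how the tool works internally. The sample markup is made up to resemble the class names the patterns match on, the names FIELDS, extract_headlines and render are hypothetical, and each record is bounded by the start of the next headline block rather than by the script's EndAt markers.

```python
import re

# Hypothetical markup, shaped like the class names the script's patterns look for.
SAMPLE_PAGE = (
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading gs-o-faux-block-link" href="/news/world-europe-1">'
    '<h3 class="gs-c-promo-heading__title gel-pica-bold">First headline</h3></a>'
    '<p class="gs-c-promo-summary gel-long-primer">First summary.</p>'
    '<time class="gs-o-bullet__text date qa-status-date" datetime="2017-09-25T02:35:59.000Z">'
    '</time><span aria-hidden="true">Europe</span></div>'
    '<div class="gs-c-promo-body gel-1/2">'
    '<a class="gs-c-promo-heading" href="/news/world-asia-2">'
    '<h3 class="gs-c-promo-heading__title">Second headline</h3></a></div>'
)

BLOCK_START = '<div class="gs-c-promo-body'

# One regex per field; link and title are required, the rest are optional.
FIELDS = {
    "link": re.compile(r'<a class="gs-c-promo-heading[^"]*" href="([^"]*)">'),
    "title": re.compile(r'<h3 class="gs-c-promo-heading__title[^"]*">(.*?)</h3>'),
    "summary": re.compile(r'<p class="gs-c-promo-summary[^"]*">(.*?)</p>'),
    "time": re.compile(
        r'<time class="gs-o-bullet__text date qa-status-date" datetime="([^"]*)"'
    ),
    "category": re.compile(r'<span aria-hidden="true">(.*?)</span>'),
}


def extract_headlines(page):
    """Return one dict per headline block; missing optional fields are ''."""
    records = []
    # The text before the first block marker holds no headline, so skip it.
    for chunk in page.split(BLOCK_START)[1:]:
        record = {}
        for name, pattern in FIELDS.items():
            match = pattern.search(chunk)
            record[name] = match.group(1).strip() if match else ""
        if record["link"] and record["title"]:
            records.append(record)
    return records


def render(record, base="http://www.bbc.com"):
    """Format a record the same way as the script's Print action."""
    return (
        f'<p><h3><a href="{base}{record["link"]}">{record["title"]}</a></h3></p>\n'
        f'<p>{record["summary"]}</p>\n'
        f'<p>Time: {record["time"]}</p>\n'
        f'<p>Category: {record["category"]}</p>\n'
    )


if __name__ == "__main__":
    for headline in extract_headlines(SAMPLE_PAGE):
        print(render(headline), end="")
```

As in the script, the summary, time and category are optional and come out empty when a block does not contain them. The captured text is written out unchanged, on the assumption that it is already HTML-escaped in the page.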