DATA (header)

An estimated 13-min read. I’ve split this into two parts because, as usual, it got unwieldy and I was never going to see a finished version otherwise. Thank you for reading.

An introduction

On Open Brunei, and as individuals, we’ve been dealing with data, collecting it, and making sense of it. The facts project Bruakal involves looking for statistics, sometimes even calculating them ourselves using the figures we find. Faiq’s article Brunei Box Office 2014 involved manually collecting titles of films shown over a year.

In particular, there is a gap that I’m interested in: we have general questions about Brunei, but the interested individual is unable to get answers. I would like to see attempts to fill this gap, and I am particularly interested in the kind of questions that can be answered with data.

Data itself is not always what we need most; we need the information that comes from analysing or processing data. In general, having more Brunei information out there would be ideal.

For example, having case studies:

Finding information on government websites:

Even then, this is an article about sharing and reusing data. While I will be sharing some thoughts about Brunei data, there are some open questions, and I certainly don’t have the answers to everything.

  • In this article, I will use ‘data’ to include data at different stages of processing – raw data, simple tables of totals and percentages, more extensive statistics, and lastly the analysis and knowledge that comes from any of the above forms of data.
  • If you find yourself bristling at any over-general terms in the fields of data science, statistics, or app development… sorry.
  • I also use ‘data’ to refer to non-personal, non-confidential information. When I want you to share your data, I am referring to data of public interest or use. I’m not saying, “Please share your bank account details and your lifetime of personal health data with others.”

The problem with Brunei data

To some extent, ideally, data is compiled and analysed by those who understand it. But data is useful even if you are not an expert in the field or doing formal research. To the individual, the availability of data can help us become more informed; to feel more involved in our society and government; and from there, we might even begin to help others.

The individual could be just a data geek crunching numbers on their computer for fun, but they could also have a use for it. Perhaps she is running a charitable outreach programme and is interested in Belait district’s health and education trends for the past two decades. Or maybe, he is writing a murder mystery that depends on locating local hang-outs, and would be grateful for map data such as FourSquare’s Top Picks near Kuala Belait. Or, for some reason, someone who just wants a map showing jogging paths in Taman Rekreasi Jalan Menteri Besar as on OpenStreetMap.

Those who are doing more formal research – researchers and research students – certainly go through pains when it comes to Brunei data. Some of these might sound familiar to you:

  • If you’re looking at publicly available journals, there aren’t always studies that are specific to Brunei – we may get lumped in with our other neighbours in Borneo, or other Southeast Asian countries
  • Using your Google skills to dig up presentation slides or spreadsheets may bring the risk of dubious sources
  • Local research facilities that are limited or outdated – from school ICT facilities to university library facilities
  • Requests for information from the government may invoke long processes of approval or waiting for meetings with government officers

It’s not a matter of not wanting to do the work, which is sometimes expected when doing research – it’s that jumping through hoops shouldn’t be necessary if some of this data is in the public interest. It should:

  • already be compiled
  • have useful and meaningful detail
  • be available, whether to the public or to researchers

Last year, a simple university assignment required me and my groupmates to look for data about women in power, and we wanted to compare our countries’ data. This was fine for Japan and China, but it turned out I couldn’t get similar data for Brunei – we had to take another tack at the topic.

Considering some worldwide trends, such as the ICT buzzword “big data” and the rise of the field of data science, Brunei would be remiss not to beef up our own datasets, as well as the skillsets needed to collect, process and analyse them. But of course, there first needs to exist an appreciation of data, including our own Brunei data.

Thoughts & questions:

  • What kind of data is of public interest? Who gets to decide?
  • No local sources would give me information on women in power in Brunei, so I turned to the World Bank for data about Brunei. How many of us depend on external sources for information? How reliable is this information?
  • How many jobs in Brunei make use of skillsets related to analysis and information?

An appreciation of data

Doing away with manual activities

To a person with some IT knowhow, there are some glaring examples that Brunei data isn’t well-structured to be easily shared – or even easily updated – by its own owners.

Some examples, especially in terms of web data:

  • Entertainment websites in Brunei that don’t have RSS feeds or archives. The Mall Cineplex website doesn’t have an RSS feed. At first glance, Times Cineplex does, but it doesn’t contain any movie information (Note: As of Nov 2015). Kristal FM updates its Top 10 charts every week, but doesn’t appear to keep an archive – which means that the data might be overwritten each week instead of being recorded.
  • News website RSS feeds that are severely limited. The Brunei Times’ only RSS feed shows just a small number of news stories per day (5 or fewer!), not even completely matching the website’s front headlines. There are no RSS feeds for other categories of news or content, e.g. business, sports. I once made an RSS feed (via Feed43, since then mysteriously removed) for my friend’s column, finding no other way to get RSS information for it.
  • Statistics in closed formats rather than open ones. Government statistics, if following the open government data movement (more on this below!), should be in spreadsheet or even CSV formats. Some of JPKE’s national statistics, I’m glad to see, are available in MS Excel format, but are still more commonly shared as PDFs. (Note: As of Nov 2015)
  • Government websites that are manually updated. Some still don’t appear to use CMSes (Dewan Bahasa dan Pustaka); some will manually update figures such as number of applications or other types of daily data, instead of having it automated based on databases or feeds. (I can’t list examples of the latter here due to existing contracts. Sorry.)
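
To show why a feed matters to someone who wants to reuse the data, here is a minimal sketch in Python. The feed content, the cinema name and the film titles are all made up for illustration; a real feed would be fetched from the publisher’s own URL. The point is only that once data is published in a standard structure, a few lines of code can read it without anyone copying titles by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, invented for this example.
# It does not reflect any real cinema's feed or listings.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Cineplex: Now Showing</title>
    <item>
      <title>Example Film A</title>
      <link>https://example.com/films/a</link>
      <pubDate>Mon, 02 Nov 2015 08:00:00 +0800</pubDate>
    </item>
    <item>
      <title>Example Film B</title>
      <link>https://example.com/films/b</link>
      <pubDate>Tue, 03 Nov 2015 08:00:00 +0800</pubDate>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


films = parse_feed(SAMPLE_FEED)
for film in films:
    print(film["published"], "-", film["title"])
```

If the items were saved each time the feed changed, the result would be the kind of archive that is otherwise lost when a page is overwritten every week.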

Is the knowledge of modern data distribution techniques still lacking? Is there a distrust of automated pushing of data? Or in the first place, is there an understanding of why it is beneficial to share data?

Sharing in order to progress

In my personal blog post from 2014, “Criticism and Continuity”, I talked about the idea of “progress” – which requires us to look at past mistakes, acknowledge them, and learn from them. We should be able to use the past and make informed decisions when working towards the future. This isn’t always required, but I am concerned that we don’t do it enough.

Even if you’re starting anew, others may have done some relevant work before. Can we transfer the lessons learnt? To be clear, it’s not about making zero mistakes – but to be better informed. How many times can we talk about unemployment and bring up anecdotes, without considering JPKE’s unemployment or labour force data, or even the fact that CSPS conducted a study in 2009?

Data collection and analysis also overlap between different parties, but the results often aren’t shared. I was amazed to find, as an officer in EGNC, that there had been an e-government study done in 2009, which highlighted issues that in 2013 were still major problems. Even outside of the gap between academia and professionals, there were other instances where information and experiences just weren’t shared – sometimes within a department, sometimes between Ministries. I can’t help but see it as a waste of effort.

In the workplace, transfer of knowledge happens through training, instruction manuals, seminars, handovers – all manners of passing on knowledge. Modern ways may include wikis, shared online storage, online collaborative spaces. It’s all well and good, but the attitude needs to be there too. There is an unwillingness to share prior work, to acknowledge past mistakes. If we’re truly interested in development as a whole – and not our own individual benefit or pride – then we must be willing to share our mistakes.

Let’s be more open to the efforts of those before us, and consider the research that has been done:

  • data that has been collected
  • opinions that have been gathered
  • conclusions that have been made

And should we be in the position to consider sharing our own efforts, we might think about how much to share, and the “openness” of our data.

Thoughts & questions:

  • What are some examples of data that should be shared amongst different parties?
  • There are also valid reasons for different parties to not share data – confidentiality, context – what do these entail?

Considering open data

"Visualizing NYC’s Open Data" by noneck, on Flickr (CC BY 2.0)

Since we’re talking about sharing data, the concept of open data comes to mind.

Open data has some specific qualities and principles, which I think would help to overcome some of the current barriers of working with Brunei data:

  • open format – Data is shared in a convenient and modifiable form. A standardised form allows for ease of processing, replication and sharing.
  • re-use, redistribution, universal participation – Data can be used by anyone – for example, not restricted by usage for commercial, non-commercial or educational contexts – without requiring official approval, and is encouraged to be remixed with other datasets.

(More definitions for open data: Open Definition, Open Data Handbook)
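
As a rough illustration of the “open format” point, here is a small Python sketch. The table is made up (the column names and figures are assumptions, not real statistics); what matters is that a CSV file can be read and re-totalled with nothing but the standard library, which is not true of the same table locked inside a PDF.

```python
import csv
import io

# Hypothetical monthly licence counts, invented for this example.
SAMPLE_CSV = """month,boat_type,licences
Jan,fishing,12
Jan,tourist,3
Feb,fishing,9
Feb,tourist,5
"""


def totals_by(rows, key, value="licences"):
    """Sum the `value` column for each distinct entry in the `key` column."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + int(row[value])
    return totals


# io.StringIO stands in for an opened .csv file.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

by_type = totals_by(rows, "boat_type")
by_month = totals_by(rows, "month")
print(by_type)   # {'fishing': 21, 'tourist': 8}
print(by_month)  # {'Jan': 15, 'Feb': 14}
```

The same rows can be regrouped by any column without retyping anything, which is what makes a standardised format convenient to process, replicate and share.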

Open data encourages sharing; it is meant to allow people (and machines and organisations) to use, share and distribute data more easily to each other:

‘If you’re wondering why it is so important to be clear about what open means and why this definition is used, there’s a simple answer: interoperability.’
‘The core of a “commons” of data (or code) is that one piece of “open” material contained therein can be freely intermixed with other “open” material.’
What is Open Data?

Government data can be “open” too; open government data is non-confidential and non-proprietary. The Brunei government’s open data portal Data.gov.bn – launched in December 2014 – aims to be Brunei’s repository of open government data (Borneo Bulletin, 2014; Brunei Times, 2014).

I imagine there’d be some hesitation for government agencies to release their data – the Borneo Bulletin article above noted “challenges” in “encouraging these agencies to add to the cause and towards enriching the site’s contents”. The principles of re-use and redistribution and universal participation might make some officers or Heads pause; it would be good for them to first understand the consequences of these.

Some additional information on ‘open government data’:

Open government data is non-confidential, but there’s a little more than that – what type of data is it, how is it distributed, and how is it expected to be used?

Read more:

Apps that make use of government data to deliver useful information:

1 – Mobile@HDB as seen on data.gov.sg App Showcase – A mobile app made using data from Singapore’s Housing and Development Board.

2 – CrashMap – A web app that shows road crashes occurring in Coventry on a map, using data that comes from official incidents reported to the UK police.

But that said, we do not need to start with government data. Open data can include cultural, science, finance, statistics, weather, environment, transport data (Open Knowledge) – data of public interest. Not all of it has to be compiled by the government; data could be provided by private organisations or by individuals. Here are some ideas:

  • map routes of public transport, with geolocation
  • pollution data, with geolocation
  • archive of “Top 10” music charts from local radio stations
  • locations of playgrounds or child-friendly areas
  • data that is crowd-sourced

Yes, some of the above has been done! I hope to see continued efforts to collect and share such data for the use of others 🙂

Data beyond anecdotes and tables of percentages

What’s wrong with anecdotes?

Anecdotes – especially if shared by a persuasive speaker or writer – are fine, but there comes a point where we need to do more than rely on just anecdotes. This is not to diminish a person’s individual experience, but to look at the scale at which other persons are having the same experience. We can consider an anecdote as the starting point of a research question.

Evidence can be stronger when there are higher numbers proving it. “Dissatisfaction with a specific company”, for example, becomes a stronger claim when we can see it backed up by survey responses or recorded complaints; “road traffic is getting heavier” could be supported by congestion reports and increasing trends in petrol purchase.

Relying only on anecdotes may also mean missing out on the standardisation in data collection. Anecdotes are also prone to confirmation bias.

I have to thank my B:Read co-founder Teah for introducing me to this term! In one of our first conversations about the reading culture in Brunei, we both had different answers to the question “Do Bruneians read books?” – which is of course affected by our different schools and groups of friends.

What it is:

You may have experienced an issue a certain way. And yet, it is completely possible that this experience is unique to you – it may only be true for you.

Even if your group of friends all agree with you, it is still not enough. Your group of friends might have similar worldviews due to any other combination of similarities – backgrounds, education level, lifestyles – even something as simple as what newspaper you read – these may affect the way you view the world.

For example, while I’m sure this was true and that some private employers do not treat their employees well, it’s still an anecdote:

Identifying it as anecdotal evidence does not mean the experience should be dismissed, or that empathy – or in this case compassion – can’t be directed to the person who was affected.

To further demonstrate with an anecdote – I’m aware of the irony – I would quote my friend “Javascript Guy” (hint: one of these guys): “Data is more important than gut feelings.” He had looked at his website analytics and found, counter to his instincts, that a high number of website visitors were using the web browser Opera Mini. He had believed that most Android users would be using either Android’s stock browser, Chrome, or Firefox. I then remembered that my old HTC phone had come bundled with Opera. Turns out it’s not unreasonable to find Opera users in the region – in 2014, Indonesia was Opera’s second largest userbase for Opera Mini.

I particularly like the article “Evernote, The First Dead Unicorn” which uses a number of ways to prove its point – app store rankings, employee reviews – and is aware of not relying too much on anecdotal evidence, while also being aware of shortcomings of data given as evidence:

Download numbers are of course just one part of the story. If unicorns die from slow declines in relevance, a more interesting marker is review quantity. Review star counts are not a great way of measuring the success of the company producing the app; Facebook’s reviews have been terrible since late 2014, when the company pulled Messenger functionality from its main application. Still, they are a great measure of how many people really care about your product, as both sides of the user experience (both people who love the product, and people who love to hate it) tend to drive review engagement.

Basic statistics aren’t enough either

Have a look at the two following tables from Data.gov.bn:

Table with little detail from Data.gov.bn - Statistik Jumlah Anak Yatim Berdaftar 2010 hingga 2014

JAPEM’s statistics of registered anak yatim on Data.gov.bn. Link to dataset

 

Detailed table from Data.gov.bn - Boat License issue in 2014

MOC’s statistics of registered boats on Data.gov.bn. Link to dataset

 
The first table, JAPEM’s statistics of registered anak yatim, is superficial and gives no opportunities for further insights. Possible pieces of data that would have been interesting: Breakdown by district, gender, age category? Or even family’s socioeconomic status, status of assistance received, and so on. Again, none of these have to be confidential or identifiable information.

The second table, Marine Department’s statistics of registered boats, is more detailed, giving a breakdown by boat type and month. It is more interesting because it can immediately open up new questions. Why were no fishing boats licensed in May? How do the 2015 numbers compare for tourist boats? What comes under the category “pleasure boats”? These questions may not be answered by the same dataset, but they allow you to consider further research.

Detailed tables can be richer in information, but even more so if:

  • You have enough of it – a critical mass of data – that can reliably show data trends. Monthly statistics for a year may be interesting, but there is more significance in a decade’s worth of monthly statistics.
  • Comparing multiple datasets from different sources can show new insights or lead to new applications – data mashups are a thing!
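
A data mashup can be as simple as joining two tables on a shared key. The Python sketch below is only an illustration: both datasets, and every number in them, are made up, and the choice of “district” as the shared key is an assumption about how such data might be published.

```python
# Hypothetical dataset A: number of playgrounds per district (invented figures).
playgrounds = {"Brunei-Muara": 40, "Belait": 12, "Tutong": 8, "Temburong": 3}

# Hypothetical dataset B: number of children per district (invented figures).
# Temburong is left out on purpose, to show what happens when sources don't line up.
children = {"Brunei-Muara": 50000, "Belait": 10000, "Tutong": 8000}


def playgrounds_per_10k(playground_counts, child_counts):
    """Join the two datasets on district and compute playgrounds per 10,000 children."""
    shared = sorted(set(playground_counts) & set(child_counts))
    rates = {d: playground_counts[d] * 10000 / child_counts[d] for d in shared}
    # Districts present in only one dataset cannot be compared.
    unmatched = sorted(set(playground_counts) ^ set(child_counts))
    return rates, unmatched


rates, unmatched = playgrounds_per_10k(playgrounds, children)
print(rates)      # {'Belait': 12.0, 'Brunei-Muara': 8.0, 'Tutong': 10.0}
print(unmatched)  # ['Temburong']
```

Neither dataset says anything about playground provision on its own; the combined rate is the new piece of information. The unmatched district is also a reminder that mashups depend on sources using the same categories and covering the same areas.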

Thoughts & questions:

  • Are there any good examples of Brunei data trends that are only possible from richer, detailed data?
  • What are some possible data mashups that can be done with Brunei data?

End of Part I

More information:

For more information about open data:

Resources for data wrangling:

  • The Open Source Data Science Masters – Open-source resources giving an introduction to data science.
  • How to work with web data? If you’re interested in the technical bits while meeting other Brunei residents, try hooking up with Brunei Geek Meet. 🙂
  • The Programming Historian – Resources for those who may have more of a ‘Humanities’ background wanting to get into digital tools and techniques for their research (thanks The Wheat for this link!)

Where to get Brunei data:

Hazirah

I’ve become more interested in data in the past few years: I scraped, processed and analysed data during my Masters; I’ve discovered I have been somewhat of a half-hearted ‘quantified selfer’ all my life; and a couple of my prior projects centred around data (Purple Bus Watch; #brunei complaints). My “data cred(ibility)” is mostly casual with a brief academic venture, but I like to think that being in the ICT field and a web enthusiast has given me some insights.

Categories: Technology

Comments

  1. Hazirah Article Author

    I somehow managed to write this article without mentioning ethnography, which is many MANY steps above ‘anecdotes’ and could be SO interesting if some are done in a Brunei context. (Question for those in the know: To what extent is ethnographic research used in local institutions?)

    Links:
    Ethnography – Wikipedia, the free encyclopedia
    Big Data Needs Thick Data | Ethnography Matters
    Lists of good ethnographies on antropologi.info

  2. Shai

    I have with me various data sets on Brunei which I have compiled in the past, but I eventually abandoned their analyses. To me the biggest hindrance to publishing findings is the ethical approval that is required to ensure integrity and no harm falling on subjects or agencies. Apart from CSPS and the inhouse research departments, I don’t know of any other governing entity who vets independent research. I’m not a fan of social experiments as they are oftentimes haphazard and a snapshot of the observed phenomenon. I would like to contribute to data if this matter has clarity.

    1. Hazirah Article Author

      Thanks for your reply.

      That is a shame about your collected datasets. Did you encounter any specific challenges in getting ethical approvals? Is it related to storing sensitive data, or getting participant consent, or any other types of issues? Also, what about data that does not contain sensitive information or can be sufficiently anonymised?

      Regarding independent research, you’re right that it isn’t necessarily vetted by authorities*, although I don’t believe that automatically means they are not valid studies. I feel that, if the data and findings cannot be published, it could at least be shared that the study has been done. Those interested in conducting future related studies may then be able to contact the original researcher, and be aware of issues and challenges related to their topic.

      * Cynically I might add, this also does not mean that all research vetted by authorities is necessarily good research.
