Text analysis on nature and wellbeing study

In 2018, my colleagues and I at the Data Science Research Centre, University of Derby (UoD), worked with a multidisciplinary team of researchers (psychologists from the Human Science department at UoD and geoscientists from the University of Sheffield) on a large-scale social Internet of Things project. The study, Improving Wellbeing through Urban Nature (IWUN), aimed to investigate how citizens’ interaction with green spaces influences their wellbeing and to recommend interventions for city planners. Our main task was to mine the mix of objective data (GPS-tracked location points, sensor data about speed) and subjective data (text and images) to uncover patterns in interaction, discover interesting features, find correlations and complement missing or insufficient data. We encountered several difficulties tied to the scale of the study, the messiness of the collected data, and the intricacy of interpreting data that integrates inputs from several domains. My contribution was mostly in the text analysis and its correlation with the image observations.

Task at hand

Participants in the study used a smartphone app that prompted them to observe and record the good things about nature whenever they entered one of the 760 digitally geofenced areas. The outcome is data containing textual and photographic information about what they noticed about nature. The study involved 1870 subjects, with 6 million tracked location points and 5600 textual observations.

First steps: clustering, topic modelling

The first step was an exploratory analysis of the text to understand the hot topics in the observations. I applied K-means to cluster the texts based on their similarity. Next, I performed topic modelling to gain insight into how different categories of people interact with green areas. The approach used was Latent Dirichlet Allocation (LDA), applied to discover the most relevant terms by age and gender. The extracted features indicate that age and gender could have a differentiating effect: different categories of people are interested in specific features. Interventions therefore need to target specific categories.
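To make the clustering step concrete, here is a minimal sketch that covers only the K-means part, not the LDA step. It is not the code used in the study: it uses only the Python standard library, weights words with TF-IDF, and runs K-means with cosine similarity (often called spherical K-means). The example observations, the function names and the choice of seed documents are all assumptions made for illustration; in practice a library such as scikit-learn would be the more usual choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vectors):
    """Sum the vectors and rescale the result to unit length."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def kmeans(vectors, seeds, iterations=10):
    """Cluster vectors; `seeds` are indices of documents used as initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels, centroids


def top_terms(centroid, n=2):
    """Return the n highest-weighted terms of a centroid."""
    return sorted(centroid, key=centroid.get, reverse=True)[:n]


# Hypothetical observations, not taken from the IWUN dataset.
observations = [
    "birds singing in the trees",
    "the birds were singing loudly",
    "sunlight on the river water",
    "calm water in the river",
]
labels, centroids = kmeans(tfidf_vectors(observations), seeds=[0, 2])
for k, centroid in enumerate(centroids):
    print(k, top_terms(centroid))
```

Fixing the seed documents keeps the sketch deterministic. With real data, the number of clusters and the initialisation would need tuning, and the top terms per cluster are what give a first impression of the hot topics.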

Words identified as most important from the text analysis

Next: Text classification - fastText

To better understand the topics of interest, I used the fastText library to train a classifier on labelled data from a study by psychologists on human connectedness to nature. The data from their study had been hand-coded into 11 themes (labels) using content analysis, a systematic technique for coding large volumes of data. The trained classifier outputs the most likely labels for each of our observations. The “specific aspect of nature” theme appears to dominate regardless of the threshold we set. I found that the hot topics were also captured in the images analysed using the Google Cloud Vision API.
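As a rough illustration of this workflow, the sketch below shows how hand-coded examples could be written in fastText's `__label__` training format and how predictions could be filtered by a probability threshold. It is an assumption-laden sketch, not the study's actual code: the helper names, the training file path, the hyperparameters (`epoch`, `wordNgrams`) and the default threshold are all hypothetical. Only the “specific aspect of nature” theme name comes from the study. The `fasttext` package is imported inside the function so that the helpers can be used without it installed.

```python
def to_fasttext_line(theme, text):
    """Format one hand-coded example as a fastText supervised training line."""
    label = "__label__" + theme.strip().lower().replace(" ", "_")
    # fastText expects one example per line, so collapse all whitespace.
    return f"{label} {' '.join(text.split())}"


def labels_above(labels, probabilities, threshold):
    """Keep (theme, probability) pairs whose probability meets the threshold."""
    return [
        (label.replace("__label__", ""), float(p))
        for label, p in zip(labels, probabilities)
        if p >= threshold
    ]


def train_and_predict(train_path, observation, threshold=0.2):
    """Train on a labelled file and return the themes predicted for one observation.

    `train_path` is a hypothetical file with one `to_fasttext_line` output per line.
    """
    import fasttext  # third-party package, imported lazily

    model = fasttext.train_supervised(input=train_path, epoch=25, wordNgrams=2)
    # k=-1 asks for every label whose probability is at least `threshold`.
    labels, probs = model.predict(" ".join(observation.split()), k=-1, threshold=threshold)
    return labels_above(labels, probs, threshold)


# Hypothetical hand-coded example.
print(to_fasttext_line("Specific aspect of nature", "Birds singing in the trees"))
```

Varying `threshold` is one way to check whether a single theme dominates the predictions regardless of the cut-off.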

Final Thoughts

Our text data can be likened to tweets, since the documents are short. As such, some techniques designed for long documents did not perform well on it. For instance, LDA does not perform well with short documents that are all roughly centred on the same topic (nature). Sentiment analysis does not work with our dataset either, because the participants had been asked to notice the ‘positive’ things about their environment, so nearly all the texts had positive sentiment, with very few outliers. Clustering turned out to be a good starting point for an exploratory study and could be fine-tuned using predefined keywords in guided clustering.

Publications:

Ferrara, E., Liotta, A., Erhan, L., Ndubuaku, M., Giusto, D., Richardson, M., Sheffield, D. and McEwan, K., 2018, August. A pilot study mapping citizens’ interaction with urban nature. In 16th Intl Conf on Pervasive Intelligence and Computing, (pp. 836-841). IEEE.

Ferrara, E., Liotta, A., Ndubuaku, M., Erhan, L., Giusto, D., Richardson, M., Sheffield, D. and McEwan, K., 2018, September. A Demographic Analysis of Urban Nature Utilization. In 2018 10th Computer Science and Electronic Engineering (CEEC) (pp. 136-141). IEEE.

Erhan, L., Ndubuaku, M., Ferrara, E., Richardson, M., Sheffield, D., Ferguson, F.J., Brindley, P. and Liotta, A., 2019. Analyzing Objective and Subjective Data in Social Sciences: Implications for Smart Cities. IEEE Access, 7, pp.19890-19906.

