• Saturday, April 20, 2024

BusinessDay

How top health websites are sharing sensitive data with advertisers


Some of the UK’s most popular health websites are sharing people’s sensitive data — including medical symptoms, diagnoses, drug names and menstrual and fertility information — with dozens of companies around the world, ranging from ad-targeting giants such as Google, Amazon, Facebook and Oracle, to lesser-known data-brokers and adtech firms like Scorecard and OpenX.

Using open-source tools to analyse 100 health websites, which include WebMD, Healthline, Babycentre and Bupa, an FT investigation found that 79 per cent of the sites dropped “cookies” — little bits of code that, when embedded in your browser, allow third-party companies to track individuals around the internet. This was done without the consent that is a legal requirement in the UK.

Google’s advertising arm DoubleClick was by far the most common destination for data, showing up on 78 per cent of the sites tested, followed by Amazon, which was present in 48 per cent of cases, Facebook, Microsoft and adtech firm AppNexus.
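As a rough illustration of how such percentages can be derived, the sketch below takes, for each page, the list of URLs the browser requested while loading it, and counts how many pages contacted a given outside company. It is not the FT's method or the code of the tool it used. The function names, the sample pages and the short suffix list (a stand-in for the full Public Suffix List) are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Assumption: a tiny stand-in for the Public Suffix List, enough for UK-style domains.
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "ac.uk"}


def site_of(url: str) -> str:
    """Reduce a URL to its site, e.g. 'https://ad.doubleclick.net/x' -> 'doubleclick.net'."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def third_party_sites(page_url: str, request_urls: list[str]) -> set[str]:
    """Sites contacted while loading a page, excluding the page's own site."""
    first_party = site_of(page_url)
    return {site_of(u) for u in request_urls} - {first_party, ""}


def share_contacting(pages: dict[str, list[str]], tracker_site: str) -> float:
    """Percentage of pages that made at least one request to tracker_site."""
    hits = sum(tracker_site in third_party_sites(p, reqs) for p, reqs in pages.items())
    return 100 * hits / len(pages)


# Hypothetical sample: two pages and the requests each one triggered.
sample_pages = {
    "https://www.example.co.uk/symptoms": [
        "https://www.example.co.uk/app.js",
        "https://ad.doubleclick.net/pixel",
    ],
    "https://health.example.org/": ["https://cdn.example.org/style.css"],
}
print(share_contacting(sample_pages, "doubleclick.net"))
```

A real survey would record the requests with an instrumented browser and would need the full suffix list to separate first-party from third-party hosts reliably.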

“These findings are quite remarkable, and very concerning,” said Wolfie Christl, a technologist and researcher who has been investigating the adtech industry. “From my perspective, this kind of data is clearly sensitive, has special protections under the [General Data Protection Regulation] and transmitting this data most likely violates the law.”
Health for sale

For centuries, physicians have sworn the Hippocratic oath, to keep secret “whatever I see or hear in the lives of my patients”.


But hundreds of millions of people now turn to the web each day to allay their medical worries, which range from the mundane to the grave. Despite the illusion of privacy that exists between users and their computers, the reality is starkly different.
A network diagram showing the flow of user data from the health website BabyCenter.com on 7 November 2019. There is a plethora of connections, including to Big Tech advertising (Facebook, Google), programmatic advertising and contextual advertising.

Digging deeper into 10 of the sites, chosen to reflect the different types of health information they offer to users, the FT looked at the types of data they were sharing.

The investigation excluded data sent to analytics companies to improve the performance of a website, and consent was given for cookies on all the websites that requested it. The privacy policies the FT reporters consented to did not adequately outline that this sensitive data would be shared with third parties, however, or for what purposes.

The data shared included:

Drug names entered into Drugs.com were sent to Google’s ad unit DoubleClick.
Symptoms entered into WebMD’s symptom checker, and diagnoses received, including “drug overdose”, were shared with Facebook.
Menstrual and ovulation cycle information from BabyCentre ended up with Amazon Marketing, among others.
Keywords such as “heart disease” and “considering abortion” were shared from sites like the British Heart Foundation, Bupa and Healthline to companies including Scorecard Research and Blue Kai (owned by software giant Oracle).

In eight cases (with the exception of Healthline and Mind), a specific identifier linked to the web browser was also transmitted — potentially allowing the information to be tied to an individual — and tracker cookies were dropped before consent was given. Healthline confirmed that it also shared unique identifiers with third parties.
‘Data silos of undesirables’

Since the adoption of the Europe-wide General Data Protection Regulation in May 2018, the EU online advertising industry, which makes $55bn of annual sales, has been subject to tighter rules around the collection and processing of data.

It is now illegal for advertisers to share the most sensitive data, including on health and sexual orientation, without explicit consent, where the user agrees to the specific sharing of their “special category” data, and is told how it will be used and by whom.

None of the websites tested asked for this type of explicit and detailed consent.

The ultimate destinations of the personal and sensitive data collected and shared by the websites were opaque, as they were not visible via an internet browser.

Research into the “data broker” industry shows that dozens of companies profit from buying and selling data to multiple clients who want to better understand users.

Experts believe that the predictive models built by the plethora of advertising and data-targeting companies may use ill health to profile and prey on users.

Knowledge of an individual’s medical ailments allows companies to try to sell specific treatments, services or financial products that desperate users might turn to.

“There is a whole system that will seek to take advantage of you because you’re in a compromised state. I find that morally repugnant,” said Tim Libert, a computer scientist at Carnegie Mellon University, who built the open-source WebXray tool used by the FT, and specialises in the social and legal implications of online ad tracking.

Previous research in which Mr Libert analysed 80,000 unique pages relating to common diseases found that more than 91 per cent contacted third parties in the US. The paper explains that holding such sensitive data on a person can result in discriminatory marketing, even without marketers knowing their identity.

“As medical expenses leave many with less to spend on luxuries, these users may be segregated into ‘data silos’ of undesirables who are then excluded from favourable offers and prices,” Mr Libert wrote. “This forms a subtle, but real, form of discrimination against those perceived to be ill.”

In the UK, the online advertising industry was put on watch in June by the regulator, the Information Commissioner’s Office. It gave the industry until December to clean up its data practices, or face further probes.

“This investigation by the Financial Times further highlights the ICO’s concerns about the processing of special category data in online advertising, as well as the role that site owners and publishers play in this ecosystem,” said Simon McDougall, the ICO’s executive director for technology policy and innovation.

“Special category data — such as health information — requires greater protection because of its sensitivity and the increased risk of harm to or discrimination against individuals. We will be assessing the information provided by the FT before considering our next steps,” he added.
The advertisers’ defence

Google, which powers the online advertising industry, said that it “does not build advertising profiles from sensitive data . . . and has strict policies preventing advertisers from using such data to target ads”.

It told the FT that the named sites investigated had been marked as “sensitive” internally, meaning the information that we found being sent to them was specifically excluded from the database used for personalised advertising. It said that its technology might be used to serve “contextual” ads, based on the contents of the page, but not user information.

The company explained that if a publisher chose to include information like the date of its visitor’s last period in the URL, it could be sent to Google as part of an ad request from that page. But Google’s ads systems would not understand what that URL data represents, nor use it to create profiles of users.

The sensitive data could be used for a variety of other reasons, including protecting against fraud and abuse and measuring the engagement with an advert, Google said.
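Google's point about URLs can be made concrete with a small sketch. If a page address carries user input as query parameters and an ad request passes that address along, whoever receives the request can read the values back out. Everything named here is hypothetical: the page address, the parameter names `last_period` and `cycle_length`, the endpoint `ads.example.net` and the `page` parameter are illustrative and do not describe any real site or ad system.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_ad_request(page_url: str, ad_endpoint: str) -> str:
    """Build an ad-request URL that carries the full page address as a parameter."""
    return f"{ad_endpoint}?{urlencode({'page': page_url})}"


def recover_page_params(ad_request_url: str) -> dict[str, list[str]]:
    """Show what the receiving server could read back out of the request."""
    outer = parse_qs(urlsplit(ad_request_url).query)
    page_url = outer["page"][0]
    return parse_qs(urlsplit(page_url).query)


# Hypothetical page whose address includes what the visitor typed into a form.
page = (
    "https://health.example.org/ovulation-calculator"
    "?last_period=2019-11-01&cycle_length=28"
)
ad_request = build_ad_request(page, "https://ads.example.net/request")
leaked = recover_page_params(ad_request)
print(leaked)
```

The same exposure can occur without an explicit parameter, because browsers may send the page address in the `Referer` header of third-party requests. Keeping user input out of the URL, for example by submitting forms with POST, would avoid this particular path.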

Facebook, another frequent tracker across the sites we surveyed, which also received data on highly sensitive symptoms and diagnoses, was not able to confirm what it does with this information. “We don’t want websites sharing people’s personal health information with us — it’s a violation of our rules, and we enforce against sites we find doing this,” a company spokesperson said. “We’re conducting an investigation and will take action against those sites in violation of our terms.”


Amazon said: “We do not use the information from publisher websites to inform advertising audience segments,” but it did not confirm what it did with the sensitive data it received, such as user-input fertility information.

It was unclear if either Facebook or Amazon also received personal identifiers, such as an IP address or a unique ID, alongside health data.

The companies also emphasised that the publishers of the websites were required to manage user consent and the type of data sent to third parties.

“It’s like a bar saying we don’t like to serve people that are underage, they shouldn’t come here to drink,” Mr Libert said. “They are being negligent and it’s deeply disingenuous.”

Meanwhile the website publishers themselves did not provide details of why the data was being shared or what would be done with it once it left their hands. A WebMD spokesperson said: “[W]e only use, collect or share user information to the extent disclosed in our privacy policy.” The policy reviewed by the FT did not appear to provide clear answers about the fate of the data.
Bar chart of Number of the UK’s top 100 health websites that contacted advertisers showing Google was the biggest recipient of data from top health websites

Condensed comments from other companies that responded are published at the end of this article. Others contacted, including Lotame, ComScore, AppNexus, Drugs.com, Health.com and Bounty, did not respond to requests for comment.

As the ICO’s deadline for online ad auction firms to audit themselves approaches, it will be a time of reckoning for many in an industry that was, until recently, self-regulated.

“The internet has turned into a privacy wasteland. But there’s a suspension of disbelief in the [ad] industry. Companies say they are GDPR-compliant, there’s a codependency where everybody pretends everything is OK, but the deep technical architecture is fundamentally incompatible with the right to privacy,” Mr Libert said.

“Ultimately it’s going to be the ICO that decides, and based on early guidance, I suspect they may not be a willing participant in this fictional world built by online advertisers.”