Screen scrapers, bots: Where does client data come from?
FINRA issued a warning about third-party data aggregators last spring. The issue: Data aggregation technologies may not function as intended, putting consumers, and the professionals who provide them with financial advice, at risk.
Most of the controversy centers on data sourcing. Some aggregators rely on so-called screen scraping, in which automated programs log in to consumers’ online accounts at third-party institutions to collect data, while others rely on APIs and direct feeds. FINRA is especially concerned about screen scraping, yet there’s no consensus on a preferred alternative.
More brokers than ever are outsourcing enterprise data aggregation, which makes the source of client data a real concern. When you evaluate a data aggregator, find out exactly where its data comes from, or you could run into problems with regulators.
To allow for screen scraping, data aggregators first require consumers to share their login credentials, like usernames and passwords, for banking, credit card, brokerage and other accounts.
From that point, bots use the credentials to log into those accounts at regular intervals, just as a human might. Once inside, the bots record whatever data appears on the screen (transactions, balances, debits, credits, holdings, etc.) and then transfer that data to the aggregator’s databases.
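To make the “record whatever appears on the screen” step concrete, here is a minimal Python sketch using the standard library’s `html.parser`. It skips the login entirely and parses a hard-coded, hypothetical account page; the page layout, field names and the `scrape_holdings` helper are invented for illustration and are not any aggregator’s actual code. Real scrapers also have to cope with login forms, multi-factor prompts and site redesigns.

```python
from html.parser import HTMLParser

# Hypothetical account page as a bot might see it after logging in.
# Only the rows rendered on this one screen are available to the scraper.
SAMPLE_PAGE = """
<table id="holdings">
  <tr><th>Symbol</th><th>Shares</th><th>Value</th></tr>
  <tr><td>ABC</td><td>100</td><td>5,250.00</td></tr>
  <tr><td>XYZ</td><td>40</td><td>1,980.50</td></tr>
</table>
<p class="pager">Page 1 of 3</p>
"""


class HoldingsScraper(HTMLParser):
    """Collects the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows; each row is a list of cell strings
        self._row = None      # row currently being built
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside table cells (such as the pager) is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_holdings(html):
    """Return holdings as dicts keyed by the table's header row (hypothetical layout)."""
    parser = HoldingsScraper()
    parser.feed(html)
    header, *body = parser.rows
    holdings = []
    for cells in body:
        record = dict(zip(header, cells))
        record["Shares"] = int(record["Shares"])
        record["Value"] = float(record["Value"].replace(",", ""))
        holdings.append(record)
    return holdings


if __name__ == "__main__":
    for holding in scrape_holdings(SAMPLE_PAGE):
        print(holding)
```

Note the pager reading “Page 1 of 3”: a bot that reads only this screen silently misses the holdings on the other two pages, which is exactly the incomplete-data risk that screen scraping carries.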
Some aggregators find screen scraping advantageous because it’s comparatively easy to implement and doesn’t require any agreement with the institution holding the account. But the disadvantages are numerous, and criticism is mounting.
First, screen scraping creates significant security vulnerabilities at several points in the process. Privacy risk multiplies each time consumers share their sensitive login credentials with a third party. In addition, aggregators don’t always have the security protections and protocols needed to prevent a data breach.
Compounding the problem, bank security systems struggle to distinguish authorized screen-scraping bots from malicious bots trying to steal sensitive data, which diverts limited security resources. Bots can also collect far more data than the aggregator actually needs.
Screen scraping creates serious data quality issues as well. Because bots collect data only at periodic intervals, the data is rarely up to date. And because bots can capture only what appears on the screen at a given moment, there’s always a risk of incomplete or missing data. Companies that rely on screen scraping find that the resulting data is unreliable, which makes accurate performance reporting impossible.
Given these security and data quality concerns, leading aggregators and data quality engineers are turning to more efficient and reliable means of data collection. Two prominent alternatives are APIs and direct data feeds.
Added security comes at a cost
An API (application programming interface) lets an aggregator request data directly from a financial institution’s systems, with the consumer signing in directly at the institution to authorize sharing with the third party. This approach has several advantages. First, because consumers log in directly, they don’t need to share their credentials with third parties.
Second, third-party aggregators don’t have to deploy bots, because the API handles the data sharing directly. Third, APIs let the sourcing party keep more control over what data is shared. In short, APIs are more secure, and they don’t bog down websites with an army of bots.
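The control described above usually comes from the institution issuing the aggregator a revocable token limited to specific kinds of data, loosely the pattern of access tokens and scopes used in delegated-authorization schemes such as OAuth 2.0. The Python sketch below is a toy, in-memory stand-in for that idea; the `Institution` class, its methods and the account fields are all hypothetical and do not represent any real bank’s API.

```python
import secrets
from dataclasses import dataclass, field

# Hypothetical account data held by the institution (the "sourcing party").
ACCOUNTS = {
    "acct-001": {
        "balance": 12_500.00,
        "holdings": [{"symbol": "ABC", "shares": 100}],
        "ssn": "***-**-1234",  # sensitive field no token in this example is granted
    }
}


@dataclass
class Institution:
    """Toy stand-in for a bank's data-sharing API (all names are hypothetical)."""
    tokens: dict = field(default_factory=dict)  # token -> (account_id, scopes)

    def authorize(self, account_id, scopes):
        """Called after the consumer signs in directly with the institution.
        Returns an opaque token; the aggregator never sees a password."""
        token = secrets.token_urlsafe(16)
        self.tokens[token] = (account_id, frozenset(scopes))
        return token

    def get_data(self, token, field_name):
        """Return one field, but only if the token was granted that scope."""
        if token not in self.tokens:
            raise PermissionError("unknown or revoked token")
        account_id, scopes = self.tokens[token]
        if field_name not in scopes:
            raise PermissionError(f"scope '{field_name}' not granted")
        return ACCOUNTS[account_id][field_name]

    def revoke(self, token):
        """The consumer or institution can cut off access at any time."""
        self.tokens.pop(token, None)


if __name__ == "__main__":
    bank = Institution()
    token = bank.authorize("acct-001", scopes={"balance", "holdings"})
    print(bank.get_data(token, "balance"))
    try:
        bank.get_data(token, "ssn")
    except PermissionError as err:
        print("refused:", err)
```

The design point is that access is granted per data type and can be withdrawn without the consumer changing a password, which a scraper holding real credentials cannot offer.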
APIs have two main drawbacks. First, they are far more expensive to deploy: they require significant upfront investment and ongoing maintenance, and many smaller firms lack the scale to make them cost-effective. Second, banks and other firms are disinclined to grant third-party data aggregators access. As a result, APIs are not yet widely used.
Direct data feeds
Direct feeds involve an arrangement between the sourcing party and the aggregator to share bulk raw data in real time. This approach is typically deployed at the enterprise level, where it lets financial professionals help clients track investments and portfolio performance.
Because direct feeds share data in raw form, data aggregators can offer their clients much greater flexibility. Here’s why. Raw data makes it easier for aggregators to run custom data cleansing and standardization procedures. Aggregators can then refine and consolidate the data into a single warehouse, with access tailored to each client’s needs.
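The standardization step can be pictured as mapping each source’s raw layout onto one common schema before consolidating. The Python sketch below assumes two hypothetical custodians whose feeds use different field names, date formats and units (one reports market value in cents); the layouts and the `normalize` and `consolidate` helpers are illustrative inventions, not a real feed specification.

```python
from collections import defaultdict
from datetime import date, datetime
from decimal import Decimal

# Hypothetical raw feed rows from two custodians; each uses its own layout.
RAW_FEEDS = {
    "custodian_a": [
        {"acct": "A-1001", "ticker": "ABC", "qty": "100", "mv": "5250.00", "asof": "2024-03-29"},
        {"acct": "A-1001", "ticker": "XYZ", "qty": "40", "mv": "1980.50", "asof": "2024-03-29"},
    ],
    "custodian_b": [
        {"AccountNo": "B-77", "Symbol": "abc ", "Quantity": "50",
         "MarketValueCents": "262500", "Date": "03/29/2024"},
    ],
}


def normalize(source, row):
    """Map one raw row to a common schema: account, symbol, quantity, value, as_of."""
    if source == "custodian_a":
        return {
            "account": row["acct"],
            "symbol": row["ticker"].strip().upper(),
            "quantity": Decimal(row["qty"]),
            "value": Decimal(row["mv"]),
            "as_of": date.fromisoformat(row["asof"]),
        }
    if source == "custodian_b":
        return {
            "account": row["AccountNo"],
            "symbol": row["Symbol"].strip().upper(),       # "abc " -> "ABC"
            "quantity": Decimal(row["Quantity"]),
            "value": Decimal(row["MarketValueCents"]) / 100,  # cents -> dollars
            "as_of": datetime.strptime(row["Date"], "%m/%d/%Y").date(),
        }
    raise ValueError(f"no mapping for source {source!r}")


def consolidate(raw_feeds):
    """Normalize every feed, then total quantity and value per symbol."""
    records = [normalize(src, row) for src, rows in raw_feeds.items() for row in rows]
    totals = defaultdict(lambda: {"quantity": Decimal(0), "value": Decimal(0)})
    for r in records:
        totals[r["symbol"]]["quantity"] += r["quantity"]
        totals[r["symbol"]]["value"] += r["value"]
    return records, dict(totals)


if __name__ == "__main__":
    records, totals = consolidate(RAW_FEEDS)
    for symbol, t in sorted(totals.items()):
        print(symbol, t["quantity"], t["value"])
```

`Decimal` is used instead of floats so that monetary sums stay exact; the per-source mapping functions are where custom cleansing rules would live.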
Direct feeds also inspire greater confidence. There’s little risk of missing or incomplete data, because the aggregator receives the full set of available data rather than only what a bot sees on a screen. In addition, direct sourcing is the most secure method of data sharing, with fewer vulnerabilities and no need for consumers to enter credentials. Direct feeds also enable more detailed reconciliations that can identify and fix data issues. They are the best approach for aggregating assets under management from the various source systems or processing firms used to serve clients.
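One common reconciliation that raw position and transaction data makes possible is rolling the previous day’s positions forward with the day’s transactions and comparing the result with the positions the feed reports. The Python sketch below shows that check under simple assumptions: quantities only, signed transaction quantities (buys positive, sells negative), and a hypothetical `reconcile_positions` helper and data layout chosen for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def reconcile_positions(prior, transactions, reported, tolerance=Decimal("0")):
    """Roll prior positions forward with transactions and compare to reported ones.

    prior:        {(account, symbol): quantity} at the previous close
    transactions: iterable of (account, symbol, signed_quantity); buys > 0, sells < 0
    reported:     {(account, symbol): quantity} from today's feed
    Returns a sorted list of breaks: (key, expected_qty, reported_qty).
    """
    expected = defaultdict(Decimal, prior)  # missing keys start at Decimal("0")
    for account, symbol, qty in transactions:
        expected[(account, symbol)] += qty
    breaks = []
    for key in sorted(set(expected) | set(reported)):
        exp = expected.get(key, Decimal(0))
        rep = reported.get(key, Decimal(0))
        if abs(exp - rep) > tolerance:
            breaks.append((key, exp, rep))
    return breaks


if __name__ == "__main__":
    prior = {("A-1001", "ABC"): Decimal(100), ("A-1001", "XYZ"): Decimal(40)}
    transactions = [("A-1001", "ABC", Decimal(-20)), ("A-1001", "XYZ", Decimal(10))]
    reported = {("A-1001", "ABC"): Decimal(80), ("A-1001", "XYZ"): Decimal(45)}
    for key, exp, rep in reconcile_positions(prior, transactions, reported):
        print(key, "expected", exp, "reported", rep)
```

Here the XYZ position is flagged (50 expected, 45 reported), pointing an analyst to a missing or misreported transaction. A screen scraper that sees only current balances has no transaction history to roll forward, so it cannot run this kind of check.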
For held-away assets, both direct feeds and APIs require the client’s explicit permission to share data. It is critical that consumers control access to their data safely and securely, which calls for a mechanism that lets them grant access without sharing their credentials or having them stored by a third party.
Before choosing a data aggregator, ask exactly where its data comes from. Some aggregators claim to rely on direct feeds, which may be true as far as it goes, yet they may have access to only a handful of direct sources. Ask specifically how much of their data comes from screen scraping versus other methods.
Ultimately, data quality is directly related to data sourcing. While both APIs and direct feeds are preferred to screen scraping, direct feeds deliver the highest degree of accuracy, reliability and confidence.