A Day in the Life of a Human Data Scientist
Yilian Yuan, Senior Vice President, Data Science and Advanced Analytics, IQVIA
Feb 04, 2019



Virtually no one can walk in off the street and serve as a human data scientist. Although we approach analytics in much the same way as data scientists in other industries, there’s so much more to it. The competencies we require in a human data scientist team form a rather daunting list, including:

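  • In-depth knowledge of real-world data (RWD). RWD most notably includes non-identified patient-level longitudinal data, as well as data from a variety of other sources generated to process insurance claims, manage patient care, and promote health and fitness. Without a very deep understanding of what is in these databases, what is missing, and how to connect and apply them, people who use the data can easily misinterpret it, draw the wrong conclusions, and give inappropriate recommendations.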
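  • Domain knowledge of therapeutic dynamics and commercial practices in the life sciences industry. Working with patient-level data requires a thorough understanding of the given therapeutic area. At a minimum, you must understand how physicians practice, what the typical patient journey looks like, what the treatment guidelines are, and which therapies compete. At the same time, you must know how brand teams operate in order to give them actionable recommendations. For example, how do they use sales analytics, marketing analytics, commercial analytics, and business analytics to make decisions?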
  • Experience with new advanced analytical techniques. Patient-level data is big data and very complex. Traditional analytical tools and statistical methods often can’t handle it. Data of this volume and complexity demands machine learning (ML) and artificial intelligence (AI) to uncover the insights within it. So, familiarity with Python, TensorFlow and other software tools for AI/ML is required.
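  • Leveraging the latest computing technologies to make an impact. The analytical techniques mentioned above, especially when applied to big data, are very compute-intensive and require more powerful software and hardware. They need the latest technologies (such as Spark, Scala and Hadoop) to process the data and carry out analytical steps and algorithms. To support major advances in healthcare, we must embed AI/ML models into a company’s workflow or into healthcare decision-making processes. These models must adapt to the many disease areas or market situations in which our clients operate. We monitor the latest technologies so that we can adopt them to process big data and make a significant impact.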

There are certainly data scientists in other business sectors (such as financial services and retail) who are adept at using the latest analytical techniques and sophisticated computing technologies with other forms of big data. But they may not have in-depth knowledge of healthcare data or the know-how to integrate various data assets and make sense of them. This is probably why, when life sciences organizations attempt to build their own analytics teams to do the type of customized analytics that we perform, they find that it is not that easy. Often, they realize that they need to partner with our human data scientists and, of course, the whole ecosystem of support we have around us at IQVIA.


Breaking New Ground

It may surprise people to learn that the technical aspects of being a human data scientist are, for us, not really the most challenging part of the job; rather, they are one of the most exciting parts. Human data scientists thrive on opportunities to break new ground in developing new analytical approaches to improve patient care and our clients’ business. In fact, IQVIAN human data scientists already have several patents pending on our analytical techniques. Developing innovative analytic techniques to drive healthcare forward is a challenge we embrace.

Another aspect of this work that many of us welcome is the opportunity to work with new data sets, a benefit of the digital age and of the ever-growing volume of (non-identified) human data being generated. We’re already working with non-identified genomics data, patient registry data, data collected from physician networks, data from public sources such as health authorities, data from insurance companies, and unstructured data such as physician notes within electronic medical records.

I think that for most of the work we do, the most important part is the first step: What question needs to be answered? What business problem must be solved? Often, clients approach us by requesting a particular analysis using a particular data set. But, after we probe to understand the context and the underlying issue, we recommend using a different analysis or a different data source altogether. Yes, sometimes a human data scientist can help with forming the question, not just the insights.

When we’re working exclusively with IQVIA data, the project usually proceeds according to plan, as the data is ready for us to work with in developing models and algorithms. Often, though, we work with a client’s data or data from an agency or another external source. When that’s the case, we have to get the data ready for analysis. It has to be cleaned, quality-controlled, formatted, and so on. We automate these tasks as much as possible using models, ML, and natural language processing (NLP). Even with the models we use to clean data, this step is very time-consuming.
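
To make the data preparation step a little more concrete, here is a minimal sketch in Python of the three tasks named above: cleaning, quality control, and formatting. It is purely illustrative and is not IQVIA's pipeline. It uses simple hand-written rules rather than the models, ML, or NLP mentioned in the text, and the field names (`patient_token`, `service_date`, `diagnosis_code`), the date formats, and the sample rows are all assumptions made up for the example.

```python
from datetime import datetime

# Hypothetical date formats an external source might use (an assumption).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_records(raw_rows):
    """Split raw rows into (clean, rejected) lists.

    Each raw row is a dict with the hypothetical keys patient_token,
    service_date, and diagnosis_code.
    """
    clean, rejected, seen = [], [], set()
    for row in raw_rows:
        # Formatting: trim whitespace and normalize case.
        token = (row.get("patient_token") or "").strip()
        code = (row.get("diagnosis_code") or "").strip().upper()
        service_date = parse_date(row.get("service_date") or "")

        # Quality control: every field must be present and parseable.
        if not token or not code or service_date is None:
            rejected.append(row)
            continue

        # Cleaning: de-duplicate on normalized values, not raw strings.
        key = (token, service_date, code)
        if key in seen:
            continue
        seen.add(key)

        clean.append(
            {
                "patient_token": token,
                "service_date": service_date,
                "diagnosis_code": code,
            }
        )
    return clean, rejected


# Made-up sample rows: a duplicate in two formats, a bad date, a missing token.
raw = [
    {"patient_token": " A17 ", "service_date": "2019-01-05", "diagnosis_code": "e11.9"},
    {"patient_token": "A17", "service_date": "01/05/2019", "diagnosis_code": "E11.9"},
    {"patient_token": "B42", "service_date": "not recorded", "diagnosis_code": "I10"},
    {"patient_token": "", "service_date": "2019-01-07", "diagnosis_code": "I10"},
]
clean, rejected = clean_records(raw)
print(len(clean), "clean,", len(rejected), "rejected")
```

Rejected rows are kept rather than silently dropped, so someone can review why they failed. Even in a toy example like this, most of the logic deals with inconsistencies in the source rather than with analysis, which hints at why this step takes so long.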

Novel Applications


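  • Developed a machine learning engine to identify physicians whose patients’ disease has progressed to the point that they need to move to the next line of therapy.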
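  • Leveraged multiple data sources and developed multiple machine learning algorithms to understand diagnosis-specific prescribing of products marketed for multiple indications (for example, in oncology or autoimmune disease).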
  • Used machine learning to uncover physicians’ responsiveness to digital engagement messaging and their communication preferences.



