These are some of the questions asked in SAS clinical interviews. We hope they help. The answers were contributed by several authors, so check them before relying on them; if you find any mistakes or have corrections, let us know by mailing shareurnoteswithus@gmail.com. Help us to improve the data.
1. Describe the phases of clinical trials.
Ans:- Clinical trials have four phases:
Phase 1: A new drug or treatment is tested in a small group of people (20-80) to evaluate its safety.
Phase 2: The experimental drug or treatment is given to a larger group of people (100-300) to see whether it is effective for the intended treatment.
Phase 3: The experimental drug or treatment is given to large groups of people (1000-3000) to confirm its effectiveness, monitor side effects and compare it with commonly used treatments.
Phase 4: Post-marketing studies that gather further information on the drug's risks, benefits etc.
2. Describe the validation procedure. How would you perform the validation for TLGs as well as analysis data sets?
Ans:- Validation checks the output of the SAS program written by the source programmer. The validator independently writes a program and generates the output. If this output matches the output generated by the source programmer's program, the program is considered valid. For TLGs we can perform the validation by checking the output manually, and for analysis data sets we can use PROC COMPARE.
3. How would you perform the validation for a listing that has 400 pages?
Ans:- It is not practical to validate a 400-page listing manually. Instead, we get the listing content into a data set (for example, the final data set from which the listing is produced; the listing itself is typically written to RTF with the ODS RTF destination) and then compare it with the validator's data set by using PROC COMPARE.
4. Can you use PROC COMPARE to validate listings? Why?
Ans:- Yes. If a listing has many entries (pages), it is not possible to check them manually, so we use PROC COMPARE to compare the data set behind the listing with the validator's independently created data set.
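A minimal sketch of such a comparison is shown here. The library names PROD and QC, the data set name AE_LISTING and the key variables USUBJID and AESEQ are hypothetical and only illustrate the pattern; both libraries are assumed to be assigned already.

```sas
/* Hypothetical names: PROD.AE_LISTING is the source programmer's data set, */
/* QC.AE_LISTING is the validator's independently programmed version.       */
proc sort data=prod.ae_listing out=work.prod_ae;
  by usubjid aeseq;
run;

proc sort data=qc.ae_listing out=work.qc_ae;
  by usubjid aeseq;
run;

/* ID matches observations by key; LISTALL also reports variables and */
/* observations that exist in only one of the two data sets.          */
proc compare base=work.prod_ae compare=work.qc_ae listall;
  id usubjid aeseq;
run;

/* SYSINFO is 0 when PROC COMPARE found no differences; it must be read */
/* right after the procedure, before another step resets it.            */
%put NOTE: PROC COMPARE return code = &sysinfo;
```

A non-zero return code does not say what differs; the PROC COMPARE output itself has to be reviewed in that case.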
5. How would you generate tables, listings and graphs?
Ans:- We can generate listings by using PROC REPORT. Similarly, we can create tables by using PROC FREQ, PROC MEANS, PROC TRANSPOSE and PROC REPORT. We would generate graphs by using PROC GPLOT etc.
6. How many tables can you create in a day?
Ans:- It depends on the complexity of the tables. If the tables are of the same type, we can create one to three tables in a day.
7. Which PROCs have you used in your experience?
Ans:- I have used many procedures, such as PROC REPORT, PROC SORT and PROC FORMAT. I have used PROC REPORT to generate listings; in this procedure I used subjid as an order variable and trt_grp, sbd and dbd as display variables.
8. Describe the data sets you have come across.
Ans:- I have worked with demographic, adverse event, laboratory, analysis and other data sets.
9. How would you submit the documents to the FDA? Who will submit them?
Ans:- We can submit the documents to the FDA by e-submission. The data definition document is submitted in define.pdf or define.xml format. The submission also includes the documentation about the macros and programs, and the e-records. The statistician or project manager will submit these documents to the FDA.
10. What documents do you submit to the FDA?
Ans:- We submit the ISS and ISE documents to the FDA.
11. Can you share your CDISC experience? What version of CDISC SDTM have you used?
Ans:- I have used version 3.1.1 of the CDISC SDTM.
12. Tell me the importance of the SAP (Statistical Analysis Plan).
Ans:- This document contains detailed information regarding the study objectives and statistical methods to aid in the production of the Clinical Study Report (CSR), including summary tables, figures and subject data listings for the protocol. It also contains documentation of the program variables and algorithms that will be used to generate summary statistics and statistical analyses.
13. Tell me about your project group. To whom would you report/contact?
My project group consists of six members: a project manager, two statisticians, a lead programmer and two programmers.
I usually report to the lead programmer. If I have any problem regarding the programming, I contact the lead programmer.
If I have any doubt about the values of variables in a raw data set, I contact the statistician. For example, in a data set related to menopause symptoms in women, if the variable sex has both F and M values, I would consider it wrong; in that type of situation I would contact the statistician.
14. Explain SAS documentation.
SAS documentation includes the program header, comments, titles, footnotes etc. Whatever we type in the program to make it easily readable and understandable is called SAS documentation.
15. How would you know whether the program has been modified or not?
I would know whether the program has been modified by looking at the modification history in the program header.
16. Project status meeting?
It is a plenary meeting of all the project managers to discuss the present status of the project in hand and to discuss new ideas and options for improving the way it is presently being performed.
17. Describe Clintrial and Oracle Clinical.
Clintrial is one of the market's leading Clinical Data Management Systems (CDMS). Oracle Clinical (OC) is a database management system designed by Oracle to provide data management, data entry and data validation functionality for the clinical trials process.
18. Tell me about MedDRA. What version of MedDRA did you use in your project?
MedDRA is the Medical Dictionary for Regulatory Activities. I used version 10.
19. Describe SDTM?
CDISC’s Study Data Tabulation Model
(SDTM) has been developed to standardize what is submitted to the FDA.
20. What is CRT?
Case Report Tabulation. Whenever a pharmaceutical company submits an NDA, the company has to send the CRTs to the FDA.
21. What is annotated CRF?
An annotated CRF is a CRF (Case Report Form) in which the variable names are written next to the spaces provided to the investigator. The annotated CRF serves as a link between the raw data and the questions on the CRF. It is a valuable tool for programmers and statisticians.
22. What do you know about 21 CFR Part 11?
Part 11 of Title 21 of the Code of Federal Regulations deals with the FDA regulations on electronic records and electronic signatures in the United States. Part 11, as it is commonly called, defines the criteria under which electronic records and electronic signatures are considered to be trustworthy, reliable and equivalent to paper records.
23. What are the contents of the AE dataset? What is its purpose? What are the variables in adverse event datasets?
The adverse event data set contains the SUBJID, the body system of the event, the preferred term for the event and the event severity. The purpose of the AE dataset is to give a summary of the adverse events for all the patients in the treatment arms to aid in the inferential safety analysis of the drug.
24. What are the contents of lab data? What is the purpose of the data set?
The lab data set contains the SUBJID, week number, category of lab test, standard units, and the low and high limits of the normal range. The purpose of the lab data set is to obtain the difference in the values of key variables after the administration of the drug.
25. How did you do data cleaning? How do you change the values in the data on your own?
I used PROC FREQ and PROC UNIVARIATE to find the discrepancies in the data, which I reported to my manager.
26. Have you created CRTs? If you have, tell me what you did.
Yes, I have created patient profile tabulations at the request of my manager and the statistician. I used PROC CONTENTS and PROC SQL to create a simple patient listing which had all the information on a particular patient, including age, sex, race etc.
27. Have you created transport files?
Yes, I have created SAS XPORT transport files by using PROC COPY and the DATA step for FDA submissions. These are version 5 files. We use the XPORT libname engine and PROC COPY, with one data set in each transport file. Version 5 constraints: labels no longer than 40 bytes, variable names no longer than 8 bytes, and character variables no wider than 200 bytes. If we violate these constraints, the copy procedure may terminate with errors, because the XPORT format follows the rules for SAS version 5 data sets.
libname sdtm "c:\sdtm_data";    /* source library */
libname dm xport "c:\dm.xpt";   /* transport file written by the XPORT engine */
/* IN= and OUT= are options on the PROC COPY statement, not separate statements */
proc copy in=sdtm out=dm;
   select dm;
run;
29. Definitions?
CDISC - Clinical Data Interchange Standards Consortium. It has different data models, which define clinical data standards for the pharmaceutical industry.
SDTM - (Study Data Tabulation Model) Defines the data tabulation data sets that are to be sent to the FDA for regulatory submissions.
ADaM - (Analysis Data Model) Gives guidance on defining and creating analysis data sets.
ODM - (Operational Data Model) XML-based data model that allows transfer of XML-based data.
Define.xml - Machine-readable data definition file (the XML counterpart of define.pdf).
ICH E3: Guideline, Structure and Content of Clinical Study Reports
ICH E6: Guideline, Good Clinical Practice
ICH E9: Guideline, Statistical Principles for Clinical Trials
21 CFR 312.32: IND safety reporting (Part 312 is Investigational New Drug Application)
30. Have you ever written edit check programs in your project? If you have, tell me what you know about edit check programs.
Yes, I have written edit check programs. Edit check programs are used for data validation.
1. Data validation - PROC MEANS, PROC UNIVARIATE, PROC FREQ. Data cleaning - finding errors.
2. Checking for invalid character values.
proc freq data=patients;
   tables gender dx ae / nocum nopercent;
run;
This gives frequency counts of the unique character values.
3. PROC PRINT with a WHERE statement to list invalid data values [systolic blood pressure - 80 to 200] [diastolic blood pressure - 60 to 120].
4. PROC MEANS, PROC UNIVARIATE and PROC TABULATE to look for outliers. PROC MEANS - min, max, n and mean. PROC UNIVARIATE - five highest and lowest values [stem-and-leaf plots and box plots].
5. PROC FORMAT - range checking.
6. Data analysis - SET, MERGE, UPDATE, KEEP, DROP in the DATA step.
7. Create data sets - PROC IMPORT and the DATA step from flat files.
8. Extract data - LIBNAME.
9. SAS/STAT - PROC ANOVA, PROC REG.
10. Duplicate data - PROC SORT with NODUPKEY or NODUPLICATES. NODUPKEY checks for duplicates only in the BY variables. NODUPLICATES checks the entire observation (matches all variables). To get the duplicate observations, first sort with NODUPKEY, merge the result back to the original data set, and keep only the records in both the original and the sorted data set.
11. For creating analysis data sets from the raw data sets, I used PROC FORMAT and the RENAME and LENGTH statements to make changes and finally create an analysis data set.
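As a sketch of the duplicate check in item 10, one alternative to the merge-back approach is the DUPOUT= option of PROC SORT, which writes the removed duplicates to their own data set. The data set PATIENTS and the variable PATNO appear elsewhere in these notes; the VISIT key and the output data set names are assumptions made for illustration.

```sas
/* Assumption: one record is expected per PATNO and VISIT.              */
/* OUT= keeps the first record for each key; DUPOUT= receives the rest. */
proc sort data=patients out=patients_nodup dupout=patients_dups nodupkey;
  by patno visit;
run;

/* List the records that were removed as duplicate keys */
proc print data=patients_dups;
  title "Records with a duplicate PATNO and VISIT";
run;
```

Only the second and later records for a key land in PATIENTS_DUPS, so the first occurrence still has to be looked up in PATIENTS_NODUP when the full set of clashing records is needed.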
31. What is Verification?
The purpose of verification is to ensure the accuracy of the final tables and the quality of the SAS programs that generated them. Following the SOP and the SAP, I selected a subset of the final summary tables for verification, e.g. the adverse event table and the baseline and demographic characteristics table. The verification results were checked against the original final tables, and all discrepancies, if any, were documented.
32. What is Program Validation?
It is the same as macro validation, except that here we validate the programs. According to the SOP, I first had to determine what the program is supposed to do, check whether it works as it is supposed to, and create a validation document stating whether the program works properly, with the status set to pass or fail. Pass the input parameters to the program and check the log for errors.
33. What do you know about ISS and ISE? Have you ever produced these reports?
ISS (Integrated Summary of Safety): Integrates safety information from all sources (animal, clinical pharmacology, controlled and uncontrolled studies, epidemiologic data). "ISS is, in part, simply a summation of data from individual studies and, in part, a new analysis that goes beyond what can be done with individual studies."
ISE (Integrated Summary of Efficacy)
The ISS and ISE are critical components of the safety and effectiveness submission and are expected to be submitted in the application in accordance with regulation. FDA's guidance Format and Content of Clinical and Statistical Sections of Application gives advice on how to construct these summaries. Note that, despite the name, these are integrated analyses of all relevant data, not summaries.
34. Explain the process of data validation and how to do it.
I have done data validation and data cleaning to check whether the data values are correct and conform to the standard set of rules. A very simple approach to identifying invalid character values in a file is to use PROC FREQ to list all the unique values of these variables. This gives us the total number of invalid observations. After identifying the invalid data, we have to locate the observations so that we can report the particular patient numbers to the manager. Invalid data can be located by using DATA _NULL_ programming.
The following is an example:
DATA _NULL_;
   INFILE "C:\PATIENTS.TXT" PAD;
   FILE PRINT; ***SEND OUTPUT TO THE OUTPUT WINDOW;
   TITLE "LISTING OF INVALID DATA";
   ***NOTE: WE WILL ONLY INPUT THOSE VARIABLES OF INTEREST;
   INPUT @1  PATNO  $3.
         @4  GENDER $1.
         @24 DX     $3.
         @27 AE     $1.;
   ***CHECK GENDER;
   IF GENDER NOT IN ('F','M',' ') THEN PUT PATNO= GENDER=;
   ***CHECK DX;
   IF VERIFY(DX,' 0123456789') NE 0 THEN PUT PATNO= DX=;
   ***CHECK AE;
   IF AE NOT IN ('0','1',' ') THEN PUT PATNO= AE=;
RUN;
For data validation of numeric values, such as listing out-of-range values while ignoring missing ones, I used PROC PRINT with a WHERE statement.
PROC PRINT DATA=CLEAN.PATIENTS;
   WHERE (HR NOT BETWEEN 40 AND 100 AND HR IS NOT MISSING) OR
         (SBP NOT BETWEEN 80 AND 200 AND SBP IS NOT MISSING) OR
         (DBP NOT BETWEEN 60 AND 120 AND DBP IS NOT MISSING);
   TITLE "OUT-OF-RANGE VALUES FOR NUMERIC VARIABLES";
   ID PATNO;
   VAR HR SBP DBP;
RUN;
If a character variable should contain values in a range such as '001' - '999', we can first define a user-defined format and then use PROC FREQ with that format to determine the invalid values.
PROC FORMAT;
   VALUE $GENDER 'F','M'       = 'VALID'
                 ' '           = 'MISSING'
                 OTHER         = 'MISCODED';
   VALUE $DX     '001' - '999' = 'VALID'
                 ' '           = 'MISSING'
                 OTHER         = 'MISCODED';
   VALUE $AE     '0','1'       = 'VALID'
                 ' '           = 'MISSING'
                 OTHER         = 'MISCODED';
RUN;
One of the simplest ways to check for invalid numeric values is to run either PROC MEANS or PROC UNIVARIATE. We can use the N and NMISS options in PROC MEANS to check for missing and invalid data. By default PROC MEANS prints N, MEAN, STD DEV, MIN and MAX, so NMISS has to be requested explicitly. The main advantage of using PROC UNIVARIATE (which by default prints n, mean, std, skewness, kurtosis and more) is that we get the extreme values, i.e. the lowest and highest 5 values, which we can inspect for data errors. If you want to see the patient ID for these particular observations, add an ID PATNO statement in the UNIVARIATE procedure.
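The two checks described above can be sketched as follows. The data set CLEAN.PATIENTS and the variables HR, SBP, DBP and PATNO are the ones used in the PROC PRINT example; the CLEAN library is assumed to be assigned, and the MAXDEC= value is only a display choice.

```sas
/* N and NMISS show how many values are present and missing; */
/* MIN and MAX expose obviously impossible values.           */
proc means data=clean.patients n nmiss min max maxdec=1;
  var hr sbp dbp;
run;

/* The ID statement labels the extreme observations with PATNO, */
/* so suspicious values can be traced back to a patient.        */
proc univariate data=clean.patients;
  id patno;
  var hr sbp dbp;
run;
```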
35. Roles and responsibilities?
Programmer:
Develop programming for report formats (ISS & ISE shell) required by the regulatory authorities.
Update ISS/ISE shell, when required.
Clinical Study Team:
Provide information on safety and efficacy findings, when required.
Provide updates on safety and efficacy findings for periodic reporting.
Study Statistician:
Draft ISS and ISE shell.
Update shell, when appropriate.
Analyze and report data in approved format, to meet periodic reporting requirements.
36. Explain the types of clinical trial studies you have come across.
Single Blind Study
When the patients are not aware of which
treatment they receive.
Double Blind Study
When the patients and the investigator
are unaware of the treatment group assigned.
Triple Blind Study
Triple blind study is when patients,
investigator, and the project team are unaware of the treatments administered.
37. What are the domains/datasets you
have used in your studies?
Demog
Adverse Events
Vitals
ECG
Labs
Medical History
PhysicalExam etc
38. Can you list the variables in all
the domains?
Demog: Usubjid, Patient Id, Age, Sex,
Race, Screening Weight, Screening Height, BMI etc
Adverse Events: Protocol no, Investigator no, Patient Id, Preferred Term, Investigator Term (Abdominal dis, Freq urination, headache, dizziness, hand-foot syndrome, rash, Leukopenia, Neutropenia), Severity, Seriousness (y/n), Seriousness Type (death, life threatening, permanently disabling), Visit number, Start time, Stop time, Related to study drug?
Vitals: Subject number, Study date,
Procedure time, Sitting blood pressure, Sitting Cardiac Rate, Visit number,
Change from baseline, Dose of treatment at time of vital sign, Abnormal
(yes/no), BMI, Systolic blood pressure, Diastolic blood pressure.
ECG: Subject no, Study Date, Study Time,
Visit no, PR interval (msec), QRS duration (msec), QT interval (msec), QTc
interval (msec), Ventricular Rate (bpm), Change from baseline, Abnormal.
Labs: Subject no, Study day, Lab parameter (Lparm), lab units, ULN (upper limit of normal), LLN (lower limit of normal), visit number, change from baseline, Greater than ULN (yes/no), lab related serious adverse event (yes/no).
Medical History: Medical Condition, Date of Diagnosis (yes/no), Years of onset or occurrence, Past condition (yes/no), Current condition (yes/no).
PhysicalExam: Subject no, Exam date,
Exam time, Visit number, Reason for exam, Body system, Abnormal (yes/no),
Findings, Change from baseline (improvement, worsening, no change), Comments
39. Give me examples of the edit checks you made in your programs.
Examples of Edit Checks
Demog:
Weight is outside the expected range.
Body mass index is below the expected range (check weight and height).
Age is not within the expected range.
DOB is later than the visit date.
Gender value is not a valid one.
etc
Adverse Event:
Stop is before the start or visit.
Start is before birthdate.
Study medicine discontinued due to adverse event but completion indicated (COMPLETE = 1).
Labs:
Result is within the normal range but abnormal is not blank or 'N'.
Result is outside the normal range but abnormal is blank.
Vitals:
Diastolic BP > Systolic BP.
Medical History:
Visit date prior to Screen date.
Physical:
Physical exam is normal but comment included.
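Two of these checks can be sketched in a DATA step as follows. All data set and variable names here (AE, VITALS, AESTDT, AEENDT, BIRTHDT, SBP, DBP) are hypothetical, and the dates are assumed to be numeric SAS dates.

```sas
/* Hypothetical AE data set with numeric SAS dates */
data ae_checks;
  set ae;
  length check $40;
  /* N() counts non-missing arguments, so missing dates are not flagged */
  if n(aestdt, aeendt) = 2 and aeendt < aestdt then do;
    check = "AE stop date is before start date";
    output;
  end;
  if n(aestdt, birthdt) = 2 and aestdt < birthdt then do;
    check = "AE start date is before birth date";
    output;
  end;
run;

/* Hypothetical vitals data set: diastolic above systolic.               */
/* A missing value sorts below any number, so missing SBP is excluded    */
/* explicitly; a missing DBP can never satisfy the comparison.           */
data vs_checks;
  set vitals;
  where dbp > sbp and sbp is not missing;
run;
```

Each output data set holds only the records that failed a check, which is the form usually passed on to data management as queries.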
40. What are the advantages of using SAS in clinical data management? Why should we not use other software products in managing clinical data?
ADVANTAGES OF USING A SAS®-BASED SYSTEM
Less hardware is required.
A typical SAS®-based system can use a standard file server to store its databases and does not require one or more dedicated servers to handle the application load. PC SAS® can easily be used to handle processing, while data access is left to the file server. Additionally, it is possible to use the SAS® product SAS®/Share to provide a dedicated server to handle data transactions.
Fewer personnel are required.
Systems that use complicated database software often require the hiring of one or more DBAs (Database Administrators) who make sure the database software is running, make changes to the structure of the database, etc. These individuals often require special training or background experience in the particular database application being used, typically Oracle. Additionally, consultants are often required to set up the system and/or studies, since dedicated servers and specific expertise requirements often complicate the process.
Users with even casual SAS® experience can set up studies.
Novice programmers can build the structure of the database and design screens. Organizations that are involved in data management almost always have at least one SAS® programmer already on staff. SAS® programmers will have an understanding of how the system actually works, which allows them to extend the functionality of the system by directly accessing SAS® data from outside of the system.
Setup time is dramatically reduced.
By keeping studies on a local file server and making the database and screen design processes extremely simple and intuitive, setup time is reduced from weeks to days.
All phases of the data management process become homogeneous.
From entry to analysis, data reside in SAS® data sets, often the end goal of every data management group. Additionally, SAS® users are involved in each step, instead of having specialists from different areas hand off pieces of studies during the project life cycle.
No data conversion is required.
Since the data reside in SAS® data sets natively, no conversion programs need to be written.
Data review can happen during the data entry process, on the master database.
As long as records are marked as being double-keyed, data review personnel can run edit check programs and build queries on some patients while others are still being entered.
Tables and listings can be generated on live data.
This helps speed up the development of table and listing programs and allows programmers to avoid having to make continual copies or extracts of the data during testing.
43. Have you ever had to follow SOPs or programming guidelines?
An SOP describes the process to assure that standard coding activities, which produce tables, listings and graphs, functions and/or edit checks, are conducted in accordance with industry standards and are appropriately documented. It is normally used whenever new programs are required or existing programs require some modification during the set-up, conduct, and/or reporting of clinical trial data.
44. Describe the types of SAS programming tasks that you performed: Tables? Listings? Graphics? Ad hoc reports? Other?
Prepared programs required for the ISS and ISE analysis reports. Developed and validated programs for preparing ad hoc statistical reports for the preparation of the clinical study report. Wrote analysis programs in line with the specifications defined by the study statistician. Base SAS procedures (MEANS, FREQ, SUMMARY, TABULATE, REPORT, UNIVARIATE etc.) and SAS/STAT procedures (REG, GLM, ANOVA etc.) were used for summarization, cross-tabulations and statistical analysis purposes. Created statistical reports using PROC REPORT, DATA _NULL_ and SAS macros. Created, derived, merged and pooled data sets, listings and summary tables for Phase I and Phase II clinical trials.
45. Have you been involved in editing the data or writing data queries?
If your interviewer asks this question, you should ask what he or she means by editing the data and by data queries.
41. Are you involved in writing the inferential analysis plan? Table specifications?
42. What do you feel about hardcoding?
Programmers sometimes hardcode when they need to produce a report urgently. But it is always better to avoid hardcoding, as it overrides the database controls in clinical data management. Data often change in a trial over time, and a hardcode that is written today may not be valid in the future. Unfortunately, a hardcode may be forgotten and left in the SAS program, and that can lead to an incorrect database change.
43. How do you write a test plan?
Before writing a test plan you have to look into the functional specifications. The functional specifications themselves depend on the requirements, so one should have a clear understanding of the requirements and the functional specifications to write a test plan.
44. What is the difference between
verification and validation?
Although the verification and validation
are close in meaning, "verification" has more of a sense of testing
the truth or accuracy of a statement by examining evidence or conducting
experiments, while "validate" has more of a sense of declaring a
statement to be true and marking it with an indication of official sanction.
45. What other SAS features do you use for error trapping and data validation?
Conditional statements (IF-THEN/ELSE).
PUT statement.
DEBUG option of the DATA statement (DATA step debugger).
46. What is PROC CDISC?
It is a new SAS procedure that is available as a hotfix for SAS 8.2 and comes as a part of SAS 9.1.3.
PROC CDISC is a procedure that allows us to import and export XML files that are compliant with the CDISC ODM version 1.2 schema.
For more details refer to the textbook SAS Programming in the Pharmaceutical Industry.
47) What is LOCF?
Pharmaceutical companies conduct longitudinal studies on human subjects that often span several months. It is unrealistic to expect patients to keep every scheduled visit over such a long period of time. Despite every effort, patient data are not collected for some time points, and these eventually become missing values in a SAS data set. For reporting purposes, the most recent previously available value is substituted for each missing visit. This is called Last Observation Carried Forward (LOCF).
LOCF doesn't mean the last SAS data set observation carried forward. It means the last non-missing value carried forward. It is the values of individual measures that are the "observations" in this case. If you have multiple variables containing these values, they will be carried forward independently.
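A minimal LOCF sketch with RETAIN and BY-group processing is shown below. The data set LABS and the variables PATNO, VISIT and VALUE are hypothetical, and the sketch assumes that a missed visit is already present as a record with a missing VALUE.

```sas
/* Hypothetical input: one record per PATNO and VISIT */
proc sort data=labs out=labs_sorted;
  by patno visit;
run;

data labs_locf;
  set labs_sorted;
  by patno;
  retain last_value;
  /* Reset at each new patient so values never carry across patients */
  if first.patno then last_value = .;
  /* Remember the most recent non-missing value */
  if not missing(value) then last_value = value;
  /* VALUE_LOCF is the observed value, or the carried-forward one */
  value_locf = last_value;
  drop last_value;
run;
```

Keeping the original VALUE next to VALUE_LOCF makes it easy to see which records were imputed. With several measures, each one needs its own retained variable, so that they are carried forward independently.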
48) ETL process:
Extract, Transform and Load:
Extract:
The first part of an ETL process is to extract the data from the source systems. Most data warehousing projects consolidate data from different source systems. Each separate system may also use a different data organization / format. Common data source formats are relational databases and flat files, but may include non-relational database structures such as IMS or other data structures such as VSAM or ISAM. Extraction converts the data into a format for transformation processing. An intrinsic part of the extraction is the parsing of extracted data, resulting in a check of whether the data meets an expected pattern.
Transform:
The transform stage applies a series of rules or functions to the extracted data from the source to derive the data to be loaded to the end target. Some data sources will require very little or even no manipulation of data. In other cases, one or more of the following transformation types may be required to meet the business and technical needs of the end target:
- Selecting only certain columns to load (or selecting null columns not to load)
- Translating coded values (e.g., if the source system stores 1 for male and 2 for female, but the warehouse stores M for male and F for female); this is called automated data cleansing, and no manual cleansing occurs during ETL
- Encoding free-form values (e.g., mapping "Male" to "1" and "Mr" to M)
- Joining together data from multiple sources (e.g., lookup, merge, etc.)
- Generating surrogate key values
- Transposing or pivoting (turning multiple columns into multiple rows or vice versa)
- Splitting a column into multiple columns (e.g., putting a comma-separated list specified as a string in one column as individual values in different columns)
- Applying any form of simple or complex data validation; if it fails, the data is rejected in full, in part or not at all, and thus none, part or all of the data is handed over to the next step, depending on the rule design and exception handling
Most of the above transformations may themselves result in an exception, e.g. when a code translation parses an unknown code in the extracted data.
Load:
The load phase loads the data into the end target, usually the data warehouse (DW).
Depending on the requirements of the
organization, this process ranges widely. Some data warehouses might weekly
overwrite existing information with cumulative, updated data, while other DW
(or even other parts of the same DW) might add new data in a historized form,
e.g. hourly. The timing and scope to replace or append are strategic design
choices dependent on the time available and the business needs. More complex
systems can maintain a history and audit trail of all changes to the data
loaded in the DW.
As the load phase interacts with a
database, the constraints defined in the database schema as well as in triggers
activated upon data load apply (e.g. uniqueness, referential integrity, mandatory
fields), which also contribute to the overall data quality performance of the
ETL process.
source: wikipedia