Titanic Survival Prediction Competition

What you will learn

  • 1-1 Loading the data
  • 1-2 Exploring the data
  • 1-3 Handling missing values
  • 1-4 Model selection and evaluation

Data Fields

Field     Description
Survival  Survival (0 = No, 1 = Yes); stored as the Survived column in train.csv
Pclass    Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
Sex       Sex (male / female)
Age       Age in years
SibSp     Number of siblings / spouses aboard the Titanic
Parch     Number of parents / children aboard the Titanic
Ticket    Ticket number (e.g., CA 31352, A/5. 2151)
Fare      Passenger fare
Cabin     Cabin number
Embarked  Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
  • Sibling: brother, sister, stepbrother, stepsister
  • Spouse: husband, wife (mistresses and fiancés were ignored)
  • Parch: parent (mother, father), child (daughter, son, stepdaughter, stepson)
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
train = pd.read_csv("data/titanic/train.csv")
test = pd.read_csv("data/titanic/test.csv")
sub = pd.read_csv("data/titanic/gender_submission.csv")
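
A quick sanity check right after loading can catch a wrong path or an unexpected file early. The sketch below is only an illustration: it reads a tiny in-memory CSV with made-up rows standing in for train.csv, and `check_columns` is a hypothetical helper, not part of the notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/titanic/train.csv (three made-up rows).
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,
"""


def check_columns(df, required):
    """Return the names in `required` that are missing from df.columns."""
    return [col for col in required if col not in df.columns]


demo = pd.read_csv(io.StringIO(csv_text))

# Shape is (rows, columns); the empty Age field is read as NaN.
print(demo.shape)

# Cabin is deliberately absent from the toy CSV, so it is reported as missing.
print(check_columns(demo, ["PassengerId", "Survived", "Cabin"]))
```

With the real files, the same check could be run on `train` and `test` to confirm that both contain the columns the later steps rely on.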

EDA

Checking numeric variables and summary statistics

In [3]:
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
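
The non-null counts above imply missing values: Age has 891 - 714 = 177, Cabin has 891 - 204 = 687, and Embarked has 891 - 889 = 2. As a minimal sketch, the same numbers can be obtained directly with `isnull()`. The toy frame below is made up so the example runs without the data files; on the real data, `train` would take the place of `toy`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing values (not the real data).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", "S", "Q"],
})

# Number of missing values per column.
missing = toy.isnull().sum()

# Fraction of missing values per column (mean of a boolean mask).
missing_ratio = toy.isnull().mean()

summary = pd.DataFrame({"missing": missing, "ratio": missing_ratio})
print(summary)
```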
In [8]:
train['Survived'].dtype
Out[8]:
dtype('int64')
In [4]:
# Keep only the columns whose dtype is integer or float
num_cols = [col for col in train.columns
            if train[col].dtype in ['int64', 'float64']]

print(num_cols)
train[num_cols].describe()
['PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']
Out[4]:
       PassengerId    Survived      Pclass         Age       SibSp       Parch        Fare
count   891.000000  891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
mean    446.000000    0.383838    2.308642   29.699118    0.523008    0.381594   32.204208
std     257.353842    0.486592    0.836071   14.526497    1.102743    0.806057   49.693429
min       1.000000    0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%     223.500000    0.000000    2.000000   20.125000    0.000000    0.000000    7.910400
50%     446.000000    0.000000    3.000000   28.000000    0.000000    0.000000   14.454200
75%     668.500000    1.000000    3.000000   38.000000    1.000000    0.000000   31.000000
max     891.000000    1.000000    3.000000   80.000000    8.000000    6.000000  512.329200
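
Comparing `dtype` against a list of names works here, but it misses other numeric dtypes such as `int32` or `float32`. One possible alternative is `DataFrame.select_dtypes`, sketched below on a made-up toy frame; the names `num_cols_alt` and `cat_cols_alt` are illustrative only.

```python
import pandas as pd

# Hypothetical toy frame mixing numeric and text columns.
toy = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Age": [22.0, 38.0, 26.0],
    "Sex": ["male", "female", "female"],
})

# "number" matches every integer and float dtype, not just int64/float64.
num_cols_alt = toy.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric; here that is only the text column.
cat_cols_alt = toy.select_dtypes(exclude="number").columns.tolist()

print(num_cols_alt)
print(cat_cols_alt)
```

Selecting by exclusion also avoids depending on how a given pandas version labels text columns.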

Exploring categorical variables

In [9]:
# Keep only the columns stored as Python objects (strings)
cat_cols = [col for col in train.columns
            if train[col].dtype in ['O']]

print(cat_cols)

train[cat_cols].describe()
['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']
Out[9]:
                      Name   Sex  Ticket Cabin Embarked
count                  891   891     891   204      889
unique                 891     2     681   147        3
top     Garfirth, Mr. John  male  347082    G6        S
freq                     1   577       7     4      644
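
The summary shows that Name (891 unique values) and Ticket (681) are close to one value per row, while Sex and Embarked have only a handful of levels. For low-cardinality columns, counting each level is usually more informative than listing the values. The sketch below uses a made-up toy frame; on the real data, `train[['Sex', 'Embarked']]` could be used in its place.

```python
import pandas as pd

# Hypothetical toy frame; None marks a missing port of embarkation.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Embarked": ["S", "C", "S", None, "Q"],
})

for col in toy.columns:
    # nunique() ignores missing values by default.
    n_unique = toy[col].nunique()
    # dropna=False keeps a separate row for the missing values.
    counts = toy[col].value_counts(dropna=False)
    print(f"{col}: {n_unique} unique values")
    print(counts, end="\n\n")
```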

Inspecting the values of the categorical columns

In [11]:
import numpy as np
In [12]:
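for col in cat_cols:
    # Cast to str first so missing values (NaN) become the string 'nan'
    # and np.unique can sort the column without mixed-type comparisons
    uniq = np.unique(train[col].astype(str))
    print("colname : {}, uniq : {}".format(col, uniq), end="\n\n")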
colname : Name, uniq : ['Abbing, Mr. Anthony' 'Abbott, Mr. Rossmore Edward'
 'Abbott, Mrs. Stanton (Rosa Hunt)' 'Abelson, Mr. Samuel'
 'Abelson, Mrs. Samuel (Hannah Wizosky)' 'Adahl, Mr. Mauritz Nils Martin'
 'Adams, Mr. John' 'Ahlin, Mrs. Johan (Johanna Persdotter Larsson)'
 'Aks, Mrs. Sam (Leah Rosen)' 'Albimona, Mr. Nassef Cassem'
 'Alexander, Mr. William' 'Alhomaki, Mr. Ilmari Rudolf' 'Ali, Mr. Ahmed'
 'Ali, Mr. William' 'Allen, Miss. Elisabeth Walton'
 'Allen, Mr. William Henry' 'Allison, Master. Hudson Trevor'
 'Allison, Miss. Helen Loraine'
 'Allison, Mrs. Hudson J C (Bessie Waldo Daniels)'
 'Allum, Mr. Owen George'
 'Andersen-Jensen, Miss. Carla Christine Nielsine' 'Anderson, Mr. Harry'
 'Andersson, Master. Sigvard Harald Elias'
 'Andersson, Miss. Ebba Iris Alfrida' 'Andersson, Miss. Ellis Anna Maria'
 'Andersson, Miss. Erna Alexandra' 'Andersson, Miss. Ingeborg Constanzia'
 'Andersson, Miss. Sigrid Elisabeth' 'Andersson, Mr. Anders Johan'
 'Andersson, Mr. August Edvard ("Wennerstrom")'
 'Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)'
 'Andreasson, Mr. Paul Edvin' 'Andrew, Mr. Edgardo Samuel'
 'Andrews, Miss. Kornelia Theodosia' 'Andrews, Mr. Thomas Jr'
 'Angle, Mrs. William A (Florence "Mary" Agnes Hughes)'
 'Appleton, Mrs. Edward Dale (Charlotte Lamson)'
 'Arnold-Franchi, Mr. Josef'
 'Arnold-Franchi, Mrs. Josef (Josefine Franchi)' 'Artagaveytia, Mr. Ramon'
 'Asim, Mr. Adola' 'Asplund, Master. Clarence Gustaf Hugo'
 'Asplund, Master. Edvin Rojj Felix' 'Asplund, Miss. Lillian Gertrud'
 'Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)'
 'Astor, Mrs. John Jacob (Madeleine Talmadge Force)'
 'Attalah, Miss. Malake' 'Attalah, Mr. Sleiman'
 'Aubart, Mme. Leontine Pauline' 'Augustsson, Mr. Albert'
 'Ayoub, Miss. Banoura' 'Backstrom, Mr. Karl Alfred'
 'Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)'
 'Baclini, Miss. Eugenie' 'Baclini, Miss. Helene Barbara'
 'Baclini, Miss. Marie Catherine' 'Baclini, Mrs. Solomon (Latifa Qurban)'
 'Badt, Mr. Mohamed' 'Bailey, Mr. Percy Andrew' 'Balkic, Mr. Cerin'
 'Ball, Mrs. (Ada E Hall)' 'Banfield, Mr. Frederick James'
 'Barah, Mr. Hanna Assi' 'Barbara, Miss. Saiide'
 'Barbara, Mrs. (Catherine David)' 'Barber, Miss. Ellen "Nellie"'
 'Barkworth, Mr. Algernon Henry Wilson' 'Barton, Mr. David John'
 'Bateman, Rev. Robert James' 'Baumann, Mr. John D'
 'Baxter, Mr. Quigg Edmond'
 'Baxter, Mrs. James (Helene DeLaudeniere Chaput)' 'Bazzani, Miss. Albina'
 'Beane, Mr. Edward' 'Beane, Mrs. Edward (Ethel Clarke)'
 'Beavan, Mr. William Thomas' 'Becker, Master. Richard F'
 'Becker, Miss. Marion Louise' 'Beckwith, Mr. Richard Leonard'
 'Beckwith, Mrs. Richard Leonard (Sallie Monypeny)'
 'Beesley, Mr. Lawrence' 'Behr, Mr. Karl Howell'
 'Bengtsson, Mr. John Viktor' 'Berglund, Mr. Karl Ivar Sven'
 'Berriman, Mr. William John' 'Betros, Mr. Tannous'
 'Bidois, Miss. Rosalie' 'Bing, Mr. Lee'
 'Birkeland, Mr. Hans Martin Monsen' 'Bishop, Mr. Dickinson H'
 'Bishop, Mrs. Dickinson H (Helen Walton)' 'Bissette, Miss. Amelia'
 'Bjornstrom-Steffansson, Mr. Mauritz Hakan'
 'Blackwell, Mr. Stephen Weart' 'Blank, Mr. Henry'
 'Bonnell, Miss. Elizabeth' 'Bostandyeff, Mr. Guentcho'
 'Boulos, Miss. Nourelain' 'Boulos, Mr. Hanna'
 'Boulos, Mrs. Joseph (Sultana)' 'Bourke, Miss. Mary' 'Bourke, Mr. John'
 'Bourke, Mrs. John (Catherine)' 'Bowen, Mr. David John "Dai"'
 'Bowerman, Miss. Elsie Edith' 'Bracken, Mr. James H'
 'Bradley, Mr. George ("George Arthur Brayton")'
 'Braund, Mr. Lewis Richard' 'Braund, Mr. Owen Harris'
 'Brewe, Dr. Arthur Jackson' 'Brocklebank, Mr. William Alfred'
 'Brown, Miss. Amelia "Mildred"' 'Brown, Mr. Thomas William Solomon'
 'Brown, Mrs. James Joseph (Margaret Tobin)'
 'Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)'
 'Bryhl, Mr. Kurt Arnold Gottfrid' 'Burke, Mr. Jeremiah'
 'Burns, Miss. Elizabeth Margaret' 'Buss, Miss. Kate'
 'Butler, Mr. Reginald Fenton' 'Butt, Major. Archibald Willingham'
 'Byles, Rev. Thomas Roussel Davids' 'Bystrom, Mrs. (Karolina)'
 'Cacic, Miss. Marija' 'Cacic, Mr. Luka' 'Cairns, Mr. Alexander'
 'Calderhead, Mr. Edward Pennington' 'Caldwell, Master. Alden Gates'
 'Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh)' 'Calic, Mr. Jovo'
 'Calic, Mr. Petar' 'Cameron, Miss. Clear Annie' 'Campbell, Mr. William'
 'Canavan, Miss. Mary' 'Cann, Mr. Ernest Charles'
 'Caram, Mrs. Joseph (Maria Elias)' 'Carbines, Mr. William'
 'Cardeza, Mr. Thomas Drake Martinez' 'Carlsson, Mr. August Sigfrid'
 'Carlsson, Mr. Frans Olof' 'Carr, Miss. Helen "Ellen"'
 'Carrau, Mr. Francisco M' 'Carter, Master. William Thornton II'
 'Carter, Miss. Lucile Polk' 'Carter, Mr. William Ernest'
 'Carter, Mrs. Ernest Courtenay (Lilian Hughes)'
 'Carter, Mrs. William Ernest (Lucile Polk)'
 'Carter, Rev. Ernest Courtenay' 'Cavendish, Mr. Tyrell William'
 'Celotti, Mr. Francesco' 'Chaffee, Mr. Herbert Fuller'
 'Chambers, Mr. Norman Campbell'
 'Chambers, Mrs. Norman Campbell (Bertha Griggs)'
 'Chapman, Mr. Charles Henry' 'Chapman, Mr. John Henry'
 'Charters, Mr. David' 'Cherry, Miss. Gladys'
 'Chibnall, Mrs. (Edith Martha Bowerman)' 'Chip, Mr. Chang'
 'Christmann, Mr. Emil' 'Christy, Miss. Julie Rachel'
 'Chronopoulos, Mr. Apostolos'
 'Clarke, Mrs. Charles V (Ada Maria Winfield)' 'Cleaver, Miss. Alice'
 'Clifford, Mr. George Quincy' 'Coelho, Mr. Domingos Fernandeo'
 'Cohen, Mr. Gurshon "Gus"' 'Coleff, Mr. Peju' 'Coleff, Mr. Satio'
 'Coleridge, Mr. Reginald Charles' 'Collander, Mr. Erik Gustaf'
 'Colley, Mr. Edward Pomeroy' 'Collyer, Miss. Marjorie "Lottie"'
 'Collyer, Mr. Harvey' 'Collyer, Mrs. Harvey (Charlotte Annie Tate)'
 'Compton, Miss. Sara Rebecca' 'Connaghton, Mr. Michael'
 'Connolly, Miss. Kate' 'Connors, Mr. Patrick' 'Cook, Mr. Jacob'
 'Cor, Mr. Liudevit' 'Corn, Mr. Harry'
 'Coutts, Master. Eden Leslie "Neville"'
 'Coutts, Master. William Loch "William"' 'Coxon, Mr. Daniel'
 'Crease, Mr. Ernest James' 'Cribb, Mr. John Hatfield'
 'Crosby, Capt. Edward Gifford' 'Crosby, Miss. Harriet R'
 'Culumovic, Mr. Jeso'
 'Cumings, Mrs. John Bradley (Florence Briggs Thayer)'
 'Cunningham, Mr. Alfred Fleming' 'Dahl, Mr. Karl Edwart'
 'Dahlberg, Miss. Gerda Ulrika' 'Dakic, Mr. Branko'
 'Daly, Mr. Eugene Patrick' 'Daly, Mr. Peter Denis '
 'Danbom, Mr. Ernst Gilbert'
 'Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren)'
 'Daniel, Mr. Robert Williams' 'Danoff, Mr. Yoto' 'Dantcheff, Mr. Ristiu'
 'Davidson, Mr. Thornton' 'Davies, Master. John Morgan Jr'
 'Davies, Mr. Alfred J' 'Davies, Mr. Charles Henry' 'Davis, Miss. Mary'
 'Davison, Mrs. Thomas Henry (Mary E Finck)' 'Dean, Master. Bertram Vere'
 'Dean, Mr. Bertram Frank' 'Denkoff, Mr. Mitto' 'Dennis, Mr. Samuel'
 'Devaney, Miss. Margaret Delia' 'Dick, Mr. Albert Adrian'
 'Dick, Mrs. Albert Adrian (Vera Gillespie)' 'Dimic, Mr. Jovan'
 'Dodge, Master. Washington' 'Doharr, Mr. Tannous' 'Doling, Miss. Elsie'
 'Doling, Mrs. John T (Ada Julia Bone)' 'Dooley, Mr. Patrick'
 'Dorking, Mr. Edward Arthur' 'Douglas, Mr. Walter Donald'
 'Dowdell, Miss. Elizabeth' 'Downton, Mr. William James'
 'Drazenoic, Mr. Jozef' 'Drew, Mrs. James Vivian (Lulu Thorne Christian)'
 'Duane, Mr. Frank'
 'Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan")'
 'Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan")'
 'Duran y More, Miss. Asuncion' 'Edvardsson, Mr. Gustaf Hjalmar'
 'Eitemiller, Mr. George Floyd' 'Eklund, Mr. Hans Linus'
 'Ekstrom, Mr. Johan' 'Elias, Mr. Dibo' 'Elias, Mr. Joseph Jr'
 'Elias, Mr. Tannous' 'Elsbury, Mr. William James'
 'Emanuel, Miss. Virginia Ethel' 'Emir, Mr. Farred Chehab'
 'Endres, Miss. Caroline Louise' 'Eustis, Miss. Elizabeth Mussey'
 'Fahlstrom, Mr. Arne Jonas' 'Farrell, Mr. James' 'Farthing, Mr. John'
 'Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkinson)'
 'Fischer, Mr. Eberhard Thelander' 'Fleming, Miss. Margaret'
 'Flynn, Mr. James' 'Flynn, Mr. John' 'Flynn, Mr. John Irwin ("Irving")'
 'Foo, Mr. Choong' 'Ford, Miss. Doolina Margaret "Daisy"'
 'Ford, Miss. Robina Maggie "Ruby"' 'Ford, Mr. William Neal'
 'Ford, Mrs. Edward (Margaret Ann Watson)'
 'Foreman, Mr. Benjamin Laventall' 'Fortune, Miss. Alice Elizabeth'
 'Fortune, Miss. Mabel Helen' 'Fortune, Mr. Charles Alexander'
 'Fortune, Mr. Mark' 'Fox, Mr. Stanley Hubert'
 'Francatelli, Miss. Laura Mabel' 'Frauenthal, Dr. Henry William'
 'Frauenthal, Mrs. Henry William (Clara Heinsheimer)'
 'Frolicher, Miss. Hedwig Margaritha' 'Frolicher-Stehli, Mr. Maxmillian'
 'Frost, Mr. Anthony Wood "Archie"' 'Fry, Mr. Richard'
 'Funk, Miss. Annie Clemmer' 'Futrelle, Mr. Jacques Heath'
 'Futrelle, Mrs. Jacques Heath (Lily May Peel)' 'Fynney, Mr. Joseph J'
 'Gale, Mr. Shadrach' 'Gallagher, Mr. Martin' 'Garfirth, Mr. John'
 'Garside, Miss. Ethel' 'Gaskell, Mr. Alfred' 'Gavey, Mr. Lawrence'
 'Gee, Mr. Arthur H' 'Gheorgheff, Mr. Stanio' 'Giglio, Mr. Victor'
 'Giles, Mr. Frederick Edward' 'Gilinski, Mr. Eliezer'
 'Gill, Mr. John William' 'Gillespie, Mr. William Henry'
 'Gilnagh, Miss. Katherine "Katie"' 'Givard, Mr. Hans Kristensen'
 'Glynn, Miss. Mary Agatha' 'Goldenberg, Mr. Samuel L'
 'Goldenberg, Mrs. Samuel L (Edwiga Grabowska)'
 'Goldschmidt, Mr. George B'
 'Goldsmith, Master. Frank John William "Frankie"'
 'Goldsmith, Mr. Frank John'
 'Goldsmith, Mrs. Frank John (Emily Alice Brown)'
 'Goncalves, Mr. Manuel Estanslas' 'Goodwin, Master. Harold Victor'
 'Goodwin, Master. Sidney Leonard' 'Goodwin, Master. William Frederick'
 'Goodwin, Miss. Lillian Amy' 'Goodwin, Mr. Charles Edward'
 'Goodwin, Mrs. Frederick (Augusta Tyler)' 'Graham, Miss. Margaret Edith'
 'Graham, Mr. George Edward'
 'Graham, Mrs. William Thompson (Edith Junkins)' 'Green, Mr. George Henry'
 'Greenberg, Mr. Samuel' 'Greenfield, Mr. William Bertram'
 'Gronnestad, Mr. Daniel Danielsen' 'Guggenheim, Mr. Benjamin'
 'Gustafsson, Mr. Alfred Ossian' 'Gustafsson, Mr. Anders Vilhelm'
 'Gustafsson, Mr. Johan Birger' 'Gustafsson, Mr. Karl Gideon'
 'Haas, Miss. Aloisia' 'Hagland, Mr. Ingvald Olai Olsen'
 'Hagland, Mr. Konrad Mathias Reiersen' 'Hakkarainen, Mr. Pekka Pietari'
 'Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck)'
 'Hale, Mr. Reginald' 'Hamalainen, Master. Viljo'
 'Hamalainen, Mrs. William (Anna)' 'Hampe, Mr. Leon' 'Hanna, Mr. Mansour'
 'Hansen, Mr. Claus Peter' 'Hansen, Mr. Henrik Juul'
 'Hansen, Mr. Henry Damsgaard' 'Harder, Mr. George Achilles'
 'Harknett, Miss. Alice Phoebe' 'Harmer, Mr. Abraham (David Lishin)'
 'Harper, Miss. Annie Jessie "Nina"' 'Harper, Mr. Henry Sleeper'
 'Harper, Mrs. Henry Sleeper (Myna Haxtun)' 'Harper, Rev. John'
 'Harrington, Mr. Charles H' 'Harris, Mr. George'
 'Harris, Mr. Henry Birkhardt' 'Harris, Mr. Walter'
 'Harris, Mrs. Henry Birkhardt (Irene Wallach)' 'Harrison, Mr. William'
 'Hart, Miss. Eva Miriam' 'Hart, Mr. Benjamin' 'Hart, Mr. Henry'
 'Hart, Mrs. Benjamin (Esther Ada Bloomfield)' 'Hassab, Mr. Hammad'
 'Hassan, Mr. Houssein G N' 'Hawksford, Mr. Walter James'
 'Hays, Miss. Margaret Bechstein'
 'Hays, Mrs. Charles Melville (Clara Jennings Gregg)'
 'Healy, Miss. Hanora "Nora"' 'Hedman, Mr. Oskar Arvid'
 'Hegarty, Miss. Hanora "Nora"' 'Heikkinen, Miss. Laina'
 'Heininen, Miss. Wendla Maria' 'Hendekovic, Mr. Ignjac'
 'Henry, Miss. Delia' 'Herman, Miss. Alice'
 'Herman, Mrs. Samuel (Jane Laver)' 'Hewlett, Mrs. (Mary D Kingcome) '
 'Hickman, Mr. Leonard Mark' 'Hickman, Mr. Lewis'
 'Hickman, Mr. Stanley George' 'Hippach, Miss. Jean Gertrude'
 'Hippach, Mrs. Louis Albert (Ida Sophia Fischer)'
 'Hirvonen, Miss. Hildur E' 'Hocking, Mr. Richard George'
 'Hocking, Mrs. Elizabeth (Eliza Needs)' 'Hodges, Mr. Henry Price'
 'Hogeboom, Mrs. John C (Anna Andrews)' 'Hold, Mr. Stephen'
 'Holm, Mr. John Fredrik Alexander' 'Holverson, Mr. Alexander Oskar'
 'Holverson, Mrs. Alexander Oskar (Mary Aline Towner)'
 'Homer, Mr. Harry ("Mr E Haven")' 'Honkanen, Miss. Eliina'
 'Hood, Mr. Ambrose Jr' 'Horgan, Mr. John' 'Hosono, Mr. Masabumi'
 'Hoyt, Mr. Frederick Maxfield' 'Hoyt, Mr. William Fisher'
 'Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby)'
 'Humblen, Mr. Adolf Mathias Nicolai Olsen' 'Hunt, Mr. George Henry'
 'Ibrahim Shawah, Mr. Yousseff' 'Icard, Miss. Amelie'
 'Ilett, Miss. Bertha' 'Ilmakangas, Miss. Pieta Sofia'
 'Isham, Miss. Ann Elizabeth' 'Ivanoff, Mr. Kanio'
 'Jacobsohn, Mr. Sidney Samuel'
 'Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy)'
 'Jalsevac, Mr. Ivan' 'Jansson, Mr. Carl Olof' 'Jardin, Mr. Jose Neto'
 'Jarvis, Mr. John Denzil' 'Jenkin, Mr. Stephen Curnow'
 'Jensen, Mr. Hans Peder' 'Jensen, Mr. Niels Peder'
 'Jensen, Mr. Svend Lauritz' 'Jermyn, Miss. Annie'
 'Jerwan, Mrs. Amin S (Marie Marthe Thuillard)'
 'Johannesen-Bratthammer, Mr. Bernt' 'Johanson, Mr. Jakob Alfred'
 'Johansson, Mr. Erik' 'Johansson, Mr. Gustaf Joel'
 'Johansson, Mr. Karl Johan' 'Johnson, Master. Harold Theodor'
 'Johnson, Miss. Eleanor Ileen' 'Johnson, Mr. Alfred'
 'Johnson, Mr. Malkolm Joackim' 'Johnson, Mr. William Cahoone Jr'
 'Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)'
 'Johnston, Miss. Catherine Helen "Carrie"' 'Johnston, Mr. Andrew G'
 'Jonkoff, Mr. Lalio' 'Jonsson, Mr. Carl' 'Jussila, Miss. Katriina'
 'Jussila, Miss. Mari Aina' 'Jussila, Mr. Eiriik'
 'Kallio, Mr. Nikolai Erland' 'Kalvik, Mr. Johannes Halvorsen'
 'Kantor, Mr. Sinai' 'Kantor, Mrs. Sinai (Miriam Sternin)'
 'Karaic, Mr. Milan' 'Karlsson, Mr. Nils August' 'Karun, Miss. Manca'
 'Kassem, Mr. Fared' 'Keane, Miss. Nora A' 'Keane, Mr. Andrew "Andy"'
 'Keefe, Mr. Arthur' 'Kelly, Miss. Anna Katherine "Annie Kate"'
 'Kelly, Miss. Mary' 'Kelly, Mr. James' 'Kelly, Mrs. Florence "Fannie"'
 'Kent, Mr. Edward Austin' 'Kenyon, Mrs. Frederick R (Marion)'
 'Kiernan, Mr. Philip' 'Kilgannon, Mr. Thomas J'
 'Kimball, Mr. Edwin Nelson Jr' 'Kink, Mr. Vincenz'
 'Kink-Heilmann, Miss. Luise Gretchen' 'Kirkland, Rev. Charles Leonard'
 'Klaber, Mr. Herman' 'Klasen, Mr. Klas Albin' 'Knight, Mr. Robert J'
 'Kraeff, Mr. Theodor' 'Kvillner, Mr. Johan Henrik Johannesson'
 'Lahoud, Mr. Sarkis' 'Lahtinen, Mrs. William (Anna Sylfven)'
 'Laitinen, Miss. Kristina Sofia' 'Laleff, Mr. Kristo' 'Lam, Mr. Ali'
 'Lam, Mr. Len' 'Landergren, Miss. Aurora Adelia' 'Lang, Mr. Fang'
 'Laroche, Miss. Simonne Marie Anne Andree'
 'Laroche, Mr. Joseph Philippe Lemercier'
 'Laroche, Mrs. Joseph (Juliette Marie Louise Lafargue)'
 'Larsson, Mr. August Viktor' 'Larsson, Mr. Bengt Edvin'
 'LeRoy, Miss. Bertha' 'Leader, Dr. Alice (Farnham)'
 'Leeni, Mr. Fahim ("Philip Zenni")' 'Lefebre, Master. Henry Forbes'
 'Lefebre, Miss. Ida' 'Lefebre, Miss. Jeannie' 'Lefebre, Miss. Mathilde'
 'Lehmann, Miss. Bertha' 'Leinonen, Mr. Antti Gustaf'
 'Leitch, Miss. Jessie Wills' 'Lemberopolous, Mr. Peter L'
 'Lemore, Mrs. (Amelia Milley)' 'Lennon, Mr. Denis' 'Leonard, Mr. Lionel'
 'Lester, Mr. James' 'Lesurer, Mr. Gustave J' 'Levy, Mr. Rene Jacques'
 'Lewy, Mr. Ervin G' 'Leyson, Mr. Robert William Norman'
 'Lievens, Mr. Rene Aime' 'Lindahl, Miss. Agda Thorilda Viktoria'
 'Lindblom, Miss. Augusta Charlotta' 'Lindell, Mr. Edvard Bengtsson'
 'Lindqvist, Mr. Eino William' 'Lines, Miss. Mary Conover' 'Ling, Mr. Lee'
 'Lobb, Mr. William Arthur'
 'Lobb, Mrs. William Arthur (Cordelia K Stanlick)'
 'Long, Mr. Milton Clyde' 'Longley, Miss. Gretchen Fiske'
 'Louch, Mrs. Charles Alexander (Alice Adelaide Slow)'
 'Lovell, Mr. John Hall ("Henry")' 'Lulic, Mr. Nikola'
 'Lundahl, Mr. Johan Svensson' 'Lurette, Miss. Elise' 'Mack, Mrs. (Mary)'
 'Madigan, Miss. Margaret "Maggie"' 'Madill, Miss. Georgette Alexandra'
 'Madsen, Mr. Fridtjof Arne' 'Maenpaa, Mr. Matti Alexanteri'
 'Maioni, Miss. Roberta' 'Maisner, Mr. Simon' 'Mallet, Master. Andre'
 'Mallet, Mr. Albert' 'Mamee, Mr. Hanna' 'Mangan, Miss. Mary'
 'Mannion, Miss. Margareth' 'Marechal, Mr. Pierre' 'Markoff, Mr. Marin'
 'Markun, Mr. Johann' 'Marvin, Mr. Daniel Warner'
 'Masselmani, Mrs. Fatima' 'Matthews, Mr. William John'
 'Mayne, Mlle. Berthe Antonine ("Mrs de Villiers")'
 'McCarthy, Mr. Timothy J' 'McCormack, Mr. Thomas Joseph'
 'McCoy, Miss. Agnes' 'McCoy, Mr. Bernard'
 'McDermott, Miss. Brigdet Delia' 'McEvoy, Mr. Michael'
 'McGough, Mr. James Robert' 'McGovern, Miss. Mary'
 'McGowan, Miss. Anna "Annie"' 'McKane, Mr. Peter David'
 'McMahon, Mr. Martin' 'McNamee, Mr. Neal'
 'Meanwell, Miss. (Marion Ogden)'
 'Meek, Mrs. Thomas (Annie Louise Rowley)'
 'Mellinger, Miss. Madeleine Violet'
 'Mellinger, Mrs. (Elizabeth Anne Maidment)' 'Mellors, Mr. William John'
 'Meo, Mr. Alfonzo' 'Mernagh, Mr. Robert' 'Meyer, Mr. August'
 'Meyer, Mr. Edgar Joseph' 'Meyer, Mrs. Edgar Joseph (Leila Saks)'
 'Millet, Mr. Francis Davis' 'Milling, Mr. Jacob Christian'
 'Minahan, Dr. William Edward' 'Minahan, Miss. Daisy E' 'Mineff, Mr. Ivan'
 'Mionoff, Mr. Stoytcho' 'Mitchell, Mr. Henry Michael' 'Mitkoff, Mr. Mito'
 'Mockler, Miss. Helen Mary "Ellie"' 'Moen, Mr. Sigurd Hansen'
 'Molson, Mr. Harry Markland' 'Montvila, Rev. Juozas'
 'Moor, Master. Meier' 'Moor, Mrs. (Beila)' 'Moore, Mr. Leonard Charles'
 'Moran, Miss. Bertha' 'Moran, Mr. Daniel J' 'Moran, Mr. James'
 'Moraweck, Dr. Ernest' 'Morley, Mr. Henry Samuel ("Mr Henry Marshall")'
 'Morley, Mr. William' 'Morrow, Mr. Thomas Rowan' 'Moss, Mr. Albert Johan'
 'Moubarek, Master. Gerios'
 'Moubarek, Master. Halim Gonios ("William George")'
 'Moussa, Mrs. (Mantoura Boulos)' 'Moutal, Mr. Rahamin Haim'
 'Mudd, Mr. Thomas Charles' 'Mullens, Miss. Katherine "Katie"'
 'Murdlin, Mr. Joseph' 'Murphy, Miss. Katherine "Kate"'
 'Murphy, Miss. Margaret Jane' 'Myhrman, Mr. Pehr Fabian Oliver Malkolm'
 'Naidenoff, Mr. Penko' 'Najib, Miss. Adele Kiamie "Jane"'
 'Nakid, Miss. Maria ("Mary")' 'Nakid, Mr. Sahid' 'Nankoff, Mr. Minko'
 'Nasser, Mr. Nicholas' 'Nasser, Mrs. Nicholas (Adele Achem)'
 'Natsch, Mr. Charles H' 'Navratil, Master. Edmond Roger'
 'Navratil, Master. Michel M' 'Navratil, Mr. Michel ("Louis M Hoffman")'
 'Nenkoff, Mr. Christo' 'Newell, Miss. Madeleine' 'Newell, Miss. Marjorie'
 'Newell, Mr. Arthur Webster' 'Newsom, Miss. Helen Monypeny'
 'Nicholls, Mr. Joseph Charles' 'Nicholson, Mr. Arthur Ernest'
 'Nicola-Yarred, Master. Elias' 'Nicola-Yarred, Miss. Jamila'
 'Nilsson, Miss. Helmina Josefina' 'Nirva, Mr. Iisakki Antino Aijo'
 'Niskanen, Mr. Juha' 'Norman, Mr. Robert Douglas'
 'Nosworthy, Mr. Richard Cater' 'Novel, Mr. Mansouer'
 'Nye, Mrs. (Elizabeth Ramell)' 'Nysten, Miss. Anna Sofia'
 'Nysveen, Mr. Johan Hansen' "O'Brien, Mr. Thomas" "O'Brien, Mr. Timothy"
 'O\'Brien, Mrs. Thomas (Johanna "Hannah" Godfrey)'
 "O'Connell, Mr. Patrick D" "O'Connor, Mr. Maurice"
 "O'Driscoll, Miss. Bridget" 'O\'Dwyer, Miss. Ellen "Nellie"'
 'O\'Leary, Miss. Hanora "Norah"' "O'Sullivan, Miss. Bridget Mary"
 'Odahl, Mr. Nils Martin' 'Ohman, Miss. Velin' 'Olsen, Mr. Henry Margido'
 'Olsen, Mr. Karl Siegwart Andreas' 'Olsen, Mr. Ole Martin'
 'Olsson, Miss. Elina' 'Olsson, Mr. Nils Johan Goransson'
 'Olsvigen, Mr. Thor Anderson' 'Oreskovic, Miss. Marija'
 'Oreskovic, Mr. Luka' 'Osen, Mr. Olaf Elon' 'Osman, Mrs. Mara'
 'Ostby, Mr. Engelhart Cornelius' 'Otter, Mr. Richard'
 'Padro y Manent, Mr. Julian' 'Pain, Dr. Alfred'
 'Palsson, Master. Gosta Leonard' 'Palsson, Miss. Stina Viola'
 'Palsson, Miss. Torborg Danira'
 'Palsson, Mrs. Nils (Alma Cornelia Berglund)'
 'Panula, Master. Eino Viljami' 'Panula, Master. Juha Niilo'
 'Panula, Master. Urho Abraham' 'Panula, Mr. Ernesti Arvid'
 'Panula, Mr. Jaako Arnold' 'Panula, Mrs. Juha (Maria Emilia Ojala)'
 'Parkes, Mr. Francis "Frank"' 'Parr, Mr. William Henry Marsh'
 'Parrish, Mrs. (Lutie Davis)' 'Partner, Mr. Austen' 'Pasic, Mr. Jakob'
 'Patchett, Mr. George' 'Paulner, Mr. Uscher' 'Pavlovic, Mr. Stefo'
 'Pears, Mr. Thomas Clinton' 'Pears, Mrs. Thomas (Edith Wearne)'
 'Peduzzi, Mr. Joseph' 'Pekoniemi, Mr. Edvard'
 'Penasco y Castellana, Mr. Victor de Satode'
 'Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)'
 'Pengelly, Mr. Frederick William' 'Perkin, Mr. John Henry'
 'Pernot, Mr. Rene' 'Perreault, Miss. Anne' 'Persson, Mr. Ernst Ulrik'
 'Peter, Miss. Anna' 'Peter, Mrs. Catherine (Catherine Rizk)'
 'Peters, Miss. Katie' 'Petranec, Miss. Matilda' 'Petroff, Mr. Nedelio'
 'Petroff, Mr. Pastcho ("Pentcho")' 'Petterson, Mr. Johan Emil'
 'Pettersson, Miss. Ellen Natalia' 'Peuchen, Major. Arthur Godfrey'
 'Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall")'
 'Pickard, Mr. Berk (Berk Trembisky)' 'Pinsky, Mrs. (Rosa)'
 'Plotcharsky, Mr. Vasil' 'Ponesell, Mr. Martin'
 'Porter, Mr. Walter Chamberlain'
 'Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)'
 'Quick, Miss. Phyllis May'
 'Quick, Mrs. Frederick Charles (Jane Richards)' 'Radeff, Mr. Alexander'
 'Razi, Mr. Raihed' 'Reed, Mr. James George' 'Reeves, Mr. David'
 'Rekic, Mr. Tido' 'Renouf, Mr. Peter Henry'
 'Renouf, Mrs. Peter Henry (Lillian Jefferys)'
 'Reuchlin, Jonkheer. John George' 'Reynaldo, Ms. Encarnacion'
 'Rice, Master. Arthur' 'Rice, Master. Eric' 'Rice, Master. Eugene'
 'Rice, Master. George Hugh' 'Rice, Mrs. William (Margaret Norton)'
 'Richard, Mr. Emile' 'Richards, Master. George Sibley'
 'Richards, Master. William Rowe' 'Richards, Mrs. Sidney (Emily Hocking)'
 'Ridsdale, Miss. Lucy' 'Ringhini, Mr. Sante' 'Rintamaki, Mr. Matti'
 'Risien, Mr. Samuel Beard' 'Robbins, Mr. Victor'
 'Robert, Mrs. Edward Scott (Elisabeth Walton McMillan)'
 'Robins, Mrs. Alexander A (Grace Charity Laury)'
 'Roebling, Mr. Washington Augustus II' 'Rogers, Mr. William John'
 'Romaine, Mr. Charles Hallace ("Mr C Rolmane")'
 'Rommetvedt, Mr. Knud Paust' 'Rood, Mr. Hugh Roscoe'
 'Rosblom, Mr. Viktor Richard' 'Rosblom, Mrs. Viktor (Helena Wilhelmina)'
 'Ross, Mr. John Hugo'
 'Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)'
 'Rothschild, Mrs. Martin (Elizabeth L. Barrett)'
 'Rouse, Mr. Richard Henry' 'Rugg, Miss. Emily'
 'Rush, Mr. Alfred George John' 'Ryan, Mr. Patrick'
 'Ryerson, Miss. Emily Borie' 'Ryerson, Miss. Susan Parker "Suzette"'
 'Saad, Mr. Amin' 'Saad, Mr. Khalil' 'Saalfeld, Mr. Adolphe'
 'Sadlier, Mr. Matthew' 'Sage, Master. Thomas Henry'
 'Sage, Miss. Constance Gladys' 'Sage, Miss. Dorothy Edith "Dolly"'
 'Sage, Miss. Stella Anna' 'Sage, Mr. Douglas Bullen'
 'Sage, Mr. Frederick' 'Sage, Mr. George John Jr' 'Sagesser, Mlle. Emma'
 'Salkjelsvik, Miss. Anna Kristine' 'Salonen, Mr. Johan Werner'
 'Samaan, Mr. Youssef' 'Sandstrom, Miss. Marguerite Rut'
 'Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson)'
 'Saundercock, Mr. William Henry' 'Sawyer, Mr. Frederick Charles'
 'Scanlan, Mr. James' 'Sdycoff, Mr. Todor'
 'Sedgwick, Mr. Charles Frederick Waddington' 'Serepeca, Miss. Augusta'
 'Seward, Mr. Frederic Kimber' 'Sharp, Mr. Percival James R'
 'Sheerlinck, Mr. Jan Baptist' 'Shellard, Mr. Frederick William'
 'Shelley, Mrs. William (Imanita Parrish Hall)'
 'Shorney, Mr. Charles Joseph' 'Shutes, Miss. Elizabeth W'
 'Silven, Miss. Lyyli Karoliina' 'Silverthorne, Mr. Spencer Victor'
 'Silvey, Mr. William Baird' 'Silvey, Mrs. William Baird (Alice Munger)'
 'Simmons, Mr. John' 'Simonius-Blumer, Col. Oberst Alfons'
 'Sinkkonen, Miss. Anna' 'Sirayanian, Mr. Orsen' 'Sirota, Mr. Maurice'
 'Sivic, Mr. Husein' 'Sivola, Mr. Antti Wilhelm'
 'Sjoblom, Miss. Anna Sofia' 'Sjostedt, Mr. Ernst Adolf'
 'Skoog, Master. Harald' 'Skoog, Master. Karl Thorsten'
 'Skoog, Miss. Mabel' 'Skoog, Miss. Margit Elizabeth' 'Skoog, Mr. Wilhelm'
 'Skoog, Mrs. William (Anna Bernhardina Karlsson)' 'Slabenoff, Mr. Petco'
 'Slayter, Miss. Hilda Mary' 'Slemen, Mr. Richard James'
 'Slocovski, Mr. Selman Francis' 'Sloper, Mr. William Thompson'
 'Smart, Mr. John Montgomery' 'Smiljanic, Mr. Mile'
 'Smith, Miss. Marion Elsie' 'Smith, Mr. James Clinch'
 'Smith, Mr. Richard William' 'Smith, Mr. Thomas'
 'Sobey, Mr. Samuel James Hayden'
 'Soholt, Mr. Peter Andreas Lauritz Andersen'
 'Somerton, Mr. Francis William'
 'Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)'
 'Spencer, Mrs. William Augustus (Marie Eugenie)'
 'Stahelin-Maeglin, Dr. Max' 'Staneff, Mr. Ivan' 'Stankovic, Mr. Ivan'
 'Stanley, Miss. Amy Zillah Elsie' 'Stanley, Mr. Edward Roland'
 'Stead, Mr. William Thomas'
 'Stephenson, Mrs. Walter Bertram (Martha Eustis)' 'Stewart, Mr. Albert A'
 'Stone, Mrs. George Nelson (Martha Evelyn)' 'Stoytcheff, Mr. Ilia'
 'Strandberg, Miss. Ida Sofia' 'Stranden, Mr. Juho'
 'Strom, Miss. Telma Matilda' 'Strom, Mrs. Wilhelm (Elna Matilda Persson)'
 'Sunderland, Mr. Victor Francis' 'Sundman, Mr. Johan Julian'
 'Sutehall, Mr. Henry Jr' 'Sutton, Mr. Frederick' 'Svensson, Mr. Johan'
 'Svensson, Mr. Olof'
 'Swift, Mrs. Frederick Joel (Margaret Welles Barron)'
 'Taussig, Miss. Ruth' 'Taussig, Mr. Emil'
 'Taussig, Mrs. Emil (Tillie Mandelbaum)' 'Taylor, Mr. Elmer Zebley'
 'Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright)'
 'Thayer, Mr. John Borland' 'Thayer, Mr. John Borland Jr'
 'Thayer, Mrs. John Borland (Marian Longstreth Morris)'
 'Theobald, Mr. Thomas Leonard' 'Thomas, Master. Assad Alexander'
 'Thorne, Mrs. Gertrude Maybelle' 'Thorneycroft, Mr. Percival'
 'Thorneycroft, Mrs. Percival (Florence Kate White)' 'Tikkanen, Mr. Juho'
 'Tobin, Mr. Roger' 'Todoroff, Mr. Lalio' 'Tomlin, Mr. Ernest Portage'
 'Toomey, Miss. Ellen' 'Torber, Mr. Ernst William'
 'Tornquist, Mr. William Henry' 'Toufik, Mr. Nakli'
 'Touma, Mrs. Darwis (Hanne Youssef Razi)' 'Troupiansky, Mr. Moses Aaron'
 'Trout, Mrs. William H (Jessie L)' 'Troutt, Miss. Edwina Celia "Winnie"'
 'Turcin, Mr. Stjepan' 'Turja, Miss. Anna Sofia' 'Turkula, Mrs. (Hedwig)'
 'Turpin, Mr. William John Robert'
 'Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)'
 'Uruchurtu, Don. Manuel E' 'Van Impe, Miss. Catharina'
 'Van Impe, Mr. Jean Baptiste'
 'Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert)'
 'Van der hoef, Mr. Wyckoff' 'Vande Velde, Mr. Johannes Joseph'
 'Vande Walle, Mr. Nestor Cyriel' 'Vanden Steen, Mr. Leo Peter'
 'Vander Cruyssen, Mr. Victor' 'Vander Planke, Miss. Augusta Maria'
 'Vander Planke, Mr. Leo Edmondus'
 'Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)'
 'Vestrom, Miss. Hulda Amanda Adolfina' 'Vovk, Mr. Janko'
 'Waelens, Mr. Achille' 'Walker, Mr. William Anderson' 'Ward, Miss. Anna'
 'Warren, Mrs. Frank Manley (Anna Sophia Atkinson)'
 'Watson, Mr. Ennis Hastings'
 'Watt, Mrs. James (Elizabeth "Bessie" Inglis Milne)'
 'Webber, Miss. Susan' 'Webber, Mr. James' 'Weir, Col. John'
 'Weisz, Mrs. Leopold (Mathilde Francoise Pede)' 'Wells, Miss. Joan'
 'West, Miss. Constance Mirium' 'West, Mr. Edwy Arthur'
 'West, Mrs. Edwy Arthur (Ada Mary Worth)' 'Wheadon, Mr. Edward H'
 'White, Mr. Percival Wayland' 'White, Mr. Richard Frasar'
 'Wick, Miss. Mary Natalie' 'Wick, Mrs. George Dennick (Mary Hitchcock)'
 'Widegren, Mr. Carl/Charles Peter' 'Widener, Mr. Harry Elkins'
 'Wiklund, Mr. Jakob Alfred' 'Wilhelms, Mr. Charles' 'Willey, Mr. Edward'
 'Williams, Mr. Charles Duane' 'Williams, Mr. Charles Eugene'
 'Williams, Mr. Howard Hugh "Harry"' 'Williams, Mr. Leslie'
 'Williams-Lambert, Mr. Fletcher Fellows' 'Windelov, Mr. Einar'
 'Wiseman, Mr. Phillippe' 'Woolner, Mr. Hugh' 'Wright, Mr. George'
 'Yasbeck, Mr. Antoni' 'Yasbeck, Mrs. Antoni (Selini Alexander)'
 'Young, Miss. Marie Grice' 'Youseff, Mr. Gerious' 'Yousif, Mr. Wazli'
 'Yousseff, Mr. Gerious' 'Yrois, Miss. Henriette ("Mrs Harbeck")'
 'Zabour, Miss. Hileni' 'Zabour, Miss. Thamine' 'Zimmerman, Mr. Leo'
 'de Messemaeker, Mrs. Guillaume Joseph (Emma)' 'de Mulder, Mr. Theodore'
 'de Pelsmaeker, Mr. Alfons' 'del Carlo, Mr. Sebastiano'
 'van Billiard, Mr. Austin Blyler' 'van Melkebeke, Mr. Philemon']

colname : Sex, uniq : ['female' 'male']

colname : Ticket, uniq : ['110152' '110413' '110465' '110564' '110813' '111240' '111320' '111361'
 '111369' '111426' '111427' '111428' '112050' '112052' '112053' '112058'
 '112059' '112277' '112379' '113028' '113043' '113050' '113051' '113055'
 '113056' '113059' '113501' '113503' '113505' '113509' '113510' '113514'
 '113572' '113760' '113767' '113773' '113776' '113781' '113783' '113784'
 '113786' '113787' '113788' '113789' '113792' '113794' '113796' '113798'
 '113800' '113803' '113804' '113806' '113807' '11668' '11751' '11752'
 '11753' '11755' '11765' '11767' '11769' '11771' '11774' '11813' '11967'
 '12233' '12460' '12749' '13049' '13213' '13214' '13502' '13507' '13509'
 '13567' '13568' '14311' '14312' '14313' '14973' '1601' '16966' '16988'
 '17421' '17453' '17463' '17464' '17465' '17466' '17474' '17764' '19877'
 '19928' '19943' '19947' '19950' '19952' '19972' '19988' '19996' '2003'
 '211536' '21440' '218629' '219533' '220367' '220845' '2223' '223596'
 '226593' '226875' '228414' '229236' '230080' '230136' '230433' '230434'
 '231919' '231945' '233639' '233866' '234360' '234604' '234686' '234818'
 '236171' '236852' '236853' '237442' '237565' '237668' '237671' '237736'
 '237789' '237798' '239853' '239854' '239855' '239856' '239865' '240929'
 '24160' '243847' '243880' '244252' '244270' '244278' '244310' '244358'
 '244361' '244367' '244373' '248698' '248706' '248723' '248727' '248731'
 '248733' '248738' '248740' '248747' '250643' '250644' '250646' '250647'
 '250648' '250649' '250651' '250652' '250653' '250655' '2620' '2623'
 '2624' '2625' '2626' '2627' '2628' '2629' '2631' '26360' '2641' '2647'
 '2648' '2649' '2650' '2651' '2653' '2659' '2661' '2662' '2663' '2664'
 '2665' '2666' '2667' '2668' '2669' '26707' '2671' '2672' '2674' '2677'
 '2678' '2680' '2683' '2685' '2686' '2687' '2689' '2690' '2691' '2693'
 '2694' '2695' '2697' '2699' '2700' '27042' '27267' '27849' '28134'
 '28206' '28213' '28220' '28228' '28403' '28424' '28425' '28551' '28664'
 '28665' '29011' '2908' '29103' '29104' '29105' '29106' '29108' '2926'
 '29750' '29751' '3101264' '3101265' '3101267' '3101276' '3101277'
 '3101278' '3101281' '3101295' '3101296' '3101298' '31027' '31028'
 '312991' '312992' '312993' '31418' '315037' '315082' '315084' '315086'
 '315088' '315089' '315090' '315093' '315094' '315096' '315097' '315098'
 '315151' '315153' '323592' '323951' '324669' '330877' '330909' '330919'
 '330923' '330931' '330932' '330935' '330958' '330959' '330979' '330980'
 '334912' '335097' '335677' '33638' '336439' '3411' '341826' '34218'
 '342826' '343095' '343120' '343275' '343276' '345364' '345572' '345763'
 '345764' '345765' '345767' '345769' '345770' '345773' '345774' '345777'
 '345778' '345779' '345780' '345781' '345783' '3460' '347054' '347060'
 '347061' '347062' '347063' '347064' '347067' '347068' '347069' '347071'
 '347073' '347074' '347076' '347077' '347078' '347080' '347081' '347082'
 '347083' '347085' '347087' '347088' '347089' '3474' '347464' '347466'
 '347468' '347470' '347742' '347743' '348121' '348123' '348124' '349201'
 '349203' '349204' '349205' '349206' '349207' '349208' '349209' '349210'
 '349212' '349213' '349214' '349215' '349216' '349217' '349218' '349219'
 '349221' '349222' '349223' '349224' '349225' '349227' '349228' '349231'
 '349233' '349234' '349236' '349237' '349239' '349240' '349241' '349242'
 '349243' '349244' '349245' '349246' '349247' '349248' '349249' '349251'
 '349252' '349253' '349254' '349256' '349257' '349909' '349910' '349912'
 '350025' '350026' '350029' '350034' '350035' '350036' '350042' '350043'
 '350046' '350047' '350048' '350050' '350052' '350060' '350404' '350406'
 '350407' '350417' '35273' '35281' '35851' '35852' '358585' '36209'
 '362316' '363291' '363294' '363592' '364498' '364499' '364500' '364506'
 '364511' '364512' '364516' '364846' '364848' '364849' '364850' '364851'
 '365222' '365226' '36568' '367226' '367228' '367229' '367230' '367231'
 '367232' '367655' '368323' '36864' '36865' '36866' '368703' '36928'
 '36947' '36963' '36967' '36973' '370129' '370365' '370369' '370370'
 '370371' '370372' '370373' '370375' '370376' '370377' '371060' '371110'
 '371362' '372622' '373450' '374746' '374887' '374910' '376564' '376566'
 '382649' '382651' '382652' '383121' '384461' '386525' '392091' '392092'
 '392096' '394140' '4133' '4134' '4135' '4136' '4137' '4138' '4579'
 '54636' '5727' '65303' '65304' '65306' '6563' '693' '695' '7267' '7534'
 '7540' '7545' '7546' '7552' '7553' '7598' '8471' '8475' '9234'
 'A./5. 2152' 'A./5. 3235' 'A.5. 11206' 'A.5. 18509' 'A/4 45380'
 'A/4 48871' 'A/4. 20589' 'A/4. 34244' 'A/4. 39886' 'A/5 21171'
 'A/5 21172' 'A/5 21173' 'A/5 21174' 'A/5 2466' 'A/5 2817' 'A/5 3536'
 'A/5 3540' 'A/5 3594' 'A/5 3902' 'A/5. 10482' 'A/5. 13032' 'A/5. 2151'
 'A/5. 3336' 'A/5. 3337' 'A/5. 851' 'A/S 2816' 'A4. 54510' 'C 17369'
 'C 4001' 'C 7075' 'C 7076' 'C 7077' 'C.A. 17248' 'C.A. 18723' 'C.A. 2315'
 'C.A. 24579' 'C.A. 24580' 'C.A. 2673' 'C.A. 29178' 'C.A. 29395'
 'C.A. 29566' 'C.A. 31026' 'C.A. 31921' 'C.A. 33111' 'C.A. 33112'
 'C.A. 33595' 'C.A. 34260' 'C.A. 34651' 'C.A. 37671' 'C.A. 5547'
 'C.A. 6212' 'C.A./SOTON 34068' 'CA 2144' 'CA. 2314' 'CA. 2343'
 'F.C. 12750' 'F.C.C. 13528' 'F.C.C. 13529' 'F.C.C. 13531' 'Fa 265302'
 'LINE' 'P/PP 3381' 'PC 17318' 'PC 17473' 'PC 17474' 'PC 17475' 'PC 17476'
 'PC 17477' 'PC 17482' 'PC 17483' 'PC 17485' 'PC 17558' 'PC 17569'
 'PC 17572' 'PC 17582' 'PC 17585' 'PC 17590' 'PC 17592' 'PC 17593'
 'PC 17595' 'PC 17596' 'PC 17597' 'PC 17599' 'PC 17600' 'PC 17601'
 'PC 17603' 'PC 17604' 'PC 17605' 'PC 17608' 'PC 17609' 'PC 17610'
 'PC 17611' 'PC 17612' 'PC 17754' 'PC 17755' 'PC 17756' 'PC 17757'
 'PC 17758' 'PC 17759' 'PC 17760' 'PC 17761' 'PP 4348' 'PP 9549'
 'S.C./A.4. 23567' 'S.C./PARIS 2079' 'S.O./P.P. 3' 'S.O./P.P. 751'
 'S.O.C. 14879' 'S.O.P. 1166' 'S.P. 3464' 'S.W./PP 752' 'SC 1748'
 'SC/AH 29037' 'SC/AH 3085' 'SC/AH Basle 541' 'SC/PARIS 2131'
 'SC/PARIS 2133' 'SC/PARIS 2146' 'SC/PARIS 2149' 'SC/PARIS 2167'
 'SC/Paris 2123' 'SC/Paris 2163' 'SCO/W 1585' 'SO/C 14885'
 'SOTON/O.Q. 3101305' 'SOTON/O.Q. 3101306' 'SOTON/O.Q. 3101307'
 'SOTON/O.Q. 3101310' 'SOTON/O.Q. 3101311' 'SOTON/O.Q. 3101312'
 'SOTON/O.Q. 392078' 'SOTON/O.Q. 392087' 'SOTON/O2 3101272'
 'SOTON/O2 3101287' 'SOTON/OQ 3101316' 'SOTON/OQ 3101317'
 'SOTON/OQ 392076' 'SOTON/OQ 392082' 'SOTON/OQ 392086' 'SOTON/OQ 392089'
 'SOTON/OQ 392090' 'STON/O 2. 3101269' 'STON/O 2. 3101273'
 'STON/O 2. 3101274' 'STON/O 2. 3101275' 'STON/O 2. 3101280'
 'STON/O 2. 3101285' 'STON/O 2. 3101286' 'STON/O 2. 3101288'
 'STON/O 2. 3101289' 'STON/O 2. 3101292' 'STON/O 2. 3101293'
 'STON/O 2. 3101294' 'STON/O2. 3101271' 'STON/O2. 3101279'
 'STON/O2. 3101282' 'STON/O2. 3101283' 'STON/O2. 3101290' 'SW/PP 751'
 'W./C. 14258' 'W./C. 14263' 'W./C. 6607' 'W./C. 6608' 'W./C. 6609'
 'W.E.P. 5734' 'W/C 14208' 'WE/P 5735']

colname : Cabin, uniq : ['A10' 'A14' 'A16' 'A19' 'A20' 'A23' 'A24' 'A26' 'A31' 'A32' 'A34' 'A36'
 'A5' 'A6' 'A7' 'B101' 'B102' 'B18' 'B19' 'B20' 'B22' 'B28' 'B3' 'B30'
 'B35' 'B37' 'B38' 'B39' 'B4' 'B41' 'B42' 'B49' 'B5' 'B50' 'B51 B53 B55'
 'B57 B59 B63 B66' 'B58 B60' 'B69' 'B71' 'B73' 'B77' 'B78' 'B79' 'B80'
 'B82 B84' 'B86' 'B94' 'B96 B98' 'C101' 'C103' 'C104' 'C106' 'C110' 'C111'
 'C118' 'C123' 'C124' 'C125' 'C126' 'C128' 'C148' 'C2' 'C22 C26'
 'C23 C25 C27' 'C30' 'C32' 'C45' 'C46' 'C47' 'C49' 'C50' 'C52' 'C54'
 'C62 C64' 'C65' 'C68' 'C7' 'C70' 'C78' 'C82' 'C83' 'C85' 'C86' 'C87'
 'C90' 'C91' 'C92' 'C93' 'C95' 'C99' 'D' 'D10 D12' 'D11' 'D15' 'D17' 'D19'
 'D20' 'D21' 'D26' 'D28' 'D30' 'D33' 'D35' 'D36' 'D37' 'D45' 'D46' 'D47'
 'D48' 'D49' 'D50' 'D56' 'D6' 'D7' 'D9' 'E10' 'E101' 'E12' 'E121' 'E17'
 'E24' 'E25' 'E31' 'E33' 'E34' 'E36' 'E38' 'E40' 'E44' 'E46' 'E49' 'E50'
 'E58' 'E63' 'E67' 'E68' 'E77' 'E8' 'F E69' 'F G63' 'F G73' 'F2' 'F33'
 'F38' 'F4' 'G6' 'T' 'nan']

colname : Embarked, uniq : ['C' 'Q' 'S' 'nan']

EDA - Visualization

In [13]:
sns.set_style('whitegrid')   # set the seaborn style
sns.countplot(x='Survived', data=train)
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f457592370>
In [14]:
## Try it: count by Pclass
sns.countplot(x='Pclass', data=train)
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f4575b4940>
In [15]:
sns.set_style('whitegrid')   # set the seaborn style
sns.barplot(x='Pclass', y='Survived', data=train)
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x1f457ef4bb0>
In [16]:
sns.distplot(train['Age'].dropna(), bins=30).set_xlim(0,)
Out[16]:
(0.0, 96.421405713925)
In [17]:
## Try it: Fare
sns.distplot(test['Fare'].dropna(), bins=30).set_xlim(0,)
Out[17]:
(0.0, 556.2418231843072)
In [18]:
f,ax=plt.subplots(1,2,figsize=(18,8))

# First plot: Age distribution in train
sns.distplot(train['Age'].dropna(), bins=30, ax=ax[0])
ax[0].set_title('train - Age')

# Second plot: Age distribution in test
sns.distplot(test['Age'].dropna(), bins=30, ax=ax[1])
ax[1].set_title('test - Age')
plt.show()

Handling missing values

  • Fill missing Age values with the mean.
  • Fill missing values with [].fillna(value)
In [19]:
train['Age'] = train['Age'].fillna(train['Age'].mean())
test['Age'] = test['Age'].fillna(test['Age'].mean())

## Try it
test['Fare'] = test['Fare'].fillna(test['Fare'].mean())

print(train.isnull().sum())
print(test.isnull().sum())
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age              0
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
PassengerId      0
Pclass           0
Name             0
Sex              0
Age              0
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          327
Embarked         0
dtype: int64
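
The cell above fills each dataset with its own mean. A common alternative, sketched here on toy frames (an assumption, not the real data), is to compute the fill values from the training data only and reuse them for the test data, so that both sets are preprocessed identically. The names `train_toy`, `test_toy`, and `fill_values` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frames standing in for train/test (assumption: not the real data)
train_toy = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 30.0],
                          "Fare": [7.25, 71.28, 8.05, 53.10]})
test_toy = pd.DataFrame({"Age": [np.nan, 47.0], "Fare": [np.nan, 7.0]})

# Compute the fill values once, from the training data only
fill_values = train_toy[["Age", "Fare"]].mean()

# fillna accepts a Series and matches its index against the column names
train_toy = train_toy.fillna(fill_values)
test_toy = test_toy.fillna(fill_values)

print(test_toy)
```

Whether this changes the score on this dataset is not something the sketch shows; it only illustrates the mechanics.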

Handling missing values - Embarked (port of embarkation)

  • Fill the missing values with the most frequent value.
In [21]:
val_mode = train['Embarked'].mode()
print(val_mode[0])
train['Embarked'] = train['Embarked'].fillna(val_mode[0])
S
In [22]:
print(train.isnull().sum())
print(test.isnull().sum())
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age              0
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         0
dtype: int64
PassengerId      0
Pclass           0
Name             0
Sex              0
Age              0
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          327
Embarked         0
dtype: int64

Label encoding and type conversion

In [23]:
train['Sex'] = train['Sex'].map( {'female': 0, 'male': 1} ).astype(int)
test['Sex'] = test['Sex'].map( {'female': 0, 'male': 1} ).astype(int)

train['Embarked'] = train['Embarked'].map( {'S': 0, 'C': 1, 'Q': 2} ).astype(int)
test['Embarked']= test['Embarked'].map( {'S': 0, 'C': 1, 'Q': 2} ).astype(int)
In [24]:
## Convert Age to int
train['Age'] = train['Age'].astype('int')
test['Age'] = test['Age'].astype('int')
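
`Series.map` with a dict returns NaN for any value that is missing from the dict, and the following `.astype(int)` then fails. The sketch below shows one way to check for unmapped labels before casting. It uses a toy column with a deliberately unexpected label `'X'` (an assumption, not the real data).

```python
import pandas as pd

# Toy column with one unexpected label (assumption: not the real data)
embarked = pd.Series(["S", "C", "Q", "X"])
mapping = {"S": 0, "C": 1, "Q": 2}

encoded = embarked.map(mapping)
unmapped = embarked[encoded.isna()].unique()
print(unmapped)  # labels that the mapping does not cover

# Only cast to int once every value has been mapped
if len(unmapped) == 0:
    encoded = encoded.astype(int)
```

This is also why the missing Embarked values have to be filled before the encoding cell runs.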

Splitting the data

In [25]:
sel = ['PassengerId', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare', 'Sex', 'Embarked']

all_X = train[sel]
all_y = train['Survived']

last_X_test = test[sel]
In [26]:
from sklearn.model_selection import train_test_split
In [27]:
X_train, X_test, y_train, y_test = train_test_split(all_X, 
                                                    all_y,    
                                                    stratify=all_y,
                                                    test_size=0.3,
                                                    random_state=77 )
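
The `stratify=all_y` argument keeps the class ratio roughly the same in both splits. A minimal sketch on synthetic labels with about 38% positives (an assumption, not the real data) illustrates the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 62 negatives, 38 positives (assumption: not the real data)
y = np.array([0] * 62 + [1] * 38)
X = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=77
)

# Share of positives overall, in the training split, and in the test split
print(y.mean(), y_tr.mean(), y_te.mean())
```

The three printed ratios should be close to each other; without `stratify`, they can drift apart on small datasets.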
In [28]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
In [30]:
models = [DecisionTreeClassifier(), LogisticRegression(), LinearSVC(), KNeighborsClassifier()]

# Fit each model and compare accuracy on the training and held-out splits
for model in models:
    model.fit(X_train, y_train)
    acc_tr = model.score(X_train, y_train)
    acc_test = model.score(X_test, y_test)
    print("Model: {}, accuracy (train / test): {} {} ".format(model, acc_tr, acc_test) )
Model: DecisionTreeClassifier(), accuracy (train / test): 1.0 0.75 
Model: LogisticRegression(), accuracy (train / test): 0.7817014446227929 0.8059701492537313 
Model: LinearSVC(), accuracy (train / test): 0.4911717495987159 0.4925373134328358 
Model: KNeighborsClassifier(), accuracy (train / test): 0.7335473515248796 0.6268656716417911 
C:\Users\WJ\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
C:\Users\WJ\anaconda3\lib\site-packages\sklearn\svm\_base.py:976: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn("Liblinear failed to converge, increase "
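
The convergence warnings suggest scaling the data or raising `max_iter`. The sketch below shows one way to do both with a scikit-learn pipeline. It uses synthetic features on very different scales (an assumption, not the real data), and `max_iter=1000` is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (assumption: not the real data)
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    np.arange(n),                 # ID-like column with a large range
    rng.integers(1, 4, size=n),   # small categorical-like column
    rng.normal(30, 14, size=n),   # age-like column
])
y = (X[:, 2] + rng.normal(0, 5, size=n) > 30).astype(int)

# Standardize the features, then fit the classifier
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.score(X, y))
```

Because the scaler sits inside the pipeline, the same transformation is applied automatically when `clf.predict` is called on new data.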

Exercise: Build the final model with logistic regression and submit it.

In [31]:
model = LogisticRegression()
model.fit(all_X, all_y)
pred = model.predict(last_X_test)
sub['Survived'] = pred
sub.to_csv("four_lgreg_sub.csv", index=False) 
C:\Users\WJ\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
In [33]:
import os
files = os.listdir()
print("File exists: ", "four_lgreg_sub.csv" in files)  # 0.75837
File exists:  True

Exercise

  • Choose either LogisticRegression or DecisionTree, tune its hyperparameters, pick the best model, and submit it.
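
One possible starting point for the tuning exercise is a cross-validated grid search. The sketch below runs on synthetic data (an assumption, not the real data), and the candidate values in `param_grid` are illustrative, not tuned recommendations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for all_X / all_y (assumption: not the real data)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Candidate values are illustrative, not tuned recommendations
param_grid = {"max_depth": [2, 3, 5, None], "min_samples_leaf": [1, 5, 10]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=77),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
best_model = search.best_estimator_  # refit on all of X, y by default
```

In the notebook, `X` and `y` would be replaced by `all_X` and `all_y`, and `best_model.predict(last_X_test)` would give the predictions to write into `sub`.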