Reading a txt data file that contains multiple dictionaries with pandas

The IMDB movie data file obtained with a web crawler is a txt file containing multiple dictionaries, one per line, as follows:

{"movie_id": 111161, "movie_name": "The Shawshank Redemption", "year": 1994, "movie_link": "/title/tt0111161/", "movie_rate": 9.222352298932535}
{"movie_id": 68646, "movie_name": "The Godfather", "year": 1972, "movie_link": "/title/tt0068646/", "movie_rate": 9.149459830968244}
{"movie_id": 71562, "movie_name": "The Godfather: Part II", "year": 1974, "movie_link": "/title/tt0071562/", "movie_rate": 8.982003465767905}
{"movie_id": 468569, "movie_name": "The Dark Knight", "year": 2008, "movie_link": "/title/tt0468569/", "movie_rate": 8.969818219072174}
{"movie_id": 50083, "movie_name": "12 Angry Men", "year": 1957, "movie_link": "/title/tt0050083/", "movie_rate": 8.92419107925773}
{"movie_id": 108052, "movie_name": "Schindler's List", "year": 1993, "movie_link": "/title/tt0108052/", "movie_rate": 8.903202955255063}
{"movie_id": 167260, "movie_name": "The Lord of the Rings: The Return of the King", "year": 2003, "movie_link": "/title/tt0167260/", "movie_rate": 8.880342018614696}
{"movie_id": 110912, "movie_name": "Pulp Fiction", "year": 1994, "movie_link": "/title/tt0110912/", "movie_rate": 8.85114611887013}
{"movie_id": 60196, "movie_name": "Il buono, il brutto, il cattivo", "year": 1966, "movie_link": "/title/tt0060196/", "movie_rate": 8.801450831661567}
……

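#!/usr/bin/python
# -*- coding: utf-8 -*-
# After reading the file with pandas, the data looks like this
import pandas as pd
pd.set_option('display.max_colwidth', 150)  # show up to 150 characters per value so columns are not truncated (the default is 50)
data = pd.read_csv("movie12.txt", sep="\t", header=None)  # returns a DataFrame; header=None means the file has no header row
data.head()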
   0
0  {"movie_id": 111161, "movie_name": "The Shawshank Redemption", "year": 1994, "movie_link": "/title/tt0111161/", "movie_rate": 9.222352298932535}
1  {"movie_id": 68646, "movie_name": "The Godfather", "year": 1972, "movie_link": "/title/tt0068646/", "movie_rate": 9.149459830968244}
2  {"movie_id": 71562, "movie_name": "The Godfather: Part II", "year": 1974, "movie_link": "/title/tt0071562/", "movie_rate": 8.982003465767905}
3  {"movie_id": 468569, "movie_name": "The Dark Knight", "year": 2008, "movie_link": "/title/tt0468569/", "movie_rate": 8.969818219072174}
4  {"movie_id": 50083, "movie_name": "12 Angry Men", "year": 1957, "movie_link": "/title/tt0050083/", "movie_rate": 8.92419107925773}
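Each cell in this single column is a complete JSON object, so one possible way to finish the job from here is to parse every cell with the standard-library `json` module and build the DataFrame from the resulting list of dicts. The sketch below is only an illustration; the two hard-coded rows stand in for movie12.txt.

```python
import json

import pandas as pd

# Two sample rows in the same shape as the single-column DataFrame above (illustrative data only)
raw = pd.DataFrame({0: [
    '{"movie_id": 111161, "movie_name": "The Shawshank Redemption", "year": 1994, "movie_link": "/title/tt0111161/", "movie_rate": 9.222352298932535}',
    '{"movie_id": 108052, "movie_name": "Schindler\'s List", "year": 1993, "movie_link": "/title/tt0108052/", "movie_rate": 8.903202955255063}',
]})

# Parse each cell into a dict, then let pandas turn the list of dicts into columns
records = raw[0].map(json.loads).tolist()
df = pd.DataFrame(records)
print(df[["movie_id", "movie_name", "year"]])
```

With the `data` read above, the same idea would be `pd.DataFrame(data[0].map(json.loads).tolist())`. Titles containing an apostrophe, such as Schindler's List, come through unchanged because nothing is stripped from the text.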

The format we want is the following, which makes the next step of statistical analysis in Python easier:

data = pd.read_csv("movie12.csv")
data.head()
   Unnamed: 0  movie_id                movie_name  year         movie_link  movie_rate
0           0    111161  The Shawshank Redemption  1994  /title/tt0111161/    9.222352
1           1     68646             The Godfather  1972  /title/tt0068646/    9.149460
2           2     71562    The Godfather: Part II  1974  /title/tt0071562/    8.982003
3           3    468569           The Dark Knight  2008  /title/tt0468569/    8.969818
4           4     50083              12 Angry Men  1957  /title/tt0050083/    8.924191

Here is the code:

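#!/usr/bin/python
# -*- coding: utf-8 -*-
import pandas as pd
# Open the file (assumed to be UTF-8) and read it line by line into a list;
# each element still carries extra characters such as the trailing newline
with open("movie12.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()
# Convert the list to one string so the extra characters can be removed
# Caution: deleting every "'" also damages values that contain an apostrophe (e.g. Schindler's List)
str_lines = str(lines).replace("'", "").replace(r"\n", "")
# Turn the string back into a list of dicts: [{'Key1': 'Value1_1', 'Key2': 'Value2_1', ...}, {}, {}, ...]
list_dict = eval(str_lines)
df = pd.DataFrame(list_dict)
df.to_csv("movie12out.csv", encoding="utf-8-sig")
df.head()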
   movie_id                movie_name  year         movie_link  movie_rate
0    111161  The Shawshank Redemption  1994  /title/tt0111161/    9.222352
1     68646             The Godfather  1972  /title/tt0068646/    9.149460
2     71562    The Godfather: Part II  1974  /title/tt0071562/    8.982003
3    468569           The Dark Knight  2008  /title/tt0468569/    8.969818
4     50083              12 Angry Men  1957  /title/tt0050083/    8.924191
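The `str()` / `replace()` / `eval()` round trip works for this sample, but it is fragile: removing every `'` damages titles that contain an apostrophe, and `eval` executes whatever text is in the file. Since each line is a JSON object (the JSON Lines layout), pandas can read the file directly with `read_json(..., lines=True)`. A minimal sketch follows; an in-memory `io.StringIO` buffer stands in for movie12.txt so the example runs on its own.

```python
import io

import pandas as pd

# In-memory stand-in for movie12.txt: one JSON object per line (illustrative data only)
sample = io.StringIO(
    '{"movie_id": 111161, "movie_name": "The Shawshank Redemption", "year": 1994, '
    '"movie_link": "/title/tt0111161/", "movie_rate": 9.222352298932535}\n'
    '{"movie_id": 108052, "movie_name": "Schindler\'s List", "year": 1993, '
    '"movie_link": "/title/tt0108052/", "movie_rate": 8.903202955255063}\n'
)

# lines=True tells pandas to parse each line as a separate JSON record
df = pd.read_json(sample, lines=True)
print(df.head())

# With the real file this would be, for example:
# df = pd.read_json("movie12.txt", lines=True, encoding="utf-8")
# df.to_csv("movie12out.csv", encoding="utf-8-sig")
```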

Reading a dictionary-like txt file with pandas

Sometimes the data file we get is a txt file in the following format, with all column names and values arranged vertically.

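It resembles a dictionary: each key is separated from its value by a colon followed by whitespace, but the {} braces and "" quotes are missing (apart from a stray } left at the end of each movie_rate line).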

movie_id: 111161
movie_name: The Shawshank Redemption
year: 1994
movie_link: /title/tt0111161/
movie_rate: 9.222352298932535}
movie_id: 68646
movie_name: The Godfather
year: 1972
movie_link: /title/tt0068646/
movie_rate: 9.149459830968244}
movie_id: 71562
movie_name: The Godfather: Part II
year: 1974
movie_link: /title/tt0071562/
movie_rate: 8.982003465767905}
movie_id: 468569
movie_name: The Dark Knight
year: 2008
movie_link: /title/tt0468569/
movie_rate: 8.969818219072174}
movie_id: 50083
movie_name: 12 Angry Men
year: 1957
movie_link: /title/tt0050083/
movie_rate: 8.92419107925773}
……

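#!/usr/bin/python
# -*- coding: utf-8 -*-
# Reading the file with pandas gives the following:
import pandas as pd
data = pd.read_csv("movie13.txt", sep="\t", header=None)  # returns a DataFrame; header=None means the file has no header row
# data = pd.read_clipboard(sep="\t", header=None)  # alternative: read the same text from the clipboard
data.head()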
                                      0
0                      movie_id: 111161
1  movie_name: The Shawshank Redemption
2                            year: 1994
3         movie_link: /title/tt0111161/
4        movie_rate: 9.222352298932535}
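If every record always starts with a movie_id line, one could also stay inside pandas: split the single column at the first colon, number the records, and pivot. This is only a sketch that relies on that assumption; the small `data` DataFrame below is an illustrative stand-in for the one read above.

```python
import pandas as pd

# Stand-in for the single-column DataFrame read from movie13.txt (illustrative rows only)
data = pd.DataFrame({0: [
    "movie_id: 111161", "movie_name: The Shawshank Redemption", "year: 1994",
    "movie_link: /title/tt0111161/", "movie_rate: 9.222352298932535}",
    "movie_id: 71562", "movie_name: The Godfather: Part II", "year: 1974",
    "movie_link: /title/tt0071562/", "movie_rate: 8.982003465767905}",
]})

# Split each line at the first colon only, so titles containing ":" stay whole
kv = data[0].str.split(":", n=1, expand=True)
kv.columns = ["key", "value"]
kv["key"] = kv["key"].str.strip()
kv["value"] = kv["value"].str.strip().str.rstrip("}")  # also drop the stray "}"

# Assumption: each record begins with a movie_id line, which starts a new record number
kv["record"] = (kv["key"] == "movie_id").cumsum() - 1

# One row per record, one column per key, in the original key order
df = kv.pivot(index="record", columns="key", values="value")
df = df[["movie_id", "movie_name", "year", "movie_link", "movie_rate"]]
df.columns.name = None
print(df)
```

All values are still strings at this point; `pd.to_numeric` can convert movie_id, year and movie_rate when needed.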

We need to convert it into the following format before we can do statistical analysis:

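data = pd.read_csv("movie13out.txt")  # returns a DataFrame
# movie13out.txt is saved further below with header=False, so read_csv's default header=0 turns the first movie into the column names
data.head()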
   111161  The Shawshank Redemption  1994  /title/tt0111161/  9.222352298932535}
0   68646             The Godfather  1972  /title/tt0068646/  9.149459830968244}
1   71562             The Godfather  1974  /title/tt0071562/  8.982003465767905}
2  468569           The Dark Knight  2008  /title/tt0468569/  8.969818219072174}
3   50083              12 Angry Men  1957  /title/tt0050083/   8.92419107925773}
4  108052         Schindler\\s List  1993  /title/tt0108052/  8.903202955255063}

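Basic idea: create an empty dictionary → open the file → read it line by line → split each line into key and value → append the value under its key → get the list of key names → build the DataFrame → save the data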

The code is as follows:

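# coding: utf8
import pandas as pd
# Dictionary whose keys and values are filled in from the file
dict_data = {}
# Open the file (assumed to be UTF-8 encoded)
with open('movie13.txt', 'r', encoding='utf-8') as f:
    for line in f:  # read one line at a time
        # Skip blank lines: a line made only of '\n' has count('\n') == len(line)
        if line.count('\n') == len(line):
            continue
        # Strip surrounding whitespace and split at ":" (note: this also splits titles that contain ":")
        kv = line.strip().split(':')
        dict_data.setdefault(kv[0], []).append(kv[1])  # append the value to the list under its key
print(dict_data)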
{'movie_id': [' 111161', ' 68646', ' 71562', ' 468569', ' 50083', ...], 'movie_name': [' The Shawshank Redemption', ' The Godfather', ' The Godfather', ' The Dark Knight', ' 12 Angry Men', ...], 'year': [' 1994', ' 1972', ' 1974', ' 2008', ' 1957', ...], 'movie_link': [' /title/tt0111161/', ' /title/tt0068646/', ' /title/tt0071562/', ' /title/tt0468569/', ' /title/tt0050083/', ...], 'movie_rate': [' 9.222352298932535}', ' 9.149459830968244}', ' 8.982003465767905}', ' 8.969818219072174}', ' 8.92419107925773}', ...]}
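In the printed dictionary the third movie_name is ' The Godfather' rather than ' The Godfather: Part II', because split(':') also splits on the colon inside the title and only kv[1] is kept. Limiting the split to the first colon avoids this; a quick check in plain Python:

```python
line = "movie_name: The Godfather: Part II\n"

# Splitting on every colon breaks the title into pieces
print(line.strip().split(":"))     # ['movie_name', ' The Godfather', ' Part II']

# maxsplit=1 splits only at the first colon, so the title stays intact
key, value = line.strip().split(":", 1)
print(key, "->", value.strip())    # movie_name -> The Godfather: Part II
```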
# Read the keys out into a list
columns_name = list(dict_data.keys())
columns_name
# Build a DataFrame whose column names are the keys
df = pd.DataFrame(dict_data, columns=columns_name)
df.head()
   movie_id                movie_name  year         movie_link          movie_rate
0    111161  The Shawshank Redemption  1994  /title/tt0111161/  9.222352298932535}
1     68646             The Godfather  1972  /title/tt0068646/  9.149459830968244}
2     71562             The Godfather  1974  /title/tt0071562/  8.982003465767905}
3    468569           The Dark Knight  2008  /title/tt0468569/  8.969818219072174}
4     50083              12 Angry Men  1957  /title/tt0050083/   8.92419107925773}
# The two steps above (getting the list of key names, then building the DataFrame) can actually be done in a single line:
df1 = pd.DataFrame.from_dict(dict_data)
df1.head()
   movie_id                movie_name  year         movie_link          movie_rate
0    111161  The Shawshank Redemption  1994  /title/tt0111161/  9.222352298932535}
1     68646             The Godfather  1972  /title/tt0068646/  9.149459830968244}
2     71562             The Godfather  1974  /title/tt0071562/  8.982003465767905}
3    468569           The Dark Knight  2008  /title/tt0468569/  8.969818219072174}
4     50083              12 Angry Men  1957  /title/tt0050083/   8.92419107925773}
# Save as a `csv` file: no row index, keep the column names
df.to_csv('movie13out.csv', index=False, header=True)
# Save as a `txt` file: no row index and no column names (header row)
df.to_csv('movie13out.txt', index=False, header=False)
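A file written with header=False has no column names, so reading it back with read_csv's default header=0 promotes the first movie to the header row. A minimal round-trip sketch that passes header=None and explicit names when reading; the column list is the one used in this article, and an in-memory buffer replaces movie13out.txt for illustration:

```python
import io

import pandas as pd

cols = ["movie_id", "movie_name", "year", "movie_link", "movie_rate"]
df = pd.DataFrame(
    [["111161", "The Shawshank Redemption", "1994", "/title/tt0111161/", "9.222352298932535}"],
     ["68646", "The Godfather", "1972", "/title/tt0068646/", "9.149459830968244}"]],
    columns=cols,
)

# Write without index and without header, as done for movie13out.txt
buf = io.StringIO()
df.to_csv(buf, index=False, header=False)
buf.seek(0)

# header=None: the first line is data, not column names; names= supplies the columns
back = pd.read_csv(buf, header=None, names=cols)
print(back)
```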

Complete code

#!/usr/bin/python
# -*- coding: utf-8 -*-
import pandas as pd
dict_data = {}
with open('movie13.txt', 'r', encoding='utf-8') as f:  # assumes the file is UTF-8 encoded
    for line in f:
        if line.count('\n') == len(line):  # skip blank lines
            continue
        kv = line.strip().split(':')  # note: this also splits titles that contain ":"
        dict_data.setdefault(kv[0], []).append(kv[1])
# print(dict_data)
df = pd.DataFrame.from_dict(dict_data)
df.to_csv('movie13out.csv', index=False, header=True)
df.head()
   movie_id                movie_name  year         movie_link          movie_rate
0    111161  The Shawshank Redemption  1994  /title/tt0111161/  9.222352298932535}
1     68646             The Godfather  1972  /title/tt0068646/  9.149459830968244}
2     71562             The Godfather  1974  /title/tt0071562/  8.982003465767905}
3    468569           The Dark Knight  2008  /title/tt0468569/  8.969818219072174}
4     50083              12 Angry Men  1957  /title/tt0050083/   8.92419107925773}
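The complete code above leaves leading spaces and the trailing } in the values, keeps everything as strings, and truncates titles that contain a colon. A slightly hardened sketch of the same dictionary-of-lists idea is shown below. It assumes UTF-8 input, that every non-blank line has the form key: value, and that every record has the same keys (otherwise pd.DataFrame raises because the lists differ in length). The helper name read_vertical_records is hypothetical.

```python
import os
import tempfile

import pandas as pd


def read_vertical_records(path, encoding="utf-8"):
    """Hypothetical helper: collect 'key: value' lines into a DataFrame."""
    dict_data = {}
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            # Split only at the first colon so "The Godfather: Part II" stays whole
            key, value = line.split(":", 1)
            # Drop surrounding spaces and any trailing "}" (seen on movie_rate lines)
            dict_data.setdefault(key.strip(), []).append(value.strip().rstrip("}"))
    df = pd.DataFrame(dict_data)
    # Turn numeric columns into numbers; unparseable values become NaN
    for col in ("movie_id", "year", "movie_rate"):
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


# Usage with a small temporary sample file (illustrative data only)
sample = (
    "movie_id: 111161\nmovie_name: The Shawshank Redemption\nyear: 1994\n"
    "movie_link: /title/tt0111161/\nmovie_rate: 9.222352298932535}\n\n"
    "movie_id: 71562\nmovie_name: The Godfather: Part II\nyear: 1974\n"
    "movie_link: /title/tt0071562/\nmovie_rate: 8.982003465767905}\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
try:
    movies = read_vertical_records(tmp.name)
finally:
    os.remove(tmp.name)
print(movies)
```

The result can then be saved as before, for example with `movies.to_csv('movie13out.csv', index=False)`.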

After this adjustment, the data is ready for statistical analysis and visualization with pandas and similar tools!
