Delimited list, data is messy, pandas and pyparsing can't parse


Edit 3: I opened the file in Notepad (plain text format) and it showed that the data lines were enclosed in double quotes, causing pandas to interpret each line as a single string. Interestingly, the quotes didn't show in Excel, and they didn't carry over when I copied and pasted the data from Excel into this post. Thanks for all of your thoughts, and sorry for the waste of time.

Edit 2: Here is a link to the actual [test file.][1]

Edit 1: I am using Python version 3.11.4 and pandas version 2.0.2. I should also mention that the file is saved as a Microsoft Excel Comma Separated Values File.

Disclaimer: I am not a programmer and this is my first time posting. 

I am attempting to use pandas to read a .csv file into a dataframe for analysis. The `read_csv` function in pandas can parse the header but cannot seem to parse the data. It gives me the right number of columns with the right headers, but it concatenates all the data in each row into one long str in the first column and assigns NaN to all the other columns. I tried using pyparsing to parse the string in the first row, but it returned an error that makes me believe the problem lies with the hyphens in the first column of data, although there are later rows where that field is blank, so I can't be certain. It's worth noting that the csv module does seem to be able to read the file, but I am unclear on how to convert the `_csv.reader` object to a workable dataframe for analysis. I've included the code and outputs discussed here below. It's all pretty basic, but I've pored over the documentation and can't seem to find the right solution.

I've been working with a test file that is the first 5 lines of data, but the actual dataset is >15 million lines long, so any solution will need to be "large dataset friendly".


    STST-SM-12964;GF2191430006CD;1682503200000;"2023-04-26 10:00:00";2;Operative;;
    STST-SM-13783;GF2193050000CM;1682199000000;"2023-04-22 21:30:00";1;Operative;;
    STST-SM-2978;GF2200050000W5;1681243200000;"2023-04-11 20:00:00";2;Operative;;
    STST-SM-3227;GF2200190001EC;1680750900000;"2023-04-06 03:15:00";1;Operative;;
    STST-SM-3184;GF22002500014D;1682155800000;"2023-04-22 09:30:00";0;Operative;;


**Attempt to Read CSV using Pandas**

    import pandas as pd
    import pyparsing as pp
    import csv

    Test = pd.read_csv("C:/.../Test.csv", delimiter=';', header=0)

**Output in terminal**

      gateway_din ... alert_codes_cabinet
    0 STST-SM-12964;GF2191430006CD;1682503200000;"20... ... NaN
    1 STST-SM-13783;GF2193050000CM;1682199000000;"20... ... NaN
    2 STST-SM-2978;GF2200050000W5;1681243200000;"202... ... NaN
    3 STST-SM-3227;GF2200190001EC;1680750900000;"202... ... NaN
    4 STST-SM-3184;GF22002500014D;1682155800000;"202... ... NaN

    [5 rows x 8 columns]

    'STST-SM-12964;GF2191430006CD;1682503200000;"2023-04-26 10:00:00";2;Operative;;'


**Attempt to parse the string in column 1 using pyparsing in the terminal**

    ParseTest = pp.DelimitedList(';').parse_string(Test.iat[0,0])
    Traceback (most recent call last):
      File "<pyshell#7>", line 1, in <module>
        ParseTest = pp.DelimitedList(';').parse_string(Test.iat[0,0])
      File "C:\Users\dschell\AppData\Roaming\Python\Python311\site-packages\pyparsing\core.py", line 1190, in parse_string
        raise exc.with_traceback(None)
    pyparsing.exceptions.ParseException: , found 'STST' (at char 0), (line:1, col:1)

**Attempt to read the .csv using the csv module**

    with open('C:/Users/dschell/Desktop/Reliability and Usability/Survey Data/Test.csv', newline='') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=';', quotechar='"')
        for row in spamreader:
            print(', '.join(row))

**Output in terminal**

    gateway_din, charger_vin, bucket, bucket_dt, post, operative_status, alert_codes, alert_codes_cabinet
    STST-SM-12964;GF2191430006CD;1682503200000;"2023-04-26 10:00:00";2;Operative;;
    STST-SM-13783;GF2193050000CM;1682199000000;"2023-04-22 21:30:00";1;Operative;;
    STST-SM-2978;GF2200050000W5;1681243200000;"2023-04-11 20:00:00";2;Operative;;
    STST-SM-3227;GF2200190001EC;1680750900000;"2023-04-06 03:15:00";1;Operative;;
    STST-SM-3184;GF22002500014D;1682155800000;"2023-04-22 09:30:00";0;Operative;;

  [1]: https://www.dropbox.com/s/yoh3v5f6044ztl1/Test.csv?dl=0


It seems like your data has some extra hidden quotes. Opening the file you sent in Notepad, line 2 looks like this:

    "STST-SM-12964;GF2191430006CD;1682503200000;""2023-04-26 10:00:00"";2;Operative;;"

As you can see, the whole line is wrapped in quotes and there are extra quotes before and after the date-time section. With the extra quotes removed, like this

    STST-SM-12964;GF2191430006CD;1682503200000;"2023-04-26 10:00:00";2;Operative;;

it seems to work correctly.
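If editing 15 million lines by hand isn't practical, one option is to undo the wrapping in code. This is only a rough sketch, assuming every data line is wrapped exactly like line 2 above and that the header line is not wrapped and contains no commas. The names `unwrap_rows` and `batched_rows` are made up for illustration, and the sample text stands in for your real file.

```python
import csv
import io
from itertools import islice


def unwrap_rows(lines, delimiter=";"):
    """Yield each record as a list of fields, undoing whole-line quoting."""
    # Pass 1: the default comma dialect reads a line wrapped in double quotes
    # as a single field and collapses doubled quotes ("") into single ones.
    for record in csv.reader(lines):
        if not record:
            continue  # skip blank lines
        # An unwrapped line (the header) comes back as one field too, as long
        # as it has no commas; rejoin in case it was split on one.
        inner = ",".join(record)
        # Pass 2: split the recovered text on the real delimiter.
        yield from csv.reader([inner], delimiter=delimiter)


def batched_rows(lines, batch_size=100_000, delimiter=";"):
    """Yield (header, batch) pairs so the whole file never sits in memory."""
    rows = unwrap_rows(lines, delimiter)
    header = next(rows)
    while batch := list(islice(rows, batch_size)):
        yield header, batch


# Stand-in for the real file: an unwrapped header plus one wrapped data line.
sample = io.StringIO(
    "gateway_din;charger_vin;bucket;bucket_dt;post;"
    "operative_status;alert_codes;alert_codes_cabinet\n"
    '"STST-SM-12964;GF2191430006CD;1682503200000;'
    '""2023-04-26 10:00:00"";2;Operative;;"\n'
)

for header, batch in batched_rows(sample):
    print(len(header), batch[0])
```

Each batch is a plain list of rows (all fields are still strings), so it could be handed to `pd.DataFrame(batch, columns=header)` and processed one piece at a time. For the real file, you would pass the object returned by `open(..., newline='')` instead of the `io.StringIO` sample.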


